Why Gzip Isn't Enough
"If my API responses are gzipped, do I still need normalization?"
Yes. Gzip compresses repeated byte sequences; normalization removes repeated structure. They solve different problems, and they stack.
The benchmark
Measured on a real transport search API response (Bicester to Sainte-Marie-aux-Mines route search) containing routes, segments, hops, shared carriers, and shared places.
Minified JSON
| Format | Raw | Gzipped | Gzipped size vs Normalized |
|---|---|---|---|
| Normalized | 345 KB | 58 KB | 1.0x |
| Unnormalized | 713 KB | 121 KB | 2.1x |
Raw savings: 368 KB (52%). Gzipped savings: 63 KB (52%).
Pretty-printed JSON
| Format | Raw | Gzipped | Gzipped size vs Normalized |
|---|---|---|---|
| Normalized | 538 KB | 65 KB | 1.0x |
| Unnormalized | 1,482 KB | 150 KB | 2.3x |
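Gzip narrows the gap in some cases but never eliminates it. For pretty-printed JSON the unnormalized payload goes from about 2.8x the normalized size raw to 2.3x gzipped; for minified JSON the ratio stays at about 2.1x before and after compression. In both cases the normalized payload is less than half the size on the wire.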
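The savings and ratios quoted in this section can be re-derived from the table values. The short script below is only a convenience for checking the arithmetic; the dictionary layout and function name are illustrative, and the sizes are copied from the two tables.

```python
# Sizes in KB as (raw, gzipped), copied from the two benchmark tables.
SIZES = {
    "minified": {"normalized": (345, 58), "unnormalized": (713, 121)},
    "pretty": {"normalized": (538, 65), "unnormalized": (1482, 150)},
}


def summarize(fmt: str) -> dict:
    """Return absolute savings, percentage savings, and size ratios."""
    norm_raw, norm_gz = SIZES[fmt]["normalized"]
    flat_raw, flat_gz = SIZES[fmt]["unnormalized"]
    return {
        "raw_saved_kb": flat_raw - norm_raw,
        "raw_saved_pct": round(100 * (flat_raw - norm_raw) / flat_raw),
        "gz_saved_kb": flat_gz - norm_gz,
        "gz_saved_pct": round(100 * (flat_gz - norm_gz) / flat_gz),
        "raw_ratio": round(flat_raw / norm_raw, 1),
        "gz_ratio": round(flat_gz / norm_gz, 1),
    }


for fmt in SIZES:
    print(fmt, summarize(fmt))
```

For minified JSON this gives 368 KB (52%) raw and 63 KB (52%) gzipped, with a ratio of 2.1x in both cases. For pretty-printed JSON the ratio is 2.8x raw and 2.3x gzipped.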
Why gzip doesn't fully cancel normalization
Three structural reasons prevent gzip from recovering all the redundancy that normalization removes:
1. Repeated objects are not byte-identical in context
Gzip finds repeats by matching byte sequences inside a sliding window. Two copies of the same carrier object appear at different positions, surrounded by different JSON keys, array indices, and sibling fields. The compressor often sees similar byte sequences rather than identical ones, so it cannot always collapse them perfectly.
2. Inlining duplicates entire subtrees, not just leaf strings
When the same carrier appears in 30 segments, the unnormalized form repeats not just the carrier name but its entire subtree: transit images, line references, places, path coordinates. Each level of nesting multiplies the redundancy. DEFLATE, the algorithm behind gzip, can only reference matches within the previous 32 KB, so in a large payload copies that sit farther apart than that cannot be matched at all.
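The window limit is easy to observe in isolation. The sketch below is not the benchmark above: it uses synthetic random bytes (an assumption chosen so that nothing except the repeated block is compressible) and Python's `zlib`, which implements the same DEFLATE algorithm gzip uses. It measures how many compressed bytes a second copy of a 2,000-byte block costs when it appears close to the first copy versus more than 32 KB away.

```python
import random
import zlib


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Deterministic pseudo-random filler that DEFLATE cannot compress."""
    return bytes(rng.randrange(256) for _ in range(n))


def cost_of_second_copy(gap: int, seed: int = 0) -> int:
    """Compressed bytes added by repeating a 2,000-byte block after `gap` filler bytes."""
    rng = random.Random(seed)
    block = random_bytes(rng, 2000)
    filler = random_bytes(rng, gap)
    once = zlib.compress(block + filler, 9)
    twice = zlib.compress(block + filler + block, 9)
    return len(twice) - len(once)


near = cost_of_second_copy(gap=1_000)   # second copy starts 3,000 bytes after the first
far = cost_of_second_copy(gap=40_000)   # second copy starts 42,000 bytes after, outside the 32 KB window
print(f"near: {near} bytes, far: {far} bytes")
```

When the repeat is inside the window, the second copy costs a small fraction of its size. When it is outside, the compressor has to encode it again almost in full.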
3. Integer indexes are extremely cheap
In a normalized payload, "carrier": 2 is a few bytes. The full carrier object it replaces can be hundreds of bytes. Even the best compressor cannot shrink a 200-byte object down to the 1-2 bytes an integer index occupies.
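To make the index idea concrete, here is a minimal sketch of the transformation on a toy payload. The field names, the `normalize` and `denormalize` helpers, and the output shape are all hypothetical and chosen for illustration; they are not DataNormalizer's actual API or wire format.

```python
import json


def normalize(segments: list[dict]) -> dict:
    """Move each distinct carrier into a table and replace it with its index."""
    carriers: list[dict] = []
    index_by_key: dict[str, int] = {}
    slim_segments = []
    for segment in segments:
        # Use the canonical JSON text as the carrier's structural identity.
        key = json.dumps(segment["carrier"], sort_keys=True)
        if key not in index_by_key:
            index_by_key[key] = len(carriers)
            carriers.append(segment["carrier"])
        slim_segments.append({**segment, "carrier": index_by_key[key]})
    return {"carriers": carriers, "segments": slim_segments}


def denormalize(payload: dict) -> list[dict]:
    """Inverse of normalize: inline each carrier back into its segment."""
    return [
        {**segment, "carrier": payload["carriers"][segment["carrier"]]}
        for segment in payload["segments"]
    ]


def minified_size(value) -> int:
    """Size in bytes of the minified JSON encoding."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


# Hypothetical data: 30 segments that all share one carrier object.
carrier = {"name": "Example Rail", "code": "EXR", "logo": "https://example.com/exr.png"}
segments = [
    {"from": f"stop-{i}", "to": f"stop-{i + 1}", "carrier": carrier}
    for i in range(30)
]

normalized = normalize(segments)
print(minified_size(segments), minified_size(normalized))
```

The carrier is stored once in `carriers`, and each segment carries only `"carrier": 0`. `denormalize` shows that no information is lost in the process.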
What normalization gives beyond compression
Smaller wire size is only part of the story. Normalization also provides:
- Explicit shared references. Each entity has one canonical copy. There is no ambiguity about which instance is authoritative.
- Simpler client state management. No duplicate objects to reconcile. Update a carrier once and every segment that references it sees the change.
- Easier caching, diffing, and patching. Flat entity tables are trivial to merge, diff, or patch incrementally.
- Reduced parse and allocation work. Each entity is deserialized once instead of once per reference, so the client parses fewer bytes, allocates fewer objects, and sees less memory pressure.
These benefits apply regardless of transport compression.
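The state-management and diffing points can be illustrated with a small sketch of normalized client state. The shape of `state` and the helper names are assumptions made for this example, not a prescribed client model.

```python
# Hypothetical normalized client state: entity tables plus integer references.
state = {
    "carriers": [{"name": "Example Rail"}, {"name": "Sample Bus"}],
    "segments": [
        {"from": "A", "to": "B", "carrier": 0},
        {"from": "B", "to": "C", "carrier": 0},
        {"from": "C", "to": "D", "carrier": 1},
    ],
}


def carrier_name(state: dict, segment_index: int) -> str:
    """Resolve a segment's carrier reference through the carrier table."""
    segment = state["segments"][segment_index]
    return state["carriers"][segment["carrier"]]["name"]


def changed_rows(old: list[dict], new: list[dict]) -> list[int]:
    """Indexes of table rows that differ; assumes both tables have equal length."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


before = [dict(c) for c in state["carriers"]]       # snapshot of the table for diffing
state["carriers"][0]["name"] = "Example Rail Ltd"   # a single write to the canonical copy

names = [carrier_name(state, i) for i in range(len(state["segments"]))]
print(names)
print(changed_rows(before, state["carriers"]))
```

One write updates the name seen by both segments that reference carrier 0, and the diff between the old and new carrier tables is a single row index.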
The bottom line
Normalize first, gzip second. They stack.
Normalization removes structural duplication. Gzip removes byte-level redundancy in what remains. Together they produce a smaller payload than either technique achieves alone.
Getting started
Ready to normalize your API responses?
- Getting Started -- install DataNormalizer and normalize your first object graph.
- Configuration Guide -- control which types are normalized and how.