rpc: add gzip response compression#3595
Conversation
PR SummaryLow Risk Overview Compression is skipped for WebSocket upgrades, missing gzip acceptance, existing The handler is wired into the Tendermint RPC server (after CORS), inspect RPC, and the Cosmos SDK API server (with or without unsafe CORS). Unit tests in Reviewed by Cursor Bugbot for commit 8c8e7ac. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3595 +/- ##
==========================================
- Coverage 59.02% 58.09% -0.93%
==========================================
Files 2211 2135 -76
Lines 182176 173276 -8900
==========================================
- Hits 107522 100672 -6850
+ Misses 64994 63679 -1315
+ Partials 9660 8925 -735
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
bdchatham
left a comment
There was a problem hiding this comment.
A few findings from an independent cross-review (systems / networking / security lenses) of the gzip middleware. The negotiation design is solid and client-side is backward-compatible — these three are origin-side concerns worth addressing before enabling on the public RPC/REST listeners. Line comments below.
|
|
||
| var gzPool = sync.Pool{ | ||
| New: func() any { | ||
| w := gzip.NewWriter(io.Discard) |
There was a problem hiding this comment.
CPU-DoS amplification on a public unauthenticated endpoint (level-6 compression, no size floor).
gzip.NewWriter defaults to DefaultCompression (zlib level 6). This middleware wraps the public Tendermint RPC (26657) and Cosmos REST (1317) and compresses multi-MB responses (block_results, paginated tx_search) inline on the serving goroutine. A ~200-byte unauthenticated request forces the node to gzip-level-6 megabytes of output — an asymmetric cheap-request/expensive-origin amplifier. The sync.Pool bounds allocation churn, not concurrency: a burst of N concurrent large-response requests pins N live level-6 writers (CPU + window state) at once. Since these origins CNAME directly to the ALB today (no CDN/WAF in the path per the design doc), there is no edge layer to absorb this.
Suggest:
- Use
gzip.NewWriterLevel(io.Discard, gzip.BestSpeed)(level 1). It captures most of the byte savings — which is the stated egress-cost goal — at a fraction of the CPU. - Add a minimum-size threshold (~1 KB) so tiny bodies skip compression; below that, gzip wastes CPU and can actually inflate the response.
| // and for WebSocket upgrade requests, preserving the http.Hijacker interface. | ||
| func NewGzipHandler(next http.Handler) http.Handler { | ||
| return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
| if !acceptsGzip(r) || strings.EqualFold(r.Header.Get("Upgrade"), "websocket") { |
There was a problem hiding this comment.
The wrapper drops http.Hijacker; the WebSocket skip guard under-detects conformant upgrades.
gzipResponseWriter implements Header/Write/WriteHeader/Flush but not Hijack(), so wrapping the handler that serves CometBFT's /websocket route removes the http.Hijacker that gorilla's Upgrader.Upgrade requires (it asserts w.(http.Hijacker) and returns HTTP 500 otherwise). The only thing preventing that here is this skip — but strings.EqualFold(r.Header.Get("Upgrade"), "websocket") is an exact full-string match, while gorilla accepts a token list (Upgrade: websocket, foo, case-insensitive). A conformant-but-non-canonical or proxy-folded upgrade that also carries a default Accept-Encoding: gzip slips past this guard, gets wrapped, and 500s.
The in-tree CometBFT Go client is safe today (it dials single-token Upgrade: websocket with no Accept-Encoding), and TestNewGzipHandler_WebSocketPassthrough only covers that canonical case — so the gap is currently masked.
Suggest the idiomatic middleware fix: implement Hijack() on gzipResponseWriter, forwarding to the inner writer when it satisfies http.Hijacker (and don't compress a hijacked conn). That removes the dependency on header-sniffing entirely; keep this Upgrade skip as defense-in-depth. A regression test with Upgrade: websocket, foo + Accept-Encoding: gzip would lock in the fix.
| w.gz.Reset(w.resp) | ||
| hdr.Del("content-length") | ||
| hdr.Set("content-encoding", "gzip") | ||
| hdr.Add("vary", "Accept-Encoding") |
There was a problem hiding this comment.
Vary: Accept-Encoding is only emitted on the compressed path — fix before any CDN cutover.
Vary is added inside the compress branch, so responses that pass through uncompressed (no Accept-Encoding, q=0, pre-existing Content-Encoding, 204/304) emit no Vary at all.
This is not exploitable in the current direct-to-ALB topology (no shared cache to poison). But the design doc's later phase puts CloudFront in front, at which point a cache entry primed by an uncompressed request can be served to a client that sent Accept-Encoding: gzip (and vice versa) — and an attacker can choose which variant primes the entry. It's a latent bug that would ship baked into deployed nodes before the CDN lands.
Suggest:
- Emit
Vary: Accept-Encodingon every response the middleware touches (compressed or not) — e.g. set it inNewGzipHandlerbefore callingnext, or unconditionally in the wrapper. Addhere can also produce a duplicateVaryif an inner handler or CORS already set one. Read the existing value, ensureAccept-Encodingappears exactly once, andSetthe result.
bdchatham
left a comment
There was a problem hiding this comment.
Follow-up: a few stream-lifecycle (layer-2) correctness nits. A corrupt/truncated stream is worse than a missed compression, so these are worth tightening.
|
|
||
| n, err := w.gz.Write(b) | ||
| w.written += uint64(n) //nolint:gosec | ||
| if w.hasLength && w.written >= w.contentLength { |
There was a problem hiding this comment.
Early-close corruption: after this closes+nils gz, any further Write falls to the w.gz == nil branch and appends raw bytes after the gzip footer. Simplest fix: drop this block and close once in the deferred close() (it never fires in the chunked/no-Content-Length common case anyway).
| n, err := w.gz.Write(b) | ||
| w.written += uint64(n) //nolint:gosec | ||
| if w.hasLength && w.written >= w.contentLength { | ||
| if cerr := w.gz.Close(); cerr != nil && err == nil { |
There was a problem hiding this comment.
&& err == nil swallows the Close() (footer) error whenever gz.Write above errored, silently shipping a truncated stream — the evmrpc sibling does err = w.gz.Close() to surface it.
| if cerr := w.gz.Close(); cerr != nil && err == nil { | ||
| err = cerr | ||
| } | ||
| gzPool.Put(w.gz) |
There was a problem hiding this comment.
Double-Put risk: if a handler writes after ServeHTTP returns (streaming), this Put races the deferred close() Put and two requests can share one *gzip.Writer — removing the early-close block (above) eliminates it.
| } | ||
|
|
||
| wrapper := &gzipResponseWriter{resp: w} | ||
| defer wrapper.close() |
There was a problem hiding this comment.
Streaming coalescing: with no Content-Length, Close() only runs here at the end, so an incremental/streaming route's output buffers until completion — confirm no non-WS streaming RPC/REST route exists, or exempt it.
There was a problem hiding this comment.
Confirmed, no non-WS streaming routes exist. Every handler in this server (http_json_handler.go, http_uri_handler.go) marshals the full response into a byte slice and calls w.Write() once — no route writes incrementally or calls Flush() mid-response.
Also worth noting: gzipResponseWriter.Flush() already propagates gz.Flush() → the underlying http.Flusher, so any future streaming route that calls Flush() periodically would get correct incremental delivery without needing any changes here.
|
On testing — I'd lean on a single-process round-trip harness here rather than adding more The core invariant is just bytes in == bytes out, for every shape of payload. One helper serves a body through The one subtlety worth calling out: decode with Two more the recorder can't express: a concurrent variant (a couple hundred goroutines, each asserting its own payload comes back) run under The swallowed Happy to write the harness up if it's useful. And dm me if you have questions on any of this. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 5 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 007a8ff. Configure here.
amir-deris
left a comment
There was a problem hiding this comment.
Addressed feedbacks!

Summary
NewGzipHandlermiddleware (sei-tendermint/rpc/jsonrpc/server/gzip.go) that wraps anyhttp.Handlerwith transparent gzip response compression using async.Pool-backed writer for allocation efficiencysei-tendermint/internal/rpc/core/env.go) and the Cosmos SDK API server (sei-cosmos/server/api/server.go)Accept-Encoding: gzipand for WebSocket upgrade requests (preservinghttp.Hijackercompatibility)Test plan
gzip_test.gocover compressed and uncompressed paths, pool reuse, and WebSocket passthroughmake buildAccept-Encoding: gzipis sent (e.g.curl -H "Accept-Encoding: gzip" http://localhost:26657/status | file -)/websocket) are unaffectedlink to design doc: Link
🤖 Generated with Claude Code