perf: set TCP_NODELAY, cheap allocs, socket tuning, postprocess deferral on engine_* path by KA-ROM · Pull Request #27 · flashbots/rproxy

KA-ROM · 2026-06-05T13:10:30Z

Summary

Hot-path tuning for the engine_* proxy path: removes per-request allocations and redundant lookups, tunes backend TCP socket options, and defers/skips postprocess when not needed. One file changed (proxy.rs) plus a small helper. No flag or config surface added.

Removed from the hot path

No heap alloc per header check: is_hop_by_hop_header matches HeaderName constants directly; per-call lowercased String is gone.
~5 fewer hashmap lookups per request: in-flight Gauge/Counter + proxy_failure_count handles resolved once per worker, cached on ProxyHttp.
No metric lookup on UA path: per-worker scc::HashMap<String, Counter> keyed by &str, soft-capped at 100 entries to bound memory under hostile UAs.
connection_info() bound once/request instead of 5x.
No per-header revalidation on append: headers_mut().append(...) skips reparsing names through awc's builder.
No re-parse on response forwarding: dropped HeaderName::from_str (backend names already valid).
No full HeaderMap clone: capture only content-encoding (the one field read downstream).
JSON parse reuse: parsed JrpcRequestMetaMaybeBatch stashed on ProxiedHttpRequest, reused by finalise_proxying.

Behavior changes (observable on wire / in ops)

TCP_NODELAY on backend conns: disables Nagle; cuts up to ~40ms tail on small engine_* pairs (Nagle + Linux TCP_DELACK_MIN=40ms).
TCP_KEEPALIVE on backend conns (60s): catches silently-dead backends so the idle pool stops handing out broken sockets.
awc disconnect_timeout tightened to 100ms: faster idle-conn reclamation. Safe for loopback (op-rbuilder); code comment flags it if backend moves off-host. Per awc 3.8.1, this only affects the idle pool, not in-flight requests.
Defer + skip postprocess: bookkeeping insert reordered after stream_to_client (intra-handler reorder, not a spawn; relies on actix not polling response bodies until the handler returns). Response parse is skipped entirely when neither logging nor mirroring is enabled; metrics path preserved.

Unchanged

Which requests get intercepted, mirrored, or logged: only when postprocess runs.
No flag, no config surface.

History

Squashed three explore-then-revert commit pairs (17=>13) for review clarity.

akundaz · 2026-06-09T13:45:47Z

+    // Per-worker cache of (user_agent -> Counter) so that we only pay the
+    // `Family::get_or_create` cost the first time a UA is seen on this
+    // worker. Lookup is `&str`-keyed (no allocation on hit).
+    client_info_cache: HashMap<String, Counter>,


Can't the user agent just be anything the client sets? Do you need a metric that uses that as a label?

generally, yes, it can.

practically, it's what rollup-boost sends (which is its version that doesn't change often)

akundaz · 2026-06-09T13:50:09Z

                    .conn_keep_alive(2 * timeout)
+                    // 100ms grace is fine for loopback (op-rbuilder); raise if backend
+                    // is ever non-loopback (would risk truncating responses on shutdown).
+                    .disconnect_timeout(Duration::from_millis(100))


Should this be configurable?

0x416e746f6e · 2026-06-09T13:40:50Z

+    // Per-worker cached metric handles. These are cheap to clone (each
+    // wraps an `Arc<AtomicU64>` internally) and bypass the per-request
+    // `Family::get_or_create` lookup on the hot path.
+    in_flight_client: Gauge,
+    in_flight_backend: Gauge,
+    proxy_failure_count: Counter,


suggestion:

perhaps better to create ProxyHttpMetrics type, intantiate it from shared.metrics, and make it a member of ProxyHttp => this way adding new metrics will require less boiler-plate.

0x416e746f6e · 2026-06-09T13:43:00Z

    matches!(
-        name.as_str().to_ascii_lowercase().as_str(),
-        "connection" | "host" | "keep-alive" | "transfer-encoding"
-    )
+        name,
+        &header::CONNECTION |
+            &header::HOST |
+            &header::TRANSFER_ENCODING
+    ) || name.as_str().eq_ignore_ascii_case("keep-alive")


question:

HTTP headers are case-insensitive. (e.g. CONNECTION, connection, and CoNnEcTiOn are all valid and all refer to the same header).

does this change respect that?

0x416e746f6e · 2026-06-09T13:47:36Z

+            // Hot path: read-only lookup keyed by &str (no allocation).
+            // Cold path: allocate the owned String key and resolve the
+            // counter from the metrics family exactly once per worker per
+            // distinct UA.
+            if let Some(entry) = this.client_info_cache.read_sync(user_agent, |_, c| c.clone()) {
+                entry.inc();
+            } else {
+                let counter = metrics
+                    .client_info
+                    .get_or_create(&LabelsProxyClientInfo {
+                        proxy: P::name(),
+                        user_agent: user_agent.to_string(),
+                    })
+                    .clone();
+                counter.inc();
+                // soft cap to prevent unbounded growth from hostile UA values
+                if this.client_info_cache.len() < 100 {
+                    // Best-effort insert; races with another task on the same
+                    // worker thread shouldn't happen given !Send actix workers,
+                    // but tolerate insert errors regardless.
+                    let _ = this.client_info_cache.insert_sync(user_agent.to_string(), counter);
+                }
+            }


suggestion:

this would be nice to encapsulate in ProxyHttpMetrics too

0x416e746f6e · 2026-06-09T13:49:15Z

+        let tcp_nodelay = actix_tls::connect::Connector::new(
+            actix_tls::connect::Resolver::default(),
+        )
+        .service()
+        .map(|conn: actix_tls::connect::Connection<awc::http::Uri, tokio::net::TcpStream>| {
+            let _ = conn.io_ref().set_nodelay(true);
+            let _ = socket2::SockRef::from(conn.io_ref()).set_tcp_keepalive(
+                &socket2::TcpKeepalive::new().with_time(Duration::from_secs(60)),
+            );
+            conn
+        });


question:

w.d.y.t. about putting this behind CLI-switch? (so that the user could choose on case-by-case).

(TCP_NODELAY could still be the default).

0x416e746f6e · 2026-06-09T13:50:21Z


+        // Build the inner TCP connector ourselves so we can flip
+        // TCP_NODELAY on after each connect
+        use actix_service::ServiceExt as _;


suggestion:

imports should go to the top

0x416e746f6e · 2026-06-09T13:52:32Z

-            .connector(Connector::new().conn_keep_alive(2 * timeout).limit(connections_limit))
+            .connector(
+                Connector::new()
+                    .connector(tcp_nodelay)


question:

what about client side? originally I thought about setting TCP_NODELAY up there

0x416e746f6e · 2026-06-09T13:59:15Z

+        // Initiate the response stream first so the client sees no extra
+        // bookkeeping latency on the critical path, then file the request
+        // into the in-flight map for the eventual response postprocessor.
+        // The insert must complete before the response body stream finishes
+        // (which is when `postprocess_backend_response` calls `remove_sync`);
+        // this is guaranteed because actix won't begin polling the streaming
+        // body until after this synchronous handler returns.
+        let this_clone = this.clone();
+        let res = Self::stream_to_client(this, req_id, conn_id, bknd_res);
+        this_clone.postprocess_client_request(req);
+        res


issue:

client request is post-processed (i.e. inserted into in-flight requests hashmap) first (i.e. before streaming the request to the backend) for a reason.

otherwise, there is a potential for race-condition when the response arrives while the request info is still not inserted.

0x416e746f6e · 2026-06-09T14:03:05Z

+        // Fast path: when nobody is logging the proxied request/response and
+        // there are no mirror peers configured, skip the (expensive) response
+        // decompress + parse and the (always-evaluated) `info!` formatting in
+        // `maybe_log_proxied_request_and_response`. We still parse the request
+        // for the jrpc method label so metrics keep their correct dimensions.
+        let log_off =
+            !inner.config().log_proxied_requests() && !inner.config().log_proxied_responses();
+        let no_mirror = mirroring_peers.is_empty();
+        if log_off && no_mirror {
+            // If the request body has already been parsed (intercept path)
+            // we can avoid a second decompress + parse.
+            if let Some(jrpc) = clnt_req.jrpc_meta.clone() {
+                Self::emit_metrics_on_proxy_success(&jrpc, &clnt_req, &bknd_res, metrics);
+                return;
+            }
+            if clnt_req.decompressed_size < clnt_req.size {
+                (clnt_req.decompressed_body, clnt_req.decompressed_size) = decompress(
+                    clnt_req.body.clone(),
+                    clnt_req.size,
+                    clnt_req.info.content_encoding(),
+                );
+            }
+            match serde_json::from_slice::<JrpcRequestMetaMaybeBatch>(&clnt_req.decompressed_body) {
+                Ok(jrpc) => {
+                    Self::emit_metrics_on_proxy_success(&jrpc, &clnt_req, &bknd_res, metrics);
+                }
+                Err(err) => {
+                    warn!(
+                        proxy = P::name(),
+                        request_id = %clnt_req.info.req_id,
+                        connection_id = %clnt_req.info.conn_id,
+                        worker_id = %worker_id,
+                        error = ?err,
+                        "Failed to parse json-rpc request",
+                    );
+                }
+            }
+            return;
+        }
+
        if clnt_req.decompressed_size < clnt_req.size {


question/issue:

finalise_proxying happens off the hot-path (request received/forwarded, response received/forwarded).

why do we need to optimise things here? (it makes things more complicated, but what do we win with this?)

0x416e746f6e · 2026-06-09T14:15:07Z

+                // Stash the parsed jrpc so `finalise_proxying` can avoid
+                // re-parsing the same body.
+                Some(Arc::new(jrpc))


question:

how often is this code-path hit in practice?

0x416e746f6e · 2026-06-09T14:18:48Z

        let mut clnt_res = Self::to_client_response(&bknd_res);

        let preallocate = this.shared.config().prealloacated_response_buffer_size();
+        let content_encoding = bknd_res.headers().get(header::CONTENT_ENCODING).cloned();


question:

are we sure content-encoding is the only header we want to send back to the client?

KA-ROM added 2 commits June 5, 2026 15:58

Drop String alloc in is_hop_by_hop_header

4c77c34

Cache in-flight metric handles on ProxyHttp

0da8076

KA-ROM marked this pull request as draft June 5, 2026 13:10

KA-ROM changed the title ~~Perf/cheap allocs~~ Small optimizations Jun 5, 2026

This was referenced Jun 5, 2026

Refcount body chunks + drop request/response correlation HashMap #28

Closed

fix: set TCP_NODELAY on backend connections #26

Closed

KA-ROM added 11 commits June 9, 2026 15:37

Cache user-agent counter per worker, cap at 100

bfc08dc

Bind connection_info() once per request

69e1189

Append headers without per-iter revalidation

4f39ca9

Drop HeaderName::from_str on response forwarding

c9a0725

Set TCP_NODELAY on backend connections

9d16040

Tighten backend awc disconnect_timeout to 100ms

5274b87

Enable TCP_KEEPALIVE on backend connections

f267b6d

Drop HeaderMap clone, capture content-encoding only

9c77780

Defer postprocess_client_request off the critical path

d929376

Skip postprocess entirely for non-logged, non-mirrored requests

818595f

Reuse parsed JrpcRequestMetaMaybeBatch in finalise_proxying

f86fd2d

KA-ROM force-pushed the perf/cheap-allocs branch from d7d7f94 to f86fd2d Compare June 9, 2026 12:40

KA-ROM changed the title ~~Small optimizations~~ perf: set TCP_NODELAY, cheap allocs, socket tuning, postprocess deferral on engine_* path Jun 9, 2026

KA-ROM requested a review from 0x416e746f6e June 9, 2026 12:54

KA-ROM marked this pull request as ready for review June 9, 2026 12:54

KA-ROM requested review from akundaz, avalonche and julio4 June 9, 2026 12:56

akundaz reviewed Jun 9, 2026

View reviewed changes

0x416e746f6e reviewed Jun 9, 2026

View reviewed changes

Conversation

KA-ROM commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Removed from the hot path

Behavior changes (observable on wire / in ops)

Unchanged

History

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

KA-ROM commented Jun 5, 2026 •

edited

Loading