fix: disable HTTP/2 connection pooling to prevent "channel closed" errors #4274

Open
zhenguo1492 wants to merge 1 commit into metalbear-co:main from zhenguo1492:fix/http2-connection-pooling-channel-closed

@zhenguo1492

Summary

  • Disable HTTP/2 connection pooling in ClientStore to fix "channel closed" errors when stealing HTTP/2 / gRPC traffic

Problem

When the local application closes idle HTTP/2 connections (e.g. Quarkus gRPC with Vert.x sends GOAWAY after ~3s idle), the pooled http2::SendRequest sender becomes a dead reference. Subsequent requests that reuse this sender fail with:

WARN: failed to send the request to the local application's HTTP server: channel closed

The retry logic (can_retry()) correctly identifies SendFailed as retryable, but get_with_pooling() returns the same dead connection from the pool, causing all retries to fail.

Root Cause Timeline

  1. First request arrives → make_client() creates new HTTP/2 connection → request succeeds
  2. Connection goes idle for ~3s → local server sends GOAWAY → spawned connection task finishes normally
  3. Next request arrives → wait_for_ready() returns cached dead sender from pool
  4. sender.send_request() → "channel closed" (the sender's internal channel is already closed)
  5. Retry → pool returns same dead sender → same error
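
The timeline above can be replayed in a toy model. This is only a sketch of the failure mode as the description claims it: `ToySender`, `NaivePool`, and `replay_timeline` are hypothetical names, not mirrord or hyper types, and the sketch assumes a pool that hands out a clone of its cached sender without checking whether the connection is still alive. Whether `ClientStore` actually behaves this way is not established by the sketch.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a pooled HTTP/2 sender (not hyper's type).
#[derive(Clone)]
struct ToySender {
    /// Shared flag: set once the connection task has finished.
    closed: Rc<Cell<bool>>,
}

impl ToySender {
    fn send_request(&self) -> Result<(), &'static str> {
        if self.closed.get() {
            Err("channel closed")
        } else {
            Ok(())
        }
    }
}

/// Hypothetical pool that never checks liveness before handing out a sender.
struct NaivePool {
    cached: Option<ToySender>,
}

impl NaivePool {
    fn get(&mut self) -> ToySender {
        self.cached
            .get_or_insert_with(|| ToySender {
                closed: Rc::new(Cell::new(false)),
            })
            .clone()
    }
}

/// Replays the timeline: one success, an idle close, then `retries` attempts.
fn replay_timeline(retries: usize) -> Vec<Result<(), &'static str>> {
    let mut pool = NaivePool { cached: None };
    let mut outcomes = Vec::new();

    // Step 1: the first request creates the connection and succeeds.
    let first = pool.get();
    outcomes.push(first.send_request());

    // Step 2: the peer closes the idle connection (GOAWAY in the real case).
    first.closed.set(true);

    // Steps 3-5: every later attempt receives the same closed sender.
    for _ in 0..retries {
        outcomes.push(pool.get().send_request());
    }
    outcomes
}
```

Under these assumptions, every attempt after the idle close fails in the same way, no matter how many retries are allowed.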

This was already documented in the code as a known issue on Windows (lines 90-92 in client_store.rs), but it equally affects Linux with any HTTP/2 server that has short idle timeouts.

Fix

Set should_enable_connection_pooling() to return false. Each request now creates a fresh HTTP/2 connection.

Better Alternative (suggestion)

A more targeted fix would be to check sender.is_closed() when retrieving from the pool and discard stale entries, or to have the spawned connection task notify the pool when the connection closes. Disabling pooling entirely is the minimal safe change.
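
A minimal sketch of the first alternative, discarding stale entries on retrieval. `StubSender` and `CheckedPool` are hypothetical illustrations and not the `ClientStore` implementation. The stub's `is_closed()` stands in for the `sender.is_closed()` check mentioned above, and "connecting" is reduced to handing out a new id.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a pooled sender that can report closure.
#[derive(Clone)]
struct StubSender {
    id: u32,
    closed: Rc<Cell<bool>>,
}

impl StubSender {
    fn new(id: u32) -> Self {
        StubSender {
            id,
            closed: Rc::new(Cell::new(false)),
        }
    }

    fn is_closed(&self) -> bool {
        self.closed.get()
    }
}

/// Hypothetical pool that validates entries before reusing them.
struct CheckedPool {
    idle: Vec<StubSender>,
    next_id: u32,
}

impl CheckedPool {
    fn new() -> Self {
        CheckedPool {
            idle: Vec::new(),
            next_id: 0,
        }
    }

    /// Returns a sender to the idle list for later reuse.
    fn put(&mut self, sender: StubSender) {
        self.idle.push(sender);
    }

    /// Pops idle senders until a live one is found; closed ones are dropped.
    /// Falls back to a fresh "connection" when no live sender remains.
    fn get(&mut self) -> StubSender {
        while let Some(sender) = self.idle.pop() {
            if !sender.is_closed() {
                return sender;
            }
        }
        self.next_id += 1;
        StubSender::new(self.next_id)
    }
}
```

The check only narrows the window: a connection can still close between `get()` and the actual send, so the retry path remains necessary either way.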

Test Environment

  • mirrord 3.210.0 (OSS) steal mode
  • Istio 1.24.3 service mesh (mTLS, PERMISSIVE)
  • Quarkus 3.34.6 gRPC server (Vert.x transport, Virtual Threads)
  • Kubernetes 1.30.14
  • Envoy gRPC-JSON transcoder → gRPC backend

Test Plan

  • Verified grpcurl -plaintext localhost:8080 works (local gRPC server is functional)
  • Confirmed "channel closed" with connection pooling enabled (multiple attempts)
  • Confirmed zero "channel closed" errors with connection pooling disabled
  • Tested with Istio sidecar enabled (full mTLS mesh) — works correctly
  • Tested Quarkus dev mode hot reload through mirrord — works correctly

…rors

When the local application (e.g. Quarkus gRPC with Vert.x) closes idle
HTTP/2 connections, the pooled sender becomes a dead reference. Subsequent
requests that reuse this sender fail with "channel closed". The retry logic
exists but also pulls from the pool, getting the same dead connection.

This was already documented in the code comments as a known issue on Windows,
but it also affects Linux with any HTTP/2 server that has short idle timeouts.

Root cause timeline:
1. First request → new HTTP/2 connection → success
2. Connection idle ~3s → server sends GOAWAY → connection closes normally
3. Next request → pool returns cached dead sender → "channel closed"
4. Retry → pool returns same dead sender → "channel closed" again

The proper fix would be to check sender.is_closed() when retrieving from
the pool, but disabling pooling is the minimal safe change.

Tested with: Istio 1.24 + Quarkus 3.34 gRPC + mirrord steal mode.

aviramha requested a review from Razz4780 on May 19, 2026, 08:44
@Razz4780
Contributor

Razz4780 commented May 19, 2026

Thanks for contributing to mirrord! 🤘

TBH I don't think the described root cause timeline applies. The sender is only returned to the pool explicitly here, after a successful HTTP exchange. So the next attempt should use a new sender.
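
The behavior described in this comment can be illustrated with a toy check-out model. This is a sketch under assumptions, not mirrord's code: `OneShotSender`, `CheckoutPool`, and `send_with_retry` are hypothetical names. The sketch assumes that taking a sender removes it from the pool and that it is put back only after a successful exchange.

```rust
/// Hypothetical sender; `healthy` models whether its connection is alive.
struct OneShotSender {
    id: u32,
    healthy: bool,
}

impl OneShotSender {
    fn send_request(&self) -> Result<(), &'static str> {
        if self.healthy {
            Ok(())
        } else {
            Err("channel closed")
        }
    }
}

/// Hypothetical pool with check-out semantics.
struct CheckoutPool {
    idle: Vec<OneShotSender>,
    created: u32,
}

impl CheckoutPool {
    /// Removes an idle sender from the pool, or creates a fresh one.
    fn take(&mut self) -> OneShotSender {
        match self.idle.pop() {
            Some(sender) => sender,
            None => {
                self.created += 1;
                OneShotSender {
                    id: self.created,
                    healthy: true,
                }
            }
        }
    }

    /// Returns the id of the sender that completed the exchange.
    fn send_with_retry(&mut self, max_attempts: u32) -> Result<u32, &'static str> {
        let mut last_err = "no attempts made";
        for _ in 0..max_attempts {
            let sender = self.take();
            match sender.send_request() {
                Ok(()) => {
                    let id = sender.id;
                    // Only a sender that just succeeded goes back to the pool.
                    self.idle.push(sender);
                    return Ok(id);
                }
                // A failed sender is dropped here, so the retry cannot see it.
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}
```

In this model a dead pooled sender costs at most one failed attempt, because the retry draws a different sender.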
