test: dynamic kubo/daemon ports with bind-race retry (fixes macOS flake) (#87) #88
Hardcoded kubo API ports across the test suite fell inside macOS's ephemeral port range (49152-65535). Under fileParallelism the kernel could hand one to another test file's outbound fetch() as its source port, so kubo's bind intermittently failed with "address already in use" (e.g. the daemon-kubo-restart-race SIGTERM test on macOS CI).

Generalize the proven pattern already in kuboRpcGateway.integration.test into shared helpers and apply it to every test that boots a real kubo:

- daemon-helpers: allocateFreePort/allocateKuboEndpoints, isAddressInUseError, startPkcDaemonWithDynamicPorts (fresh free ports + retry on the bind race), and withKuboBindRetry (same for direct startKuboNode / manual daemon spawns).
- kubo-helpers: make preInitKuboWithEphemeralSwarm idempotent so a retry can reuse a seeded dataPath while picking up the newly allocated ports.
- Convert daemon, daemon-kubo-restart-race, logs, command-completion-time, edit-null-removal, challenge-integration, mintpass-integration and update-install-restart-race to dynamic ports. Negative blocks (port-occupied validation, external-kubo adoption) use fresh free ports but deliberately do NOT retry, preserving their failure/adoption semantics.
- Add test/helpers/dynamic-ports.test.ts covering the allocator + retry helper.
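The allocator half of this pattern (bind to port 0, read back the port the kernel chose, release it) can be sketched as follows. This is a minimal illustration under assumptions, not the code in daemon-helpers: the name `allocateFreePortSketch` and its default host are hypothetical.

```typescript
import * as net from "node:net";

// Hypothetical sketch; the real allocateFreePort may differ in signature and behavior.
export function allocateFreePortSketch(host = "127.0.0.1"): Promise<number> {
    return new Promise((resolve, reject) => {
        const server = net.createServer();
        server.once("error", reject);
        // Port 0 asks the kernel to pick any port that is currently free on `host`.
        server.listen(0, host, () => {
            const address = server.address();
            if (address === null || typeof address === "string") {
                server.close(() => reject(new Error("unexpected server address")));
                return;
            }
            const { port } = address;
            // Release the port so the process under test can bind it afterwards.
            server.close((err) => (err ? reject(err) : resolve(port)));
        });
    });
}
```

Between the moment the probe socket closes and the moment kubo binds, another process can still take the port, which is why the commit pairs the allocator with a retry on the bind race.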
📝 Walkthrough
This PR eliminates intermittent "address already in use" test failures in parallel CI runs on macOS by replacing hardcoded kubo API/RPC/gateway ports with kernel-allocated free ports and bind-race retry logic. New helpers allocate endpoints, detect bind-race errors, and retry daemon startup.
Changes
Dynamic Port Allocation and Test Migrations
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/cli/daemon.test.ts (1)
712-715: 💤 Low value. Consider accepting the `endpoints` parameter in the `buildEnv` callback for consistency.

The callback at line 714 uses `() => ({ PKC_RPC_AUTH_KEY: customAuthKey })` while other usages in this PR (e.g., update-install-restart-race.test.ts lines 149-154) use `(e) => ({ ... })` even when `e` is not directly referenced in that particular expression. For consistency and clarity, consider:

      const daemon = await startPkcDaemonWithDynamicPorts(
          (e) => ["--pkcOptions.dataPath", randomDirectory(), "--pkcRpcUrl", e.rpcWsUrl],
    -     () => ({ PKC_RPC_AUTH_KEY: customAuthKey })
    +     (_e) => ({ PKC_RPC_AUTH_KEY: customAuthKey })
      );

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/cli/daemon.test.ts` around lines 712 - 715, The buildEnv callback passed to startPkcDaemonWithDynamicPorts should accept the endpoints parameter for consistency with other tests: change the anonymous zero-arg function used in the call to accept a parameter (e) and return the same object so the signature matches other usages (e.g., use (e) => ({ PKC_RPC_AUTH_KEY: customAuthKey })). Update the callback in the startPkcDaemonWithDynamicPorts invocation so it takes the endpoints arg rather than using a no-arg arrow function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/helpers/daemon-helpers.ts`:
- Around line 260-266: The retry path currently calls
ensureKuboNodeStopped(endpoints.kuboApiUrl) which will stop whichever process is
listening on that port (possibly another test's daemon); instead limit cleanup
to the child launched by this attempt: track the child process/handle created
for this attempt (e.g., the variable used to spawn the kubo daemon) and on an
EADDRINUSE error only kill/cleanup that child (or skip cleanup if that child
never bound or if you've already moved to a new endpoint set); replace the
global ensureKuboNodeStopped call with a targeted stop/kill of the
attempt-specific process or a no-op when the process is not the one you spawned,
using the spawn handle or PID saved for this attempt and preserving the existing
isAddressInUseError, lastError, attempt and retries logic.
- Around line 192-229: allocateFreePort currently defaults to probing 127.0.0.1
while allocateKuboEndpoints assumes kubo/gateway will bind to 0.0.0.0, which can
produce port collisions; update allocateKuboEndpoints to call allocateFreePort
with the actual bind hosts: use allocateFreePort("127.0.0.1") for the RPC port
(rpcPort/rpcWsUrl) and allocateFreePort("0.0.0.0") for kuboPort and gatewayPort
so the ports are reserved on the same interfaces kubo/gateway will use (refer to
the functions allocateFreePort and allocateKuboEndpoints).
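The second inline comment (probe each port on the interface the eventual listener will bind) might look roughly like this. It is a sketch under assumptions: `holdFreePort`, `KuboEndpointsSketch` and `allocateKuboEndpointsSketch` are hypothetical names, and the real endpoints object also carries URLs such as kuboApiUrl and rpcWsUrl that are omitted here.

```typescript
import * as net from "node:net";

interface HeldPort {
    port: number;
    release: () => Promise<void>;
}

// Bind to port 0 on `host` and keep the listener open until release() is called.
function holdFreePort(host: string): Promise<HeldPort> {
    return new Promise((resolve, reject) => {
        const server = net.createServer();
        server.once("error", reject);
        server.listen(0, host, () => {
            const { port } = server.address() as net.AddressInfo;
            const release = () =>
                new Promise<void>((done, fail) => server.close((err) => (err ? fail(err) : done())));
            resolve({ port, release });
        });
    });
}

// Hypothetical, reduced shape of the endpoints object (ports only).
export interface KuboEndpointsSketch {
    rpcPort: number;
    kuboPort: number;
    gatewayPort: number;
}

export async function allocateKuboEndpointsSketch(): Promise<KuboEndpointsSketch> {
    // The RPC server listens on loopback, so loopback is the interface to probe.
    const rpc = await holdFreePort("127.0.0.1");
    // kubo API and gateway bind the wildcard address, so probe 0.0.0.0 for them.
    const kubo = await holdFreePort("0.0.0.0");
    const gateway = await holdFreePort("0.0.0.0");
    // Holding the listeners open together keeps the two wildcard probes
    // from returning the same port; release them only once all are chosen.
    await Promise.all([rpc.release(), kubo.release(), gateway.release()]);
    return { rpcPort: rpc.port, kuboPort: kubo.port, gatewayPort: gateway.port };
}
```

Probing on 0.0.0.0 matters because a port can be free on 127.0.0.1 while a listener on another interface already holds it, in which case a later wildcard bind would still fail.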
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ce26e3b4-5457-401f-a159-31a85176c0d1
📒 Files selected for processing (11)
- test/cli/challenge-integration.test.ts
- test/cli/command-completion-time.test.ts
- test/cli/daemon-kubo-restart-race.test.ts
- test/cli/daemon.test.ts
- test/cli/edit-null-removal.e2e.test.ts
- test/cli/logs.test.ts
- test/cli/mintpass-integration.test.ts
- test/cli/update-install-restart-race.test.ts
- test/helpers/daemon-helpers.ts
- test/helpers/dynamic-ports.test.ts
- test/helpers/kubo-helpers.ts
- daemon.test.ts: startPkcDaemonCapturingStderr now pre-inits an ephemeral swarm (like startPkcDaemon) and captures the kubo RPC URL. The two DEBUG tests previously let the daemon bind swarm 4001; on Windows the daemon's kill doesn't take kubo with it, so the first test's kubo lingered and the second collided on 4001. Ephemeral swarm + kuboRpcUrl-driven /shutdown fix it.
- daemon-helpers.ts (CodeRabbit): allocateKuboEndpoints probes kubo/gateway ports on 0.0.0.0 (the interface kubo actually binds) instead of 127.0.0.1, so a port free only on loopback can't be handed out for a wildcard bind.
- daemon-helpers.ts (CodeRabbit): startPkcDaemonWithDynamicPorts no longer calls ensureKuboNodeStopped() on the losing port after a bind race: in a same-suite race that listener is another test's healthy daemon. Our failed attempt's subprocess has already exited and never bound, so there's nothing to clean.
- daemon-kubo-restart-race.test.ts: the wedged-daemon retry now SIGKILLs its own process group on failure instead of shutting down the contended port.
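The cleanup described for daemon-kubo-restart-race.test.ts (kill only the process group this attempt spawned, never whatever happens to listen on the contended port) could be sketched like this on POSIX. Both function names are hypothetical, and the sketch assumes the daemon is spawned with `detached: true` so that it leads its own process group; this page does not show how the test actually spawns it.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// Hypothetical helper: on POSIX, detached: true makes the child the leader
// of a new process group, so its own children share that group.
export function spawnInOwnGroupSketch(command: string, args: string[]): ChildProcess {
    return spawn(command, args, { detached: true, stdio: "ignore" });
}

// Hypothetical helper: SIGKILL the whole group led by `child`.
// Returns false when there was nothing left to kill.
export function killOwnProcessGroupSketch(child: ChildProcess): boolean {
    if (child.pid === undefined) return false; // spawn itself failed
    try {
        // A negative pid targets the process group with that id (POSIX only).
        process.kill(-child.pid, "SIGKILL");
        return true;
    } catch (error) {
        // ESRCH means the group has already exited.
        if ((error as NodeJS.ErrnoException).code === "ESRCH") return false;
        throw error;
    }
}
```

Because the kill is addressed by the pid this attempt saved, it cannot touch another test file's healthy daemon, which is the failure mode the review flagged for port-based shutdown.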
Thanks @coderabbitai — both points were valid and are addressed in bb891cb:
Same commit also fixes the Windows CI failure (unrelated to your comments): the two
Problem
macOS CI intermittently failed (e.g. run 27199400016) with:
Root cause: every kubo API port hardcoded in the suite (50079, 50089, 50099, 50121, 50299, 50599, …) sits inside macOS's ephemeral port range (49152–65535). With `fileParallelism: true`, the macOS kernel can hand one of those ports to a concurrently running test file's outbound `fetch()` as its source port, so kubo's bind on that port intermittently fails. Swarm ports were already fixed this way (`/tcp/0`); API/gateway/RPC were not.

Fix
Generalize the proven pattern already living in `test/kubo/kuboRpcGateway.integration.test.ts` (allocate a free port via `net` bind-to-`:0`, retry on `address already in use`) into shared helpers, and apply it to every test that boots a real kubo:

- `test/helpers/daemon-helpers.ts`: `allocateFreePort`, `allocateKuboEndpoints`, `isAddressInUseError`, `startPkcDaemonWithDynamicPorts` (fresh free ports + retry on the bind race, returning the winning endpoints), and `withKuboBindRetry` (same retry for direct `startKuboNode` calls and manual daemon spawns).
- `test/helpers/kubo-helpers.ts`: `preInitKuboWithEphemeralSwarm` is now idempotent, so a retry can reuse a seeded `dataPath` while picking up the newly allocated ports.
- Converted to dynamic ports: `daemon`, `daemon-kubo-restart-race`, `logs`, `command-completion-time`, `edit-null-removal.e2e`, `challenge-integration`, `mintpass-integration`, `update-install-restart-race`.
- `test/helpers/dynamic-ports.test.ts` covers the allocator + retry helper.

Verification (local, on top of `origin/master`)
`build` + `build:test` green, and every converted file passes:

Closes #87
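The retry half of the fix (allocate a fresh port per attempt, retry only when the failure is the bind race) can be sketched as below. The names `isAddressInUseErrorSketch` and `withBindRetrySketch`, the callback shapes, and the default retry count are assumptions for illustration; the real `isAddressInUseError` and `withKuboBindRetry` may match errors and take arguments differently.

```typescript
// Hypothetical detector: treats EADDRINUSE codes and
// "address already in use" messages as the bind race.
export function isAddressInUseErrorSketch(error: unknown): boolean {
    if (typeof error !== "object" || error === null) return false;
    const { code, message } = error as { code?: unknown; message?: unknown };
    return (
        code === "EADDRINUSE" ||
        (typeof message === "string" && /address already in use/i.test(message))
    );
}

// Hypothetical retry wrapper: each attempt gets a freshly allocated port.
export async function withBindRetrySketch<T>(
    allocate: () => Promise<number>,
    start: (port: number) => Promise<T>,
    retries = 3
): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= retries; attempt++) {
        const port = await allocate();
        try {
            return await start(port);
        } catch (error) {
            // Anything other than the bind race is a real failure: surface it at once.
            if (!isAddressInUseErrorSketch(error)) throw error;
            lastError = error;
        }
    }
    // Every attempt lost the race; report the last bind error.
    throw lastError;
}
```

Tests that assert on a bind failure, such as the port-occupied and external-kubo adoption cases mentioned in the commit message, would call `start` directly instead of going through the wrapper, since a retry would mask the behavior they check.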
Summary by CodeRabbit
Tests
Refactor