test: dynamic kubo/daemon ports with bind-race retry (fixes macOS flake) (#87) #88

Merged
Rinse12 merged 2 commits into master from fix/dynamic-kubo-ports-87
Jun 9, 2026

@Rinse12 Rinse12 commented Jun 9, 2026

Member

Problem

macOS CI intermittently failed (e.g. run 27199400016) with:

serveHTTPApi: manet.Listen(/ip4/0.0.0.0/tcp/50599) failed: listen tcp4 0.0.0.0:50599: bind: address already in use

Root cause: every kubo API port hardcoded in the suite (50079, 50089, 50099, 50121, 50299, 50599, …) sits inside macOS's ephemeral port range (49152–65535). With fileParallelism: true, the macOS kernel can hand one of those ports to a concurrently running test file's outbound fetch() as its source port, so kubo's bind on that port intermittently fails. Swarm ports were already fixed this way (/tcp/0); API/gateway/RPC were not.
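
The mechanism behind this can be observed directly. The sketch below is illustrative only and is not part of the PR: `observeSourcePort` is a hypothetical helper that opens a loopback connection and reports the source port the kernel assigned to the outbound side. Per the root cause above, on macOS that port comes from 49152–65535, the same range the hardcoded kubo ports sat in; other platforms use other ranges, so the sketch makes no claim about the exact value.

```typescript
import * as net from "node:net";

// Start a throwaway loopback server, connect to it once, and report which
// source port the kernel picked for the outbound side of that connection.
async function observeSourcePort(): Promise<{ serverPort: number; sourcePort: number }> {
    const server = net.createServer((socket) => {
        socket.on("error", () => {}); // ignore resets from the short-lived client
        socket.destroy();
    });
    await new Promise<void>((resolve, reject) => {
        server.once("error", reject);
        server.listen(0, "127.0.0.1", () => resolve());
    });
    const serverPort = (server.address() as net.AddressInfo).port;

    const sourcePort = await new Promise<number>((resolve, reject) => {
        const client = net.connect(serverPort, "127.0.0.1");
        client.once("error", reject);
        client.once("connect", () => {
            const port = client.localPort; // kernel-assigned ephemeral port
            client.destroy();
            if (port === undefined) reject(new Error("socket has no local port"));
            else resolve(port);
        });
    });

    await new Promise<void>((resolve) => server.close(() => resolve()));
    return { serverPort, sourcePort };
}
```

While such a connection is open, its source port is in use, which is why a hardcoded listener port inside the ephemeral range can lose a bind to an unrelated outbound request.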

Fix

Generalize the proven pattern already living in test/kubo/kuboRpcGateway.integration.test.ts (allocate a free port via net bind-to-:0, retry on address already in use) into shared helpers, and apply it to every test that boots a real kubo:

  • test/helpers/daemon-helpers.ts: allocateFreePort, allocateKuboEndpoints, isAddressInUseError, startPkcDaemonWithDynamicPorts (fresh free ports + retry on the bind race, returning the winning endpoints), and withKuboBindRetry (same retry for direct startKuboNode calls and manual daemon spawns).
  • test/helpers/kubo-helpers.ts: preInitKuboWithEphemeralSwarm is now idempotent, so a retry can reuse a seeded dataPath while picking up the newly allocated ports.
  • Converted: daemon, daemon-kubo-restart-race, logs, command-completion-time, edit-null-removal.e2e, challenge-integration, mintpass-integration, update-install-restart-race.
  • Negative blocks (port-occupied validation, external-kubo adoption) use fresh free ports but deliberately do not retry, preserving their failure/adoption semantics.
  • New test/helpers/dynamic-ports.test.ts covers the allocator + retry helper.
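
For readers who have not seen the pattern, here is a minimal sketch of "bind to :0, then retry on address already in use". The names `allocateFreePort` and `isAddressInUseError` are borrowed from the list above, but the bodies are assumptions written for illustration, not the code in test/helpers/daemon-helpers.ts. `withBindRetry` is a hypothetical, simplified stand-in for withKuboBindRetry.

```typescript
import * as net from "node:net";

// Ask the kernel for a currently free TCP port by binding to port 0,
// reading the assigned port, and releasing the socket again.
function allocateFreePort(host = "127.0.0.1"): Promise<number> {
    return new Promise((resolve, reject) => {
        const server = net.createServer();
        server.once("error", reject);
        server.listen(0, host, () => {
            const { port } = server.address() as net.AddressInfo;
            server.close((err) => (err ? reject(err) : resolve(port)));
        });
    });
}

// Recognise a bind race from either an errno-style code or a log/error message.
function isAddressInUseError(error: unknown): boolean {
    const code = (error as { code?: unknown } | null)?.code;
    const message = error instanceof Error ? error.message : String(error);
    return code === "EADDRINUSE" || /EADDRINUSE|address already in use/i.test(message);
}

// Run `operation` with a fresh port; on a bind race, allocate another port and
// try again. `retries = 3` means at most four attempts in total.
async function withBindRetry<T>(operation: (port: number) => Promise<T>, retries = 3): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= retries; attempt++) {
        const port = await allocateFreePort();
        try {
            return await operation(port);
        } catch (error) {
            if (!isAddressInUseError(error)) throw error; // unrelated failures are not retried
            lastError = error;
        }
    }
    throw lastError;
}
```

The retry is needed because the port is released before the real process binds it, so another process can take it in between. Allocation alone narrows the window; it does not close it.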

Verification (local, on top of origin/master)

build + build:test green, and every converted file passes:

| File | Result |
|---|---|
| dynamic-ports (new) | 7/7 |
| daemon-kubo-restart-race (the flake) | 4/4 |
| daemon | 22/22 |
| logs / command-completion-time / edit-null-removal / update-install-restart-race | 51/51 |
| challenge-integration | 6/6 |
| mintpass-integration | 4/4 |

Closes #87

Summary by CodeRabbit

  • Tests

    • Updated test infrastructure to allocate daemon ports dynamically, eliminating hardcoded port conflicts during parallel test execution.
    • Added regression tests for port allocation and bind-race retry behavior.
  • Refactor

    • Introduced dynamic port allocation utilities and retry logic for resilient test daemon startup.
    • Enhanced kubo initialization to be idempotent, skipping redundant setup steps when already initialized.

Hardcoded kubo API ports across the test suite fell inside macOS's
ephemeral port range (49152-65535). Under fileParallelism the kernel
could hand one to another test file's outbound fetch() as its source
port, so kubo's bind intermittently failed with "address already in
use" (e.g. the daemon-kubo-restart-race SIGTERM test on macOS CI).

Generalize the proven pattern already in kuboRpcGateway.integration.test
into shared helpers and apply it to every test that boots a real kubo:

- daemon-helpers: allocateFreePort/allocateKuboEndpoints, isAddressInUseError,
  startPkcDaemonWithDynamicPorts (fresh free ports + retry on the bind race),
  and withKuboBindRetry (same for direct startKuboNode / manual daemon spawns).
- kubo-helpers: make preInitKuboWithEphemeralSwarm idempotent so a retry can
  reuse a seeded dataPath while picking up the newly allocated ports.
- Convert daemon, daemon-kubo-restart-race, logs, command-completion-time,
  edit-null-removal, challenge-integration, mintpass-integration and
  update-install-restart-race to dynamic ports. Negative blocks (port-occupied
  validation, external-kubo adoption) use fresh free ports but deliberately
  do NOT retry, preserving their failure/adoption semantics.
- Add test/helpers/dynamic-ports.test.ts covering the allocator + retry helper.

coderabbitai Bot commented Jun 9, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 182310ab-fa81-4b55-99e0-f6a51905237b

📥 Commits

Reviewing files that changed from the base of the PR and between 8e1e407 and bb891cb.

📒 Files selected for processing (3)
  • test/cli/daemon-kubo-restart-race.test.ts
  • test/cli/daemon.test.ts
  • test/helpers/daemon-helpers.ts
Walkthrough

This PR eliminates intermittent "address already in use" test failures in parallel CI runs on macOS by replacing hardcoded kubo API/RPC/gateway ports with kernel-allocated free ports and bind-race retry logic. New helpers allocate endpoints, detect bind-race errors, and retry daemon startup; preInitKuboWithEphemeralSwarm becomes idempotent; all CLI test files migrate to dynamic ports.

Changes

Dynamic Port Allocation and Test Migrations

| Layer | File(s) | Summary |
|---|---|---|
| Core port allocation utilities | test/helpers/daemon-helpers.ts | allocateFreePort binds ephemeral TCP sockets to obtain kernel-selected free ports; KuboEndpoints interface and allocateKuboEndpoints allocate three free ports and derive RPC websocket, kubo RPC/REST, and gateway URLs; isAddressInUseError detects EADDRINUSE bind races; DynamicDaemonResult type combines endpoints with the daemon process. |
| Daemon startup with bind-race retry logic | test/helpers/daemon-helpers.ts | startPkcDaemonWithDynamicPorts repeatedly allocates fresh endpoints, injects KUBO_RPC_URL and IPFS_GATEWAY_URL into the daemon environment, and on address-in-use failures calls ensureKuboNodeStopped before retrying; withKuboBindRetry is a generic wrapper applying the same allocation/retry semantics to any caller-provided operation. |
| Make kubo initialization idempotent | test/helpers/kubo-helpers.ts | preInitKuboWithEphemeralSwarm now checks if the kubo config file exists before running kubo init and kubo config profile apply server, allowing safe reuse across retry attempts while still updating API/Gateway/Swarm addresses. |
| Regression tests for dynamic port allocation | test/helpers/dynamic-ports.test.ts | New Vitest suite validates allocateFreePort returns bindable ports, allocateKuboEndpoints yields distinct ports with correctly formatted URLs, isAddressInUseError recognizes bind-race errors from strings and Error instances, and withKuboBindRetry correctly retries on bind races, exhausts retry budgets, and invokes cleanup callbacks. |
| Challenge integration test migration | test/cli/challenge-integration.test.ts | Replace hardcoded RPC_PORT, KUBO_API_PORT, rpcWsUrl constants with mutable variables assigned via startPkcDaemonWithDynamicPorts in beforeAll. |
| Command completion time test migration | test/cli/command-completion-time.test.ts | Remove hardcoded port constants; use startPkcDaemonWithDynamicPorts to populate mutable RPC/kubo/gateway port variables in beforeAll. |
| Daemon kubo restart-race test migration | test/cli/daemon-kubo-restart-race.test.ts | Migrate four regression tests ("concurrent keepKuboUp entries", "wedged kubo startup", "late signal-exit registrant", "startKuboNode settles on failure") from fixed URLs to startPkcDaemonWithDynamicPorts and withKuboBindRetry, capturing dynamic kuboApiUrl for health checks and cleanup. |
| Main daemon test suite migration | test/cli/daemon.test.ts | Comprehensive refactor: root daemon tests use startPkcDaemonWithDynamicPorts for dynamic RPC and kubo endpoints; "port availability validation" tests allocate fresh ports per test via allocateKuboEndpoints; kubo lifecycle tests capture dynamic kuboApiUrl for cleanup; environment-variable tests use dynamic startup; DEBUG/no-DEBUG tests allocate endpoints dynamically. |
| Edit null removal e2e test migration | test/cli/edit-null-removal.e2e.test.ts | Replace hardcoded port/URL constants with mutable variables populated by startPkcDaemonWithDynamicPorts in beforeAll. |
| Logs test migration | test/cli/logs.test.ts | Replace hardcoded port constants with mutable rpcWsUrl variable assigned via startPkcDaemonWithDynamicPorts for live daemon testing. |
| Mintpass integration test migration | test/cli/mintpass-integration.test.ts | Remove fixed port/URL constants; use startPkcDaemonWithDynamicPorts to populate mutable RPC/kubo port variables in beforeAll. |
| Update install restart-race test migration | test/cli/update-install-restart-race.test.ts | Remove hardcoded port constants; add kuboApiUrl variable for dynamic capture; refactor from shared static environment to isolated dynamic environment passing startPkcDaemonWithDynamicPorts-returned URLs into both daemon and subprocess. |
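
The idempotent pre-init described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in test/helpers/kubo-helpers.ts: `preInitIdempotent` and its `runInit` callback are hypothetical, the one-time kubo init and profile steps are represented only by that callback, and the addresses are written by editing the repo's JSON `config` file directly, which the real helper may do differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface KuboPorts {
    apiPort: number;
    gatewayPort: number;
}

// Hypothetical stand-in for the real helper. `runInit` represents the one-time
// setup and is expected to create the config file at `configPath`.
function preInitIdempotent(dataPath: string, ports: KuboPorts, runInit: (configPath: string) => void): void {
    const configPath = path.join(dataPath, "config");
    fs.mkdirSync(dataPath, { recursive: true });

    // One-time work: skipped when a previous attempt already seeded the repo.
    if (!fs.existsSync(configPath)) runInit(configPath);

    // Every call: rewrite the addresses so a retry picks up the new ports.
    const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
    config.Addresses = {
        ...config.Addresses,
        API: `/ip4/0.0.0.0/tcp/${ports.apiPort}`,
        Gateway: `/ip4/0.0.0.0/tcp/${ports.gatewayPort}`,
        Swarm: ["/ip4/0.0.0.0/tcp/0"], // ephemeral swarm port
    };
    fs.writeFileSync(configPath, JSON.stringify(config, null, 2));
}
```

Separating the expensive one-time step from the cheap address rewrite is what lets a retry reuse the same dataPath instead of creating a new repo per attempt.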

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #87: Root cause of intermittent "address already in use" failures on macOS due to hardcoded kubo API ports falling within the ephemeral range; this PR implements the proven fix already demonstrated in kuboRpcGateway.integration.test.ts across the entire daemon test suite.

Possibly related PRs

  • bitsocialnet/bitsocial-cli#71: Both PRs modify test/cli/daemon-kubo-restart-race.test.ts—PR #71 adds the kubo restart-race regression tests while this PR refactors those same tests to use startPkcDaemonWithDynamicPorts and dynamic kubo endpoints.
  • bitsocialnet/bitsocial-cli#47: Both PRs update test/cli/daemon.test.ts port validation logic; PR #47 adds fail-fast "PKC RPC port already in use" behavior while this PR refactors the same test section to use dynamically allocated and occupied endpoints.
  • bitsocialnet/bitsocial-cli#74: Both PRs involve test/cli/update-install-restart-race.test.ts; PR #74 changes production shutdown logic to wait for daemon PID exit while this PR refactors that test to use dynamic ports and environment isolation.

Poem

🐰 Port Allocation Rhapsody

Ephemeral ports caused us pain,
Hardcoded numbers down macOS rain,
Now we allocate with grace,
Retry races, find free space,
Tests run parallel without strain! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test: dynamic kubo/daemon ports with bind-race retry (fixes macOS flake) (#87)' clearly and concisely describes the main change: introducing dynamic port allocation and retry logic to fix macOS test flakiness. |
| Linked Issues check | ✅ Passed | The pull request fully addresses all objectives from issue #87: adds shared helpers (allocateFreePort, allocateKuboEndpoints, startPkcDaemonWithDynamicPorts, withKuboBindRetry), makes preInitKuboWithEphemeralSwarm idempotent, converts all relevant tests to dynamic ports with retry, and adds regression tests. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing dynamic port allocation and bind-race retry behavior to fix macOS test flakiness. No unrelated modifications or scope creep detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/cli/daemon.test.ts (1)

712-715: 💤 Low value

Consider accepting the endpoints parameter in the buildEnv callback for consistency.

The callback at line 714 uses () => ({ PKC_RPC_AUTH_KEY: customAuthKey }) while other usages in this PR (e.g., update-install-restart-race.test.ts lines 149-154) use (e) => ({ ... }) even when e is not directly referenced in that particular expression. For consistency and clarity, consider:

 const daemon = await startPkcDaemonWithDynamicPorts(
     (e) => ["--pkcOptions.dataPath", randomDirectory(), "--pkcRpcUrl", e.rpcWsUrl],
-    () => ({ PKC_RPC_AUTH_KEY: customAuthKey })
+    (_e) => ({ PKC_RPC_AUTH_KEY: customAuthKey })
 );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cli/daemon.test.ts` around lines 712 - 715, The buildEnv callback passed
to startPkcDaemonWithDynamicPorts should accept the endpoints parameter for
consistency with other tests: change the anonymous zero-arg function used in the
call to accept a parameter (e) and return the same object so the signature
matches other usages (e.g., use (e) => ({ PKC_RPC_AUTH_KEY: customAuthKey })).
Update the callback in the startPkcDaemonWithDynamicPorts invocation so it takes
the endpoints arg rather than using a no-arg arrow function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/helpers/daemon-helpers.ts`:
- Around line 260-266: The retry path currently calls
ensureKuboNodeStopped(endpoints.kuboApiUrl) which will stop whichever process is
listening on that port (possibly another test's daemon); instead limit cleanup
to the child launched by this attempt: track the child process/handle created
for this attempt (e.g., the variable used to spawn the kubo daemon) and on an
EADDRINUSE error only kill/cleanup that child (or skip cleanup if that child
never bound or if you've already moved to a new endpoint set); replace the
global ensureKuboNodeStopped call with a targeted stop/kill of the
attempt-specific process or a no-op when the process is not the one you spawned,
using the spawn handle or PID saved for this attempt and preserving the existing
isAddressInUseError, lastError, attempt and retries logic.
- Around line 192-229: allocateFreePort currently defaults to probing 127.0.0.1
while allocateKuboEndpoints assumes kubo/gateway will bind to 0.0.0.0, which can
produce port collisions; update allocateKuboEndpoints to call allocateFreePort
with the actual bind hosts: use allocateFreePort("127.0.0.1") for the RPC port
(rpcPort/rpcWsUrl) and allocateFreePort("0.0.0.0") for kuboPort and gatewayPort
so the ports are reserved on the same interfaces kubo/gateway will use (refer to
the functions allocateFreePort and allocateKuboEndpoints).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ce26e3b4-5457-401f-a159-31a85176c0d1

📥 Commits

Reviewing files that changed from the base of the PR and between 725916d and 8e1e407.

📒 Files selected for processing (11)
  • test/cli/challenge-integration.test.ts
  • test/cli/command-completion-time.test.ts
  • test/cli/daemon-kubo-restart-race.test.ts
  • test/cli/daemon.test.ts
  • test/cli/edit-null-removal.e2e.test.ts
  • test/cli/logs.test.ts
  • test/cli/mintpass-integration.test.ts
  • test/cli/update-install-restart-race.test.ts
  • test/helpers/daemon-helpers.ts
  • test/helpers/dynamic-ports.test.ts
  • test/helpers/kubo-helpers.ts

Comment thread test/helpers/daemon-helpers.ts
Comment thread test/helpers/daemon-helpers.ts
- daemon.test.ts: startPkcDaemonCapturingStderr now pre-inits an ephemeral
  swarm (like startPkcDaemon) and captures the kubo RPC URL. The two DEBUG
  tests previously let the daemon bind swarm 4001; on Windows the daemon's
  kill doesn't take kubo with it, so the first test's kubo lingered and the
  second collided on 4001. Ephemeral swarm + kuboRpcUrl-driven /shutdown fix it.
- daemon-helpers.ts (CodeRabbit): allocateKuboEndpoints probes kubo/gateway
  ports on 0.0.0.0 (the interface kubo actually binds) instead of 127.0.0.1,
  so a port free only on loopback can't be handed out for a wildcard bind.
- daemon-helpers.ts (CodeRabbit): startPkcDaemonWithDynamicPorts no longer
  ensureKuboNodeStopped()s the losing port on a bind race — in a same-suite
  race that listener is another test's healthy daemon. Our failed attempt's
  subprocess has already exited and never bound, so there's nothing to clean.
- daemon-kubo-restart-race.test.ts: the wedged-daemon retry now SIGKILLs its
  own process group on failure instead of shutting down the contended port.
Rinse12 commented Jun 9, 2026

Member Author

Thanks @coderabbitai — both points were valid and are addressed in bb891cb:

  1. Probe the bind host: allocateKuboEndpoints now probes the kubo/gateway ports on 0.0.0.0 (where kubo binds) and only the RPC port on 127.0.0.1, so a port free only on loopback can't be handed out for a wildcard bind.
  2. Don't shut down the collided-port winner: startPkcDaemonWithDynamicPorts no longer calls ensureKuboNodeStopped on a lost bind race. Our failed attempt's subprocess has already exited and never bound, so there was nothing of ours to clean, and in a same-suite race that listener is another test's healthy daemon. The wedged-daemon retry in daemon-kubo-restart-race likewise now SIGKILLs its own process group instead of shutting down the contended port.
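
Point 2 comes down to cleaning up only what the test itself spawned. The sketch below shows one way to do that with a process group; `spawnInOwnGroup` and `killOwnGroup` are hypothetical names, not helpers from this PR. It assumes a POSIX platform, since signalling a negative pid is not supported on Windows.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Spawn a child as the leader of its own process group (POSIX only), so the
// whole group can later be signalled without touching unrelated processes.
function spawnInOwnGroup(command: string, args: string[]): ChildProcess {
    return spawn(command, args, { detached: true, stdio: "ignore" });
}

// SIGKILL the process group led by `child` and resolve with the signal that
// ended the child once it has exited.
function killOwnGroup(child: ChildProcess): Promise<NodeJS.Signals | null> {
    return new Promise((resolve, reject) => {
        const pid = child.pid;
        if (pid === undefined) {
            reject(new Error("child was never spawned"));
            return;
        }
        if (child.exitCode !== null || child.signalCode !== null) {
            resolve(child.signalCode); // already gone, nothing to kill
            return;
        }
        child.once("exit", (_code, signal) => resolve(signal));
        try {
            process.kill(-pid, "SIGKILL"); // negative pid targets the whole group
        } catch (error) {
            // ESRCH means the group vanished in the meantime; "exit" still fires.
            if ((error as NodeJS.ErrnoException).code !== "ESRCH") reject(error);
        }
    });
}
```

Because the kill is addressed by pid rather than by port, it cannot reach another test's daemon that happens to hold the contended port.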

Same commit also fixes the Windows CI failure (unrelated to your comments): the two DEBUG env var tests used startPkcDaemonCapturingStderr, which didn't pre-init an ephemeral swarm, so the daemon bound swarm 4001. On Windows the daemon's kill doesn't take kubo with it, so the first test's kubo lingered and the second collided on 4001. That helper now pre-inits an ephemeral swarm and captures the kubo RPC URL for /shutdown.
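
For context, an explicit shutdown through kubo's RPC API might look like the sketch below. `shutdownKubo` is a hypothetical name and this is not the helper's actual code. It assumes the captured kubo RPC URL ends in `/api/v0` and relies on kubo's `shutdown` RPC command, which, like other kubo RPC commands, is called with POST.

```typescript
// Ask a kubo node to stop via its RPC API instead of relying on a signal
// cascading from the parent daemon. `kuboRpcUrl` is assumed to end in /api/v0.
async function shutdownKubo(kuboRpcUrl: string, timeoutMs = 5000): Promise<boolean> {
    const url = `${kuboRpcUrl.replace(/\/+$/, "")}/shutdown`;
    try {
        const response = await fetch(url, {
            method: "POST",
            signal: AbortSignal.timeout(timeoutMs),
        });
        return response.ok; // true when the node acknowledged the request
    } catch {
        return false; // no response: connection refused or timed out
    }
}
```

A `false` result is ambiguous by design here: it covers both "kubo was already gone" and "kubo did not answer in time", so a caller that needs certainty would still have to check that the port is free afterwards.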

coderabbitai Bot commented Jun 9, 2026

@Rinse12 All three fixes are well-reasoned — thanks for the detailed walkthrough:

  1. Probe-host correctness — probing kubo/gateway ports on 0.0.0.0 instead of loopback is exactly right; a port can be free on 127.0.0.1 while occupied on the wildcard interface, so the previous probe could hand out an unbindable port to kubo.
  2. Don't touch the winner's binding — since the failed attempt's process exited before it ever bound, ensureKuboNodeStopped was operating on nothing of ours and risked tearing down a healthy peer test's daemon. Clean call.
  3. Windows kubo lingering — the platform difference (kill doesn't cascade to kubo on Windows) is a good catch; pre-initing an ephemeral swarm in startPkcDaemonCapturingStderr and capturing the kubo RPC URL for a clean /shutdown is the right fix to avoid the 4001 collision on the second DEBUG test.

🐇 ✨


✏️ Learnings added
Learnt from: Rinse12
URL: https://github.com/bitsocialnet/bitsocial-cli/pull/88

Timestamp: 2026-06-09T12:05:34.359Z
Learning: In test/helpers/daemon-helpers.ts (bitsocialnet/bitsocial-cli), `allocateKuboEndpoints` probes kubo and gateway ports on `0.0.0.0` (the wildcard interface where kubo binds) and the RPC port on `127.0.0.1` (loopback only). This distinction matters because a port can be free on loopback while occupied on the wildcard interface, so probing only loopback could hand out an unbindable port for kubo's wildcard bind.

Learnt from: Rinse12
URL: https://github.com/bitsocialnet/bitsocial-cli/pull/88

Timestamp: 2026-06-09T12:05:34.359Z
Learning: In test/helpers/daemon-helpers.ts (bitsocialnet/bitsocial-cli), `startPkcDaemonWithDynamicPorts` does NOT call `ensureKuboNodeStopped` after a bind-race failure. The failed attempt's subprocess already exited without ever binding, so there is nothing to clean up, and in a parallel-test scenario the port's current owner is another test's healthy daemon that must not be shut down.

Learnt from: Rinse12
URL: https://github.com/bitsocialnet/bitsocial-cli/pull/88

Timestamp: 2026-06-09T12:05:34.359Z
Learning: In bitsocialnet/bitsocial-cli tests, on Windows, killing the pkc daemon process does not cascade to the kubo child process (unlike macOS/Linux). The `startPkcDaemonCapturingStderr` helper therefore pre-inits an ephemeral swarm and captures the kubo RPC URL so it can send a `/shutdown` request to kubo explicitly, preventing port 4001 collisions between consecutive DEBUG env-var tests.

@Rinse12 Rinse12 merged commit 67efae8 into master Jun 9, 2026
4 checks passed
Development

Successfully merging this pull request may close these issues.

Test flake: hardcoded kubo API ports in macOS ephemeral range cause intermittent 'address already in use'

1 participant