Per-collab-session Claude credential isolation (one opencode subprocess per session) #15

@sqeeswy

What to build

Run one opencode subprocess per collab session so each session has its own copy of the opencode-claude-auth plugin's module-level state — accounts, active source, cached credentials, sync timer, refresh write-back path. This gives true per-session credential isolation: Alice's session uses Alice's Claude subscription, Bob's session uses Bob's, regardless of who uploaded last.

Why this is the architectural answer (and why we didn't do it now)

Investigation (PR-prep on 2026-05-27) showed that:

  • opencode core's Provider state IS per-workspace (packages/opencode/src/provider/provider.ts:1174, wrapped in InstanceState).
  • opencode core's Auth service IS process-wide (single ~/.local/share/opencode/auth.json).
  • opencode-claude-auth plugin (griffinmartin/opencode-claude-auth@1.5.4) uses module-level closures for its accounts list, active account source, cached credentials, AND a setInterval that writes auth.json every 5 minutes.

Net effect today: ALL collab sessions on the same container share a single anthropic provider auth. Whoever uploaded credentials last via /collab/claude-creds wins. Every team member's LLM usage bills to that account.
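
The "last upload wins" behaviour follows directly from module-level state. The toy sketch below is not the plugin's code; `cachedToken`, `uploadCreds`, and `authHeaderFor` are hypothetical names chosen only to show why a single process cannot hold two identities when the cache lives at module scope.

```typescript
// Toy stand-in for a plugin that keeps its state in module-level variables.
// All names here are illustrative assumptions, not opencode-claude-auth APIs.
let cachedToken: string | null = null; // one copy per process, not per session

function uploadCreds(token: string): void {
  // Overwrites the single shared cache, whoever the caller is.
  cachedToken = token;
}

function authHeaderFor(_sessionId: string): string {
  // The session id cannot change the result: there is only one cache.
  return `Bearer ${cachedToken ?? ""}`;
}

uploadCreds("alice-token"); // Alice uploads for session cs_A
uploadCreds("bob-token"); // Bob uploads later for session cs_B

const headerA = authHeaderFor("cs_A");
const headerB = authHeaderFor("cs_B");
console.log(headerA, headerB); // prints "Bearer bob-token Bearer bob-token"
```

Running the same module in two separate processes gives each process its own `cachedToken`, which is the property the subprocess-per-session approach relies on.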

The previously-considered fix paths and why they all reduce to forking or subprocessing:

| Path | Why rejected for now |
| --- | --- |
| Write per-session creds to auth.json, then init the workspace, then clear | Race-prone (concurrent inits), plugin's 5-minute sync timer overwrites the file, plugin's 401-retry refresh write-back goes to module-level cache. All state is module-level in the plugin; InstanceState wrapping doesn't isolate it. |
| Fork opencode-claude-auth to accept a workspace context option | Maintaining a divergent fork; complicates upstream sync. Doable but ongoing tax. |
| Modify opencode core's Auth to accept a workspace context | Touches upstream core; either upstreaming or vendoring the change. |
| One opencode subprocess per collab session (this issue) | Each subprocess has its own module-level plugin state. No fork. No core changes. Real isolation. Cost: IPC complexity, RAM/CPU multiplier, port management. |

Proposed architecture

┌─────────────────────────────────────────────────────────────────────────┐
│ collab-router (parent process — what we run today as PID 1)             │
│   - handleCollabRequest                                                  │
│   - SSE fan-out to iframes                                               │
│   - manages a pool of session workers                                    │
└──────────────────┬──────────────────────────────────────────────────────┘
                   │ spawns + manages subprocess lifecycle
                   ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ session worker (one per collab session)                                  │
│   - bun src/index.ts serve --port <dynamic> --collab-session-id <cs_X>   │
│   - HOME=/var/opencode/sessions/cs_X (own ~/.claude, own ~/.local/...)   │
│   - own opencode HTTP server on a dynamic loopback port                  │
│   - own plugin instance with its own module-level state                  │
│   - own credentials file populated from collab session's encrypted blob  │
└─────────────────────────────────────────────────────────────────────────┘
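
As a rough illustration of the worker box above, the parent could derive each worker's command line and environment from the session id. This is a sketch under assumptions: `WorkerSpec`, `buildWorkerSpec`, and the id validation rule are hypothetical, while the command, the `--collab-session-id` flag, and the HOME layout are taken from the diagram.

```typescript
// Hypothetical description of how to launch one session worker.
interface WorkerSpec {
  cmd: string[];
  env: Record<string, string>;
}

const SESSIONS_ROOT = "/var/opencode/sessions";

function buildWorkerSpec(sessionId: string, port: number): WorkerSpec {
  // The id becomes a directory name, so reject anything that could
  // escape SESSIONS_ROOT (assumed rule: letters, digits, "_" and "-").
  if (!/^[A-Za-z0-9_-]+$/.test(sessionId)) {
    throw new Error(`invalid session id: ${sessionId}`);
  }
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${port}`);
  }
  return {
    cmd: [
      "bun", "src/index.ts", "serve",
      "--port", String(port),
      "--collab-session-id", sessionId,
    ],
    // Only the override is listed; a real spawn would merge this over the
    // parent's environment so PATH and similar variables are preserved.
    env: { HOME: `${SESSIONS_ROOT}/${sessionId}` },
  };
}

const exampleSpec = buildWorkerSpec("cs_abc", 40123);
console.log(exampleSpec.cmd.join(" "), exampleSpec.env.HOME);
```

Keeping the spec a plain value makes it easy to unit-test and to hand to whichever spawn API the parent ends up using.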

The parent process owns: collab DB, OAuth flow, invite routes, SSE fan-out, iframe URL building, workspace pre-cloning.

Each session worker owns: one workspace, one Anthropic auth identity, one provider state.

Communication: parent-to-worker via HTTP on the worker's dynamic loopback port. Today's nativeFetch calls to localhost:4096 become calls to localhost:<worker-port>.
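
A minimal sketch of that routing change, assuming the parent keeps an in-memory map from collab session id to worker port. `WorkerPorts`, `workerUrl`, and the `/session` path in the example are hypothetical; only the idea of swapping the fixed port 4096 for a per-worker port comes from this issue.

```typescript
// Hypothetical registry: collab session id -> loopback port of its worker.
type WorkerPorts = Map<string, number>;

function workerUrl(ports: WorkerPorts, sessionId: string, pathAndQuery: string): string {
  const port = ports.get(sessionId);
  if (port === undefined) {
    throw new Error(`no worker registered for session ${sessionId}`);
  }
  // Accept only origin-relative paths so a caller cannot redirect the
  // request to another host ("//host/..." or a full URL).
  if (!pathAndQuery.startsWith("/") || pathAndQuery.startsWith("//")) {
    throw new Error(`expected an absolute path, got: ${pathAndQuery}`);
  }
  return `http://localhost:${port}${pathAndQuery}`;
}

const ports: WorkerPorts = new Map([
  ["cs_A", 40123],
  ["cs_B", 40124],
]);
const urlA = workerUrl(ports, "cs_A", "/session?limit=10");
console.log(urlA); // prints "http://localhost:40123/session?limit=10"
```

The existing call sites would then pass the resulting URL to nativeFetch in place of the hard-coded localhost:4096 base.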

Acceptance criteria

  • Each active collab session has its own opencode subprocess. ps aux inside the container shows N+1 bun processes for N active sessions (plus the parent collab-router).
  • Each session worker runs with HOME pointing at a session-scoped directory (/var/opencode/sessions/<cs_id>/). Its ~/.claude/.credentials.json, ~/.local/share/opencode/auth.json, and ~/.local/share/opencode/<db files> are isolated.
  • The opencode-claude-auth plugin's module-level state is per-worker (verified by uploading different creds for two sessions and confirming each session uses its own; check the Authorization: Bearer <accessToken> in outgoing requests via the plugin's intercept logging).
  • No code change in opencode-claude-auth — we use it as-is.
  • /collab/claude-creds upload routes to the specific session's worker, not process-wide. Banner copy updates to reflect per-session scope.
  • Worker lifecycle: spawned on session creation; killed on DELETE /collab/session/:id or on idle-TTL eviction (e.g., 30 min of no activity); auto-respawned on crash with exponential backoff.
  • The parent process allocates worker ports without collisions, and the allocation survives a parent restart.
  • /healthz reflects worker pool health (e.g., desired vs running counts).
  • Memory budget: with N=10 sessions, container fits within current 4 GiB Fargate task allocation, OR the task definition is bumped with rationale.
  • DEPLOYMENT.md updated: new ADR (e.g. 0010) explaining the per-session subprocess model.
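
The lifecycle and health criteria above can be sketched as three small pure functions. The base delay, the cap, and every type and function name below are assumptions for illustration; only the 30-minute idle TTL example and the "desired vs running" health idea come from the list.

```typescript
const IDLE_TTL_MS = 30 * 60 * 1000; // the "e.g., 30 min" example above

// Exponential backoff: base, 2x base, 4x base, ... capped at capMs.
// baseMs and capMs are assumed defaults, not values from this issue.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

interface WorkerState {
  sessionId: string;
  running: boolean;
  lastActivityMs: number; // epoch milliseconds of the last request
}

// Sessions whose workers have been idle for at least ttlMs.
function idleSessions(workers: WorkerState[], nowMs: number, ttlMs = IDLE_TTL_MS): string[] {
  return workers
    .filter((w) => nowMs - w.lastActivityMs >= ttlMs)
    .map((w) => w.sessionId);
}

// Summary a /healthz handler could return: desired vs running counts.
function poolHealth(workers: WorkerState[]): { desired: number; running: number; healthy: boolean } {
  const desired = workers.length;
  const running = workers.filter((w) => w.running).length;
  return { desired, running, healthy: desired === running };
}

const now = 10 * 60 * 60 * 1000; // arbitrary clock value for the example
const pool: WorkerState[] = [
  { sessionId: "cs_A", running: true, lastActivityMs: now - 5 * 60 * 1000 },
  { sessionId: "cs_B", running: false, lastActivityMs: now - 45 * 60 * 1000 },
];
console.log(backoffDelayMs(3), idleSessions(pool, now), poolHealth(pool));
```

Passing the clock in as `nowMs` keeps the eviction check deterministic in tests; whether a crashed worker counts toward `desired` is a policy choice this sketch does not settle.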

Open design questions to resolve in implementation

  • Port allocation: ephemeral OS-assigned vs deterministic mapping from session id. Recommend ephemeral with a manifest in EFS for parent restart recovery.
  • Worker boot time: a fresh bun serve cold-start takes seconds. Acceptable if hidden behind the existing PreparingWorkspacePanel; otherwise need a warm pool.
  • Worker eviction policy: keep N hot, evict LRU. What's the target N?
  • Resource limits: per-worker memory cap? CPU pinning?
  • Logging: each worker writes to its own log stream OR multiplexes to parent stdout with a session-id prefix?
  • EFS contention: N workers writing to EFS concurrently. Each session has its own subdirectory, but the same underlying file system serves all of them. Need to verify EFS throughput at expected concurrency.
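
To make the port-manifest and LRU questions concrete, here is one possible shape, offered as a sketch rather than a decision. `ManifestEntry`, the JSON layout, and `selectEvictions` are all hypothetical. The ports are assumed to be OS-assigned (for example by binding to port 0) and recorded afterwards; on restart the parent would still need to check that each recorded worker is actually alive.

```typescript
// Hypothetical manifest entry persisted by the parent (e.g. on EFS).
interface ManifestEntry {
  sessionId: string;
  port: number;
  lastActivityMs: number;
}

function serializeManifest(entries: ManifestEntry[]): string {
  return JSON.stringify({ version: 1, workers: entries });
}

// Tolerant parse: anything malformed is dropped rather than trusted.
function parseManifest(text: string): ManifestEntry[] {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) return [];
  const workers = (data as { workers?: unknown }).workers;
  if (!Array.isArray(workers)) return [];
  return workers.filter(
    (w): w is ManifestEntry =>
      typeof w === "object" && w !== null &&
      typeof (w as ManifestEntry).sessionId === "string" &&
      typeof (w as ManifestEntry).port === "number" &&
      typeof (w as ManifestEntry).lastActivityMs === "number",
  );
}

// LRU policy: keep the maxHot most recently active workers, evict the rest.
function selectEvictions(entries: ManifestEntry[], maxHot: number): string[] {
  const oldestFirst = [...entries].sort((a, b) => a.lastActivityMs - b.lastActivityMs);
  const excess = Math.max(0, oldestFirst.length - maxHot);
  return oldestFirst.slice(0, excess).map((e) => e.sessionId);
}

const manifest: ManifestEntry[] = [
  { sessionId: "cs_A", port: 40123, lastActivityMs: 3000 },
  { sessionId: "cs_B", port: 40124, lastActivityMs: 1000 },
  { sessionId: "cs_C", port: 40125, lastActivityMs: 2000 },
];
const restored = parseManifest(serializeManifest(manifest));
const evicted = selectEvictions(restored, 2);
console.log(evicted); // prints [ 'cs_B' ] under Node or Bun
```

The target value for `maxHot` is exactly the open question above; the sketch only shows that the policy itself is cheap once last-activity timestamps are tracked.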

Blocked by

None — can start whenever the team wants to pick it up.

Notes

  • Today's reality (one process for all sessions) is documented in the 'How Claude credentials are supplied' section of DEPLOYMENT.md and in the pre-upload warning banner shipped alongside this issue.
  • The dispose-and-reload pattern (disposeAllInstancesAndEmitGlobalDisposed) used today is a workaround that gives correct behaviour at the cost of cross-session credential leakage. Per-session subprocess isolation is the proper fix.
  • Reference reading: packages/opencode/src/provider/provider.ts:1170-1255, packages/opencode/src/auth/index.ts, and the plugin source at https://github.com/griffinmartin/opencode-claude-auth/blob/main/src/index.ts.

Labels: enhancement
