10 self-contained PRs targeting 6 release-blocking issue clusters. Combined effect: every Windows install since v13.0.1 should boot cleanly, the worker bundle drops ~720 KB and 68 OAuth provider URLs (with zod inlined for install-fragility), the marketplace install no longer ships dead deps, a year-long PID-reuse deadlock gets a guard, and the observer death-loop (issue #2468) is closed.
⏳ = fixup commits applied, Greptile re-review has been pending 17+ hours — may need a maintainer manual re-trigger via the Re-trigger Greptile link on each PR.
Recommended merge order
Independent (any order — no file overlap):
┌── #2592 transcript-watcher build target
├── #2593 Windows cmd.exe metacharacter escape
├── #2594 mcp-search node-e launcher
├── #2595 sync-marketplace cross-platform
├── #2598 hooks printenv PATH probe
├── #2599 Windows PID-token
├── #2602 observer prompt truncation + soft-overflow
└── #2610 inline zod into worker bundle (touches scripts/build-hooks.js
externals — coordinates with #2596 if both land same release;
#2596 removes better-auth from externals, #2610 removes zod;
different keys in the same array, no merge conflict)
Pair (ship in the same release cycle):
#2596 better-auth external
│
└─► #2597 build install plugin/node_modules
(without #2597, #2596 breaks the dev workflow because the
externalized better-auth has nowhere to resolve from at runtime)
All 10 PRs are mergeable independently from a code-conflict standpoint (different files, or different functions in the same file). The #2596 / #2597 dependency is a runtime concern, not a merge-conflict one. #2596 and #2610 both edit scripts/build-hooks.js externals but in disjoint keys.
What each cluster fixes
Windows shell / hooks (5 issues, 2 PRs)
fix(chroma): escape cmd.exe metacharacters in uvx args on Windows #2593 silences MCP error -32000: Connection closed on every Chroma sync on Windows. Root cause: cmd.exe parses the unquoted < / > in protobuf<7 / onnxruntime>=1.20 as I/O redirection metacharacters before launching uvx. Escapes ^<>&|% on the Windows branch only; % was added in the fixup after Greptile flagged that user-supplied chromaApiKey could contain % and get silently env-expanded.
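The escaping rule can be sketched in a few lines. This is a minimal illustration built from the replace chain quoted in the PR's verification notes; the function name escapeCmdArg and the platform parameter are hypothetical and are not taken from the PR itself.

```javascript
// Hypothetical helper; only the two replace() calls come from the PR notes.
function escapeCmdArg(arg, platform = process.platform) {
  // Only the Windows branch escapes; other platforms pass the value through.
  if (platform !== 'win32') return arg;
  return arg
    .replace(/%/g, '%%')             // double % (the PR's guard against env expansion)
    .replace(/([\^<>&|])/g, '^$1');  // caret-escape ^ < > & |
}

console.log(escapeCmdArg('protobuf<7', 'win32'));        // protobuf^<7
console.log(escapeCmdArg('onnxruntime>=1.20', 'win32')); // onnxruntime^>=1.20
console.log(escapeCmdArg('protobuf<7', 'linux'));        // protobuf<7
```

The percent pass runs first so that the carets added by the second pass are never touched by it.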
Cross-platform MCP launcher (1 issue, 1 PR)
fix(mcp): replace sh launcher with cross-platform node -e (closes #2461) #2594 replaces sh -c '...exec node ...' with node -e '...same logic in JS...'. Windows has no sh, so the old launcher was a hard block. The fixup adds POSIX SIGTERM/SIGINT/SIGHUP forwarding plus the correct signal-death exit code via removeAllListeners + process.kill(self, sig), verified with a sleep-child smoke test (the launcher exits 143 when SIGTERM'd).
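The signal handling described for #2594 might look roughly like the sketch below. It assumes a plain spawn-based launcher; the names launch, FORWARDED, and exitCodeForSignal are hypothetical, and the PR's inline node -e script may be organized differently.

```javascript
const os = require('node:os');
const { spawn } = require('node:child_process');

// Signals relayed to the child, per the description above.
const FORWARDED = ['SIGTERM', 'SIGINT', 'SIGHUP'];

// Conventional shell status for a process killed by a signal: 128 + signal number.
function exitCodeForSignal(signal) {
  return 128 + os.constants.signals[signal];
}

// Hypothetical launcher: run a command, relay signals, mirror how the child ended.
function launch(command, args) {
  const child = spawn(command, args, { stdio: 'inherit' });
  for (const sig of FORWARDED) {
    process.on(sig, () => child.kill(sig));
  }
  child.on('exit', (code, signal) => {
    if (signal) {
      // Remove our handlers so the default action applies, then re-raise the
      // same signal on ourselves so the parent sees a signal death.
      process.removeAllListeners(signal);
      process.kill(process.pid, signal);
    } else {
      process.exit(code ?? 1);
    }
  });
  return child;
}

console.log(exitCodeForSignal('SIGTERM')); // 143
```

Re-raising the signal, rather than calling process.exit(143), lets a parent shell report the launcher as killed by SIGTERM, which is what the 143 in the smoke test reflects.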
Packaging (4 issues, 5 PRs)
fix(build): add transcript-watcher.cjs build target (closes #2450) #2592: claude-mem transcript watch has been silently broken since v13.x because runtime.ts:230 hard-references plugin/scripts/transcript-watcher.cjs but scripts/build-hooks.js never compiled it. The PR adds the build target plus a thin entry that dispatches argv into the existing runTranscriptCommand. The fixup added zod to external and a 200 KB bundle-size guard.
chore(scripts): drop sync-to-marketplace.sh and harden sync-marketplace.cjs #2595 removes the legacy sync-to-marketplace.sh (Windows incompatibility) and adds --dry-run + opt-in --force-delete + a PRESERVE_PATTERNS allowlist that protects user env / secrets from rsync --delete. Fixup addressed all 5 Greptile findings including a real bug where --dry-run alone (no --force-delete) was skipping rsync entirely.
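The flag handling described for #2595 can be sketched as a pure argument builder. Everything here is an assumption made for illustration: the pattern list is invented, and using rsync's protect filter ("P") is one possible way to shield files from --delete, not necessarily what sync-marketplace.cjs does.

```javascript
// Hypothetical patterns; the PR's real PRESERVE_PATTERNS list is not shown in this issue.
const PRESERVE_PATTERNS = ['.env', '.env.*'];

function buildRsyncArgs(src, dest, { dryRun = false, forceDelete = false } = {}) {
  const args = ['-a'];
  if (dryRun) args.push('--dry-run'); // preview only; rsync is still invoked
  if (forceDelete) {
    args.push('--delete');
    // "P" is rsync's protect rule: matching files on the receiver survive --delete.
    for (const pattern of PRESERVE_PATTERNS) args.push(`--filter=P ${pattern}`);
  }
  args.push(src, dest);
  return args;
}

console.log(buildRsyncArgs('plugin/', 'marketplace/', { dryRun: true }));
```

Keeping --dry-run and --force-delete as independent switches is the point of the fixup: a dry run without deletion still produces an rsync invocation instead of skipping it.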
refactor(build): externalize better-auth from worker/server-beta bundles (closes #2584) #2596 moves ~720 KB and 68 OAuth provider hostnames out of worker-service.cjs by externalizing better-auth*. Root cause: BetterAuthRoutes.ts uses await Promise.all([import('better-auth/node'), import('./auth.js')]), hoping esbuild would tree-shake, but bundle:true with no splitting inlines literal-path dynamic imports as deferred CJS factories. better-auth is now resolved at Node runtime via plugin/node_modules. The fixup made the version pin derive from the root package.json (throws on missing; no silent fallback).
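A rough sketch of the two pieces described for #2596: an esbuild options object that keeps better-auth out of the bundle, and a version lookup that throws instead of falling back. The entry and output paths and the names pinnedVersion and workerBuildOptions are placeholders, not the contents of scripts/build-hooks.js.

```javascript
// Hypothetical helper: read a dependency's version range from the parsed root
// package.json and fail loudly when it is absent (no silent fallback).
function pinnedVersion(rootPkg, name) {
  const range = rootPkg.dependencies && rootPkg.dependencies[name];
  if (!range) {
    throw new Error(`${name} is not listed in root package.json dependencies`);
  }
  return range;
}

// Sketch of esbuild build options; both paths are placeholders.
const workerBuildOptions = {
  entryPoints: ['src/worker-service.ts'],
  bundle: true,
  platform: 'node',
  format: 'cjs',
  outfile: 'worker-service.cjs',
  // External packages stay as require() calls in the output instead of being
  // inlined, so they must be resolvable from node_modules at runtime.
  external: ['better-auth', 'better-auth/*'],
};

console.log(workerBuildOptions.external);
```

With the package marked external, the dynamic import in BetterAuthRoutes.ts no longer drags the library into the output; the trade-off is the runtime dependency on plugin/node_modules that #2597 covers.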
fix(build): populate plugin/node_modules during build (partial #2407) #2597 populates plugin/node_modules during build so the externalized better-auth / zod / tree-sitter grammars can resolve at runtime. The install deliberately runs without --ignore-scripts so prebuild-install can download prebuilt .node bindings instead of triggering node-gyp on niche archs. The fixup added shell: process.platform === 'win32' for the Windows .cmd batch wrapper, anchored cwd to __dirname, and switched to node:child_process.
Supervisor (1 issue, 1 PR)
fix(supervisor): make Windows captureProcessStartToken actually return a token (closes #2578) #2599: until this PR, captureProcessStartToken() returned null on Windows, which made verifyPidFileOwnership short-circuit at currentToken === null → return true, completely disabling PID-reuse detection. When the worker died and Windows reassigned its PID (#2578 reports SignalRgbLauncher.exe), claude-mem status reported the worker as healthy and start refused to launch a fresh one. The PR now uses PowerShell Get-Process to compose a (StartTime.Ticks | ProcessName) token. Every failure mode falls back to the prior null-returning behavior, so the worst case is no regression. The fixup added the idiomatic result.error || result.status !== 0 spawnSync guard on both the Windows and POSIX branches.
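The ownership check and its null short-circuit can be modeled with a few pure functions. This is a simplified sketch, not the supervisor's code: composeToken, captureToken, and verifyOwnership are hypothetical names, and run stands in for the spawnSync call so the logic can be exercised without PowerShell.

```javascript
// Token shape described above: start-time ticks plus process name.
function composeToken(startTicks, processName) {
  return `${startTicks}|${processName}`;
}

// `run` is an injected stand-in for spawnSync; it returns { error, status, stdout }.
function captureToken(run) {
  const result = run();
  if (result.error || result.status !== 0) return null; // fall back to prior behavior
  const [ticks, name] = String(result.stdout).trim().split('|');
  if (!ticks || !name) return null;
  return composeToken(ticks, name);
}

// A null current token means "cannot tell" and is treated as a match, which is
// why a capture that always returned null disabled PID-reuse detection.
function verifyOwnership(storedToken, currentToken) {
  if (currentToken === null) return true;
  return storedToken === currentToken;
}

console.log(verifyOwnership('100|node', null));                    // true
console.log(verifyOwnership('100|node', '200|SignalRgbLauncher')); // false
```

Once a real token is available, a reused PID shows up as a mismatch (different start time and process name), so the stale worker.pid no longer counts as a healthy worker.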
Observer (1 issue, 1 PR)
fix(observer): truncate oversized prompt fields + drop overflow from hard-stop (closes #2468) #2602 closes the observer death-loop reported in #2468 ("Observer context has no budget management: unbounded tool_output causes infinite overflow loop with data loss"). A 130k-char Read tool result blew the observer model's context window → SDK aborts with 'overflow' → GeneratorExitHandler.isHardStopReason returns true → clearPendingForSession wipes every queued observation, including the unrelated ones queued after the offending Read → new observations re-trigger the generator → the cycle repeats until RestartGuard trips. The fix has two layers: (1) buildObservationPrompt() now caps each <parameters> / <outcome> field at 16k chars with a head + tail slice and an explicit <elided chars="..." /> marker the model is told to respect; (2) 'overflow' is removed from isHardStopReason(), so if a residual cause (e.g. an oversized conversation history) still trips "prompt is too long", the queued observations are preserved by the restart path instead of cleared. RestartGuard caps consecutive overflow→restart cycles, so the loop cannot run forever.
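Both layers of the fix can be illustrated with a short sketch. It is not the code in buildObservationPrompt(): the names truncateField and HARD_STOP_REASONS are hypothetical, the marker carries exact character counts rather than whatever rounding the real marker uses, and the hard-stop reasons listed are placeholders.

```javascript
const MAX_FIELD_CHARS = 16000; // the 16k cap described above

// Layer 1: keep the head and tail of an oversized field and say what was dropped.
function truncateField(text, max = MAX_FIELD_CHARS) {
  if (text.length <= max) return text; // small inputs pass through unchanged
  const half = Math.floor(max / 2);
  const elided = text.length - 2 * half;
  const marker = `<elided chars="${elided}" original_size_chars="${text.length}" />`;
  // The marker itself adds a few dozen characters on top of the cap.
  return text.slice(0, half) + marker + text.slice(text.length - half);
}

// Layer 2: reasons that clear the pending queue. 'overflow' is deliberately
// absent; the two entries here are placeholders for illustration.
const HARD_STOP_REASONS = new Set(['unauthorized', 'quota']);

function isHardStopReason(reason) {
  return HARD_STOP_REASONS.has(reason);
}

const big = 'a'.repeat(10000) + 'b'.repeat(120000);
console.log(truncateField(big).length < big.length); // true
console.log(isHardStopReason('overflow'));           // false
```

Keeping both ends of the field preserves the start of a file read and its final lines, which are usually the parts an observer needs to describe what happened.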
What's intentionally NOT in this batch
Depends on observer prompt scaffolding from #2602 (the <elided/> marker + grounding hint). Will be a follow-up PR adding a post-process grounding validator (fact references must exist in source observation).
Solved partially in #2597 (dev workflow + marketplace path). Adding plugin/node_modules to package.json#files for the npm tarball is a separate review concern (tarball size, cross-arch publish).
Risk summary
+329 KB / +332 KB on hot bundles, but removes the most-reported install regression of v13.x. zod is pure JS, used pervasively, so inlining matches mcp-server.cjs's already-shipping pattern.
Existing observations that previously overflowed will now succeed with a head+tail-only view; that's strictly better than the prior "wipe entire batch" failure mode. No external API changes, no schema changes.
Every fixup commit addresses a concrete Greptile finding documented in the PR thread — no speculative changes. Re-trigger link is available on each PR; pending 17+ hours suggests bot may need manual nudge from a maintainer.
Empirical checks
All 10 PRs run npm run build cleanly on macOS arm64 (Node 22, npm 10). Windows verification is deferred, but every Windows-specific change is gated on process.platform === 'win32' and falls back to today's behavior on failure, so the worst case is no regression.
Tagging @thedotmack — happy to split the merge into multiple smaller releases if that's easier, or to fold in additional follow-ups if any of these uncover further concerns during review.
This issue exists so the PRs can be reviewed and merged together as a coordinated set without losing the cross-PR context.
PR-by-PR
#2592 fix(build): add transcript-watcher.cjs build target
#2593 fix(chroma): escape cmd.exe metacharacters in uvx args on Windows
#2594 fix(mcp): replace sh launcher with cross-platform node -e
#2595 chore(scripts): drop sync-to-marketplace.sh + harden sync-marketplace.cjs
#2596 refactor(build): externalize better-auth from worker/server-beta bundles
#2597 fix(build): populate plugin/node_modules during build
#2598 fix(hooks): replace fragile $SHELL -lc PATH probe with printenv-first pattern
#2599 fix(supervisor): make Windows captureProcessStartToken actually return a token
#2602 fix(observer): truncate oversized prompt fields + drop overflow from hard-stop
#2610 fix(build): inline zod into worker / server-beta / context-generator bundles
fix(hooks): replace fragile $SHELL -lc PATH probe with printenv-first pattern #2598 addresses the printf: write error: Permission denied noise on every hook invocation. Root cause: $($SHELL -lc 'echo $PATH' 2>/dev/null) fails in Git Bash when stdin isn't a TTY. It is replaced with the same printenv PATH + $SHELL -lc 'printf %s "$PATH"' fallback that codex-hooks.json already uses in production. Scope clarification: @kls06541's review on #2598 surfaced that the report in #2439 ("Windows (Git Bash): printf write error Permission denied blocks UserPromptSubmit hook") actually attributes the symptom to a separate nested-pipe construct in the same hooks. #2598 fixes the PATH-probe half only; the nested-pipe remediation is tracked in #2613 ("Windows hooks: nested-pipe plugin-root lookup may still trigger 'printf: write error: Permission denied' after #2598"), pending Win11 + Node 24.x reproduction. The closes #2439 line is optimistic until that verification.
fix(build): inline zod into worker / server-beta / context-generator bundles (closes #2437) #2610 addresses the Cannot find module 'zod/v3' regression (#2437) that has shipped on v13.0/v13.1/v13.2/v13.3. It inlines zod into the three Node-platform bundles (worker, server-beta, context-generator) the same way mcp-server.cjs already does. Bundle cost: +329 KB worker, +332 KB server-beta.
Complementary to #2596 / #2597: #2596 externalizes the large, rarely used better-auth to shrink the bundle; #2610 inlines the small, hot-path zod to remove the install-fragility surface entirely. zod has no native bindings and no startup dynamic-loading expectations, so inlining is safe. The existing assertion at build-hooks.js:269-274 (which already guards mcp-server.cjs against zod regression) provides the test harness for the same pattern across the new bundles.
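A build-time guard of the kind described for the zod bundles could look like the sketch below. It reuses the grep pattern quoted in the verification notes; the function name assertZodInlined is hypothetical, and the real assertion at build-hooks.js:269-274 may be written differently.

```javascript
// Matches require('zod') and require('zod/...') with either quote style.
const ZOD_REQUIRE = /require\(['"]zod(\/[^'"]*)?['"]\)/;

// Throw if a bundle still expects zod to be resolvable at runtime.
function assertZodInlined(bundleSource, bundleName) {
  if (ZOD_REQUIRE.test(bundleSource)) {
    throw new Error(`${bundleName} still requires zod at runtime`);
  }
}

// An inlined bundle defines the zod classes itself and passes the check.
assertZodInlined('class ZodError extends Error {}', 'worker-service.cjs');
console.log('ok');
```

Running the check against each output after bundling would turn a missing node_modules entry from an install-time failure for users into a build-time failure for maintainers.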
Also intentionally not in this batch: tr ' ' ':' corruption on Windows paths with spaces, which involves hooks.json and codex-hooks.json and so is split out into its own issue.
Additional risk note: --force-delete is now opt-in; documented as a migration note.
Per-PR verification (also documented in each PR description):
p.replace(/%/g,"%%").replace(/([\^<>&|])/g,"^$1") after fixup
grep -c better-auth: 12 → 2 (only require() calls)
npm install in plugin/ adds 57 packages in 58s on macOS
[...] /usr/local
ps -p $pid -o lstart= returns timestamp (POSIX path unaffected)
<elided chars="186k" original_size_chars="200k" /> marker; small inputs byte-identical to JSON.stringify
grep -E "require\(['\"]zod(/[^'\"]*)?['\"]\)" returns 0 hits; grep -c 'ZodError\|ZodObject\|ZodString' returns 7 hits (inline confirmed)