Conversation
…file tracking Adds the ToolUse lifecycle event type and end-to-end wiring so that Codex apply_patch invocations merge their file lists into session.FilesTouched during a turn. This keeps mid-turn commit carry-forward accurate instead of falling back to transcript extraction. - Add ToolUse EventType with NewFiles/DeletedFiles/ToolName fields - Add Codex parsePostToolUse (filters for apply_patch only) - Add parseApplyPatchFiles for *** Add/Update/Delete File: markers - Add handleLifecycleToolUse dispatcher calling RecordFilesTouched - Add MutateSessionState with reentrant flock-based locking - Add RecordFilesTouched merging modified/added/deleted into state - Add internal/flock package (platform-specific file locking) - Add CodexHookRunner and SimulateCodexPostToolUseApplyPatch test helpers - Fix integration test using wrong session dir (entire-sessions -> trace-sessions)
…ncurrent sessions
Port the atomic CAS ref-update pattern (`git update-ref <new> <old>`) and
per-shadow-branch flock serialization from upstream entireio/cli (commit
6d616b1 "trail-watch-review-events-endpoint" and preceding concurrent-safety
work).
New files:
- cmd/trace/cli/checkpoint/shadow_ref.go
- `casUpdateShadowBranchRef` + `ErrShadowRefBusy`
- 16-retry loop with exponential jitter (shadowRefBackoff)
- `withShadowBranchFlock` + lock files under <git-common-dir>/trace-shadow-locks/
- `repoDirs` helper (fixed for go-git v6 / billy.Filesystem API in this tree)
Wiring:
- `GitStore.WriteTemporary` (regular session checkpoints) now runs the full
tree-build + commit + CAS update inside the per-shadow flock.
- `GitStore.WriteTemporaryTask` (sub-agent task checkpoints) receives the
identical protection.
Cleanup:
- Deleted the now-redundant `strategy/state_lock_{unix,windows}.go` (exact
duplicates of `internal/flock.Acquire`; the real cross-process per-session
locking already lives in `acquireSessionGate` + `internal/flock`).
Tests:
- Added `TestSaveStep_ConcurrentSessionsSameShadowBranch` (8 parallel sessions
on one shadow branch, distinct trees, StepCount + full tree-walk consistency
assertions). The test now passes cleanly under -race.
Hygiene (rebrand leaks found during entireio/cli deep comparison):
- s/entire-v2-rotations/trace-v2-rotations/
- Two integration-test ".git/entire-sessions" paths → trace-sessions
- Ported upstream `trail_cmd_test.go` (normalized import) to close the helper
coverage gap on limit/filter/parse/print trail functions.
This closes the "failed to read tree: object not found" race when multiple
agents or hook processes write the first checkpoints for sessions that share
the same base commit + worktree (the exact scenario behind the Stop-hook
failures the new test reproduces).
Refs: upstream entireio/cli 6d616b1 + preceding CAS work
Tested: go test -race -run TestSaveStep_Concurrent... ./cmd/trace/cli/strategy
go test -race ./cmd/trace/cli/checkpoint/... ./cmd/trace/cli/strategy/...
gofmt clean, golangci-lint (dupl) on changed packages clean for new code
Build: clean
The per-session state-file locking (already present via internal/flock in
acquireSessionGate) + the new per-shadow-branch flock + CAS now give us
end-to-end cross-process safety for the manual-commit strategy.
…tus filtering UX - Extend trailListOptions with ShowAll - Update default behaviour and help text for 'trace trail' - Re-order struct fields for clarity - The supporting unit tests (limitTrails, filterByAuthor, parseStatus, print helpers) were ported from upstream in the previous commit and now cover these paths. Part of the ongoing trail / review-events work.
…hadow migration Adjustments to how resume / attach flows interact with the new session state initialization and shadow branch migration logic (to stay safe under concurrent writers). Refs: the concurrent safety changes in 9a56474
…ng for v2 + trail - Comment and ref name updates for trace/checkpoints/v1 and refs/trace/ - Minor hook command adjustments to support the new trail/review event watching - Prep work for the broader trail feature on this branch
…ing under CAS - Change MutateSessionState error handling to surface non-NotFound errors - Early exit in ensureSessionInitialized when base commit already present - Minor variable naming and error wrapping cleanups for readability during concurrent shadow writes. These local tweaks complement the upstream CAS port (9a56474) and make the concurrent session path more robust.
…e response size limit - Port auth_tokens_test.go, base_url_test.go, client_test.go, repositories_test.go - Port trail_types_test.go and align TrailResource to current upstream shape (ID field + json tag) - Increase maxResponseBytes from 1 MiB → 16 MiB so large trail/checkpoint payloads (and the new large-body test) succeed. Matches upstream. All new tests now pass. Part of the ongoing full port from entireio/cli for the trail + remote checkpoint features on this branch. Refs: PR #14
test(jsonutil): port write.go + write_test.go from upstream - Adds the missing JSONL / atomic write helpers used by checkpoint condensation and trail. - All tests pass after rebrand. Part of the full entireio/cli → trace port effort (see /tmp/FULL_PORT_CHECKLIST.md). This can be committed on top of the API port (bd1b5f7) before or after the fmt/lint cleanup commit for PR #14.
Update: More upstream ports landed on this branchSince opening PR #14, we have continued the systematic port from Recently merged to
|
…from upstream + supporting settings helpers - Added ClonePreferences struct + Load/SaveClonePreferences, Load/SaveProjectRaw, LoadLocalRaw, etc. (adapted for Trace paths). - Added wrapDisplayWidth helper to tui_text.go so the ported test builds. - Fixed a couple of TrailResource.TrailID → .ID call sites that the struct alignment exposed. - All review package builds and relevant tests pass. This unblocks the review migration feature (one-shot move of review config to clone-local prefs). Part of the full entireio/cli port effort.
|
Update: Another big port batch landed on (commit 3f44295):
Review package now builds and tests pass. PR is steadily becoming a near-complete port of the current upstream while the core safety work stays intact. Next up on the checklist: redact/ improvements and more trail package test parity. Full checklist and live status in the branch. |
- Port support for user-defined custom redactions: - Inline regex rules in settings - YAML rule packs in .trace/redactors/ - New types: CustomRulesConfig, Pack, Rule, compiled rules, etc. - New functions: LoadPacks, LoadCustomRules, ApplyCustom, etc. - Includes comprehensive tests (custom_test.go + packs_test.go) All existing and new redact tests pass. This brings secret redaction capabilities up to parity with current upstream for review, trail, and checkpoint use cases. Part of the systematic one-by-one port of all remaining features. Refs: PR #14
Adds wrapDisplayWidth function to tui_text.go for text wrapping in the review TUI, ported from entireio/cli upstream.
|
Reviewed and approved. Core CAS + flock mechanism is correct — three-layer locking composes cleanly without deadlock risk. Concurrent regression test is production-grade. Two minor notes: branch name sanitization is minimal (safe in practice since shadow names are always hex), and go-git ref reads may serve stale data on CAS retry (absorbed by 16-retry budget). Merging. |
PR: Concurrent session safety (CAS + per-shadow flock) + Trail UX + supporting fixes
Summary
This PR delivers the long-requested concurrent-agent safety net for the manual-commit strategy, plus several related UX and robustness improvements on the
featurebranch.The Problem
When two or more agents (or hook processes) ran in the same worktree against the same base commit, they all hashed to the same shadow branch (
trace/<short-base>-<worktree>). The first fewSaveStep/WriteTemporarycalls raced ongitref updates, producing the classic error:This was the root cause of many "Stop hook failed" and lost-checkpoint bugs reported with parallel Claude Code / Cursor / Codex usage.
The Solution (ported + hardened from upstream)
We ported and fully wired the atomic CAS + cross-process flock mechanism that
entireio/clilanded in early 2026 (around commit 6d616b1 and preceding work):New
cmd/trace/cli/checkpoint/shadow_ref.gocasUpdateShadowBranchRefusinggit update-ref <ref> <new> <old>withShadowBranchFlockserializing all writers for a given shadow branchBoth
WriteTemporary(session checkpoints) andWriteTemporaryTask(sub-agent task checkpoints) now run inside the per-shadow flock + CAS loop.The existing per-session locking (
acquireSessionGate+internal/flock) complements the new per-shadow protection.Added a strong regression test:
TestSaveStep_ConcurrentSessionsSameShadowBranch(8 parallel sessions on one shadow branch, distinct trees, full StepCount + recursive tree-walk assertions). Passes cleanly under-race.Commits in this PR (5 logical, reviewable commits)
Testing & Verification
checkpoint/...+strategy/...(including the new 8-way concurrent test) → green, no data racesgofmt -s -lon the entire repo → cleangolangci-lint(project.golangci.yaml, dupl, revive, etc.) on the touched packages → no new issues (only pre-existing goconst/revive in unrelated test files)go test -race ./cmd/trace/cli/checkpoint/... ./cmd/trace/cli/strategy/...(multiple runs) → cleanThe branch also passes the spirit of
mise run check(the one command the project requires before every push).Relation to Upstream
Direct port of the safety mechanism introduced in
entireio/clito solve the exact same parallel-agent failure mode. We kept our local product decisions (different module name, plugin dispatch stripped, etc.) and added our own hardening + the excellent concurrent test.Upstream reference: entireio/cli@6d616b1 + preceding CAS work.
How to review
Focus areas (in recommended order):
shadow_ref.go+ the diff intemporary.go(the twoWriteTemporary*call sites now wrapped inwithShadowBranchFlock+ CAS loop).manual_commit_concurrent_test.go— this is the executable spec for the race we closed.trail_cmd.go+ the newly portedtrail_cmd_test.go.Checklist
gofmtcleangolangci-lintclean on changed code (no new issues)mise run checkrun locally by author before merge (will be done before pushing)mise run test:e2e:canary) — recommended before mergeRelated issues / context
Ready to merge once the final local
mise run checkis green.This PR makes Trace significantly more robust for real-world multi-agent workflows.