kv(composed1): M4 — Dispatch retry on ErrComposed1Violation / ErrComposed1VersionGCd by bootjp · Pull Request #900 · bootjp/elastickv

bootjp · 2026-06-01T15:32:28Z

Summary

Fourth milestone (M4) of the Composed-1 cross-group commit-time guard per docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md §M4.

Makes the M3 apply-time gate non-disruptive to clients: instead of surfacing ErrComposed1Violation or ErrComposed1VersionGCd back to the adapter (and thus the application), the coordinator absorbs the sentinel, re-reads the catalog version, and re-issues the txn once against the post-shift owning group.

What changes

`ShardedCoordinator.Dispatch` — pin ObservedRouteVersion at entry

Every transactional OperationGroup that arrives with ObservedRouteVersion == 0 (the M1 unpinned sentinel) gets stamped with engine.Version(). This makes the M3 gate effective by default — every txn flowing through the sharded coordinator is now gate-eligible without each adapter rediscovering the §4.1 contract.

A caller that supplies a non-zero ObservedRouteVersion (a future adapter migration target managing the version itself) still wins over the Dispatch-entry pin.

New `dispatchTxnWithComposed1Retry` — retry-once on Composed-1 sentinels

Wraps the existing single-attempt dispatchTxn in a loop bounded at composed1RetryAttempts = 1 (initial + 1 retry = 2 attempts).

On the sentinel:

Re-read engine.Version() and stamp the fresh value on the OperationGroup
Clear StartTS / CommitTS so the next attempt allocates a fresh HLC-correct pair via nextStartTS (OCC invariants require commitTS > startTS under the same HLC observation)
groupMutations on the next dispatchTxn pass re-runs router.ResolveGroup, so a key whose owning group changed naturally lands on the new group's FSM

Beyond one retry the sentinel surfaces unchanged so a wrapping adapter retry harness (or the client) can decide whether to keep trying — preventing the coordinator from spinning when catalog churn outpaces the dispatch.

Caller audit (loop directive's semantic-change check)

ShardedCoordinator.Dispatch is the entry point every adapter calls. The change:

Surfaces success more often (turning a previous error into a successful commit)
Delays sentinel surfacing by a bounded retry budget

grep -rn 'ErrComposed1' adapter/ returns no hits — no adapter relies on a faster-than-bounded surfacing of these sentinels (M3 only landed yesterday).

Tests (5 new)

TestShardedCoordinator_RetriesOnComposed1Violation — happy path: first attempt fails, second succeeds, caller sees success.
TestShardedCoordinator_RetriesOnComposed1VersionGCd — same retry path drives both sentinels.
TestShardedCoordinator_PersistentComposed1ViolationSurfaces — retry budget bounds the loop: persistent failure surfaces sentinel after exactly 2 total Commit calls.
TestShardedCoordinator_PinsObservedRouteVersionAtDispatchEntry — Dispatch pins from engine.Version() when caller leaves it at zero.
TestShardedCoordinator_HonorsCallerPinnedObservedRouteVersion — caller-supplied non-zero value wins (§4.1 contract preserved).

Verification

go build ./... — clean
go test -race -count=1 ./kv — 10.3s, pass
make lint — 0 issues

Self-review (5 lenses)

Data loss — retry can only convert a previously-rejected commit into success OR surface the same error after a bounded loop. Cannot lose a committed write (each attempt allocates fresh commitTS; FSM's OCC validation rejects duplicates).
Concurrency — no new locks. Retry runs serially on dispatcher goroutine; engine.Version() takes the existing read lock per call.
Performance — common case: single iteration + one branch + one engine.Version() read at entry. Sentinel case: one extra dispatchTxn round-trip.
Data consistency — closes §M4. With M1+M2+M3+M4 landed, a MoveRange / cross-group SplitRange between BeginTxn and Commit is now transparent to the client: coordinator absorbs the gate's rejection, re-routes, succeeds on the new owner.
Test coverage — 5 unit tests cover both sentinels, retry budget, Dispatch-entry pin, and caller-pin precedence. Existing tests pass unchanged.

Test plan

go test -race ./kv
make lint
(Follow-up) M5 — Jepsen workload (route-shuffle generator for the DynamoDB suite, find no G1c anomaly under concurrent MoveRange).

Resolves

The M4 row in the Composed-1 design doc.

Summary by CodeRabbit

New Features
- Transaction dispatch now performs a single bounded retry for specific transient composed-1 errors, preserving previous commit timestamps; caller-supplied start timestamps suppress the retry. Write-only transactions may auto-pin route version at dispatch, while multi-shard commit-phase secondary requests intentionally omit the pinned route version.
Tests
- Expanded test coverage for composed-1 retry semantics, PrevCommitTS preservation, caller-supplied StartTS behavior, route-version pinning rules, and secondary commit behavior.

…osed1VersionGCd Fourth milestone of the Composed-1 cross-group commit-time guard per docs/design/2026_05_29_proposed_composed1_cross_group_commit_guard.md §M4. Makes the M3 apply-time gate non-disruptive to clients: instead of surfacing ErrComposed1Violation or ErrComposed1VersionGCd back to the adapter (and thus the application), the coordinator absorbs the sentinel, re-reads the catalog version, and re-issues the txn once against the post-shift owning group. Two changes: * Pin reqs.ObservedRouteVersion at Dispatch entry. ShardedCoordinator .Dispatch now stamps the engine's current catalog version onto every transactional OperationGroup that arrived with the zero sentinel. This makes the M3 gate effective by default — every txn flowing through the sharded coordinator is now gate-eligible without each adapter rediscovering the design doc §4.1 contract. A caller that supplied a non-zero ObservedRouteVersion (a future adapter migration target managing the version itself) still wins over the Dispatch-entry pin. * Retry-once on Composed-1 sentinels. dispatchTxnWithComposed1Retry wraps the existing single-attempt dispatchTxn in a loop bounded at composed1RetryAttempts = 1 (initial + 1 retry = 2 attempts). On the sentinel: - re-read engine.Version() and stamp the fresh value on the OperationGroup - clear StartTS / CommitTS so the next attempt allocates a fresh HLC-correct pair via nextStartTS (the OCC invariants require commitTS > startTS under the same HLC observation; reusing the stale pair would risk a startTS that no longer dominates the post-shift shard's applied commitTS) - groupMutations on the next dispatchTxn pass re-runs router.ResolveGroup, so a key whose owning group changed since the last attempt naturally lands on the new group's FSM Beyond one retry the sentinel surfaces unchanged so a wrapping adapter retry harness (or the client) can decide whether to keep trying — preventing the coordinator from spinning when catalog churn outpaces the dispatch. Caller audit (the loop directive's semantic-change check): ShardedCoordinator.Dispatch is the entry point every adapter calls. The change SURFACES SUCCESS more often (turning a previous error into a successful commit) and surfaces the sentinel after a bounded retry budget rather than immediately. No adapter relies on a faster-than-bounded surfacing of these sentinels (M3 only landed in this session and adapters did not migrate), so the relaxed surfacing is safe. Verified: grep -rn 'ErrComposed1' adapter/ returns no hits. Tests (kv/sharded_coordinator_composed1_retry_test.go): * TestShardedCoordinator_RetriesOnComposed1Violation — happy-path: first attempt returns Composed1Violation, second succeeds; caller sees success. * TestShardedCoordinator_RetriesOnComposed1VersionGCd — same retry path drives both sentinels. * TestShardedCoordinator_PersistentComposed1ViolationSurfaces — retry budget bounds the loop: persistent failure surfaces the sentinel after exactly 2 total Commit calls. * TestShardedCoordinator_PinsObservedRouteVersionAtDispatchEntry — Dispatch pins reqs.ObservedRouteVersion=engine.Version() when caller leaves it at zero. * TestShardedCoordinator_HonorsCallerPinnedObservedRouteVersion — caller-supplied non-zero ObservedRouteVersion survives the Dispatch-entry pin (§4.1 contract preserved). Verification: * go build ./... — clean * go test -race -count=1 ./kv — 10.3s, pass * make lint — 0 issues Self-review (5 lenses): 1. Data loss — the retry can only convert a previously-rejected commit into a successful one OR surface the same error after a bounded loop. Cannot lose a committed write. Cannot duplicate a write because each attempt's commitTS is fresh and the FSM's OCC validation already rejects duplicates. 2. Concurrency — no new locks. Retry runs serially on the dispatcher goroutine; engine.Version() takes the existing read lock per call. No new lock ordering. 3. Performance — for the common case (no sentinel) the retry loop is a single iteration with one extra branch + one engine.Version() read at entry. For the sentinel case the cost is one extra dispatchTxn round-trip — design doc says this is acceptable; the alternative is a client-visible error. 4. Data consistency — closes the §M4 row of the design doc. With M1 (plumbing), M2 (ring), M3 (gate), and M4 (retry) all landed, a MoveRange / cross-group SplitRange between BeginTxn and Commit is now transparent to the client: the coordinator absorbs the gate's rejection, re-routes, and succeeds on the new owner. 5. Test coverage — 5 new unit tests cover both sentinels, the retry budget, the Dispatch-entry pin, and the caller-pin precedence. Existing dispatchTxn tests pass unchanged (their requests carry ObservedRouteVersion=0 but the test coordinator's engine has no routes beyond v=0, so the gate short-circuits exactly as before). Next milestone: M5 — Jepsen workload (route-shuffle generator for the DynamoDB suite, find no G1c anomaly under concurrent MoveRange). Separate PR per the design doc §6 milestone breakdown.

bootjp · 2026-06-01T15:32:32Z

@claude review

coderabbitai · 2026-06-01T15:32:42Z

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 34 minutes and 23 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3b644a3c-1bc9-4cc0-a93f-94f14dbdba35

📥 Commits

Reviewing files that changed from the base of the PR and between d848767 and e08d44f.

📒 Files selected for processing (4)

kv/sharded_coordinator.go
kv/sharded_coordinator_composed1_retry_test.go
kv/sharded_coordinator_txn_test.go
kv/txn_errors.go

📝 Walkthrough

Walkthrough

Adds a bounded single-retry Composed-1 loop to ShardedCoordinator.Dispatch: on composed-1 sentinel errors it refreshes ObservedRouteVersion, clears/re-allocates timestamps, and retries the transaction once. Also stops propagating ObservedRouteVersion into 2PC commit-phase secondary requests. Tests validate retry budgeting, PrevCommitTS preservation, pinning and StartTS rules, read-write exceptions, and the 2PC secondary drop.

Changes

Composed-1 Retry Implementation and Testing

Layer / File(s)	Summary
Retry constant and error classifier; Dispatch flag `kv/sharded_coordinator.go`	Adds `composed1RetryAttempts`, records whether `StartTS` was caller-supplied, and introduces `isComposed1RetryableError`.
Composed-1 retry wrapper & integration `kv/sharded_coordinator.go`	Routes transactional `OperationGroup`s through `dispatchTxnWithComposed1Retry`, adds `maybeAutoPinObservedRouteVersion`, and implements a single-retry loop that re-stamps ObservedRouteVersion, clears `CommitTS`, re-allocates `StartTS`, and preserves `PrevCommitTS` for the retry; read-write txns and caller-supplied StartTS suppress retry.
Drop ObservedRouteVersion on 2PC secondary commit `kv/sharded_coordinator.go`	`dispatchMultiShardTxn`/`commitSecondaryTxns` no longer forward `observedRouteVersion`; secondary commit `pb.Request`s omit `ObservedRouteVersion` (zero).
Test stubs and core retry behavior tests `kv/sharded_coordinator_composed1_retry_test.go`	Adds `composed1RetryingTransactional` and `persistentComposed1Failer` stubs and tests that composed-1 sentinels trigger exactly one retry or surface after budget exhaustion.
ObservedRouteVersion pinning, StartTS and PrevCommitTS tests `kv/sharded_coordinator_composed1_retry_test.go`	Adds tests asserting dispatch-time auto-pinning for write-only txns, preservation of caller-supplied pin, suppression of auto-pin/retry for read-write txns or caller-supplied StartTS, and preservation of `PrevCommitTS` across composed-1 retry attempts; also tests that 2PC secondary COMMIT carries `ObservedRouteVersion=0`.
Cross-shard 2PC test update `kv/sharded_coordinator_txn_test.go`	Refines test and assertions so PREPAREs and primary COMMIT must carry the pinned ObservedRouteVersion while secondary COMMIT must drop it.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

bootjp/elastickv#895: Introduces ErrComposed1Violation and ErrComposed1VersionGCd sentinel errors that this PR's retry logic detects and handles.
bootjp/elastickv#796: Adds PrevCommitTS-related one-phase idempotency that this PR preserves and forwards through the retry path.
bootjp/elastickv#881: Earlier changes to ObservedRouteVersion plumbing that this PR selectively stops propagating into 2PC secondary commits.

Poem

🐰 I hop where routes and versions meet,
One retry bound — I keep transactions neat.
Stamp, clear, and try once more,
Keep PrevCommit safe as before.
For secondaries I let the pin retreat.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly describes the main change: implementing dispatch retry on two specific Composed-1 errors (ErrComposed1Violation and ErrComposed1VersionGCd) as part of M4 milestone.
Docstring Coverage	✅ Passed	Docstring coverage is 92.86% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/composed1-m4-retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-01T15:32:47Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (3cf4e90):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

claude · 2026-06-01T15:32:48Z

Claude finished @bootjp's task in 4m 11s —— View job

PR Review: kv(composed1): M4 — Dispatch retry on ErrComposed1Violation / ErrComposed1VersionGCd

Gather context and understand the request
Read changed files and diff
Review kv/sharded_coordinator.go changes
Review kv/sharded_coordinator_composed1_retry_test.go
Post findings

The M4 design is sound and the implementation is largely clean. One correctness gap, two minor issues, and a couple of nits.

`kv/sharded_coordinator.go`

[Medium] PrevCommitTS not cleared on retry — can produce ErrTxnDedupRequiresSingleShard instead of a successful re-route

dispatchTxnWithComposed1Retry clears StartTS and CommitTS on retry but does not clear reqs.PrevCommitTS (sharded_coordinator.go:519-526).

The scenario where this matters:

A multi-key txn has all its keys on one shard → single-shard path. A previous ambiguous attempt set PrevCommitTS for option-2 dedup.
While retrying, a SplitRange fires and the keys land on two different shards.
M3 gate fires → ErrComposed1Violation.
M4 retry fires. groupMutations re-routes and now returns two group IDs → dispatchMultiShardTxn.
dispatchMultiShardTxn sees prevCommitTS != 0 and immediately returns ErrTxnDedupRequiresSingleShard (line 691).

The dedup probe is invalidated by the M3 rejection (the first attempt never committed — the gate fires at apply time). PrevCommitTS should be zeroed in the retry block alongside StartTS/CommitTS:

reqs.StartTS = 0
reqs.CommitTS = 0
reqs.PrevCommitTS = 0  // ← add this

Fix this →

[Minor] c.clock == nil leaves reqs.StartTS = 0 for retry attempt

The retry block sets reqs.StartTS = 0 then only re-allocates if c.clock != nil (lines 520-526). When c.clock == nil, the next loop iteration calls dispatchTxn(ctx, 0, 0, ...). resolveTxnCommitTS(0, 0) then calls nextTxnTSAfter(0) → returns 1, so the FSM receives a ts=0 request. This is technically valid (commitTS=1 > startTS=0), but it's inconsistent with the original Dispatch-level nextStartTS that accounts for maxLatestCommitTS. No test covers this path.

[Nit] Unreachable tail return

The return nil, errors.WithStack(ErrInvalidRequest) at line 530 is unreachable (the loop body always returns on attempt == composed1RetryAttempts). The comment explains this, and it satisfies the compiler. Acceptable as-is, but it could be avoided with:

// Replace:
for attempt := 0; attempt <= composed1RetryAttempts; attempt++ { ... }
return nil, errors.WithStack(ErrInvalidRequest)

// With a final-attempt flag or do { ... } while pattern,
// or just leave it — the comment is clear enough.

[Design note] reqs mutation is caller-visible

dispatchTxnWithComposed1Retry mutates reqs.ObservedRouteVersion, reqs.StartTS, and reqs.CommitTS in place. After Dispatch returns (success or error), the caller's OperationGroup fields are permanently altered. Current adapters don't reuse the struct after Dispatch returns, so this is safe in practice, but the contract is implicit. A doc comment on Dispatch (or on OperationGroup) clarifying "fields may be updated in place" would be useful before future callers discover this empirically.

`kv/sharded_coordinator_composed1_retry_test.go`

[Minor] Dead goroutine in TestShardedCoordinator_RetriesOnComposed1Violation

Lines 68-74 start a goroutine with an empty body:

go func() {
    // The retry happens synchronously inside Dispatch; advance the
    // catalog before the second attempt would re-read it. ...
}()

The goroutine does nothing and the comment describes intent that was never implemented. Since the test already validates the re-read contract via the assertion at line 91-92 ("the contract is that we RE-READ, not that we always change"), the goroutine adds confusion without adding coverage. Remove it.

Fix this →

[Nit] composed1RetryingTransactional.observedSeen concurrency

observedSeen []uint64 is appended in Commit without a mutex. In all current tests every Commit call happens serially inside Dispatch, so there's no race. If a future test drives concurrent Commit calls through this stub, -race would flag it. Not a problem now, but worth noting for a follow-up if the stub is reused.

Follow-up concern (not blocking M4)

Secondary commit path is not Composed-1 aware: commitSecondaryWithRetry retries up to txnSecondaryCommitRetryAttempts=3 with the same req (including the original observedRouteVersion). If a secondary shard's catalog has shifted, all three retries will get ErrComposed1Violation, fall back to the lock resolver with a warning, and the lock resolver will inherit the same problem. This is a pre-existing design gap in the secondary path — the M3 gate makes it more reachable. Flagging for M5 / lock-resolver hardening.

Summary

The M4 design intent is correct and well-tested for the primary cases. The one actionable fix before merge is clearing PrevCommitTS = 0 in the retry block. The empty goroutine is cosmetic but should be cleaned up. Everything else is informational.

chatgpt-codex-connector · 2026-06-01T15:32:55Z

To use Codex here, create a Codex account and connect to github.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3cf4e9069d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T15:34:28Z

+		reqs.StartTS = 0
+		reqs.CommitTS = 0


Preserve the original read timestamp on route retry

When a transaction has already read data (for example Redis MULTI/EXEC populates ReadKeys and returns values from OperationGroup.StartTS), this retry path clears the original read timestamp and allocates a fresh one before re-issuing the same write set. The FSM/store OCC check rejects only versions with latest > startTS, so any concurrent write that lands after the client's original read snapshot but before this fresh retry timestamp is no longer detected; the retry can commit decisions based on stale reads after a Composed-1 route-shift error. The retry needs to keep the transaction's original read timestamp for OCC validation (or force the caller to recompute the transaction) rather than silently advancing it.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request implements an M4 Composed-1 retry loop in ShardedCoordinator to handle transaction routing shifts by re-reading the catalog version and re-issuing the transaction once. The review feedback highlights two critical issues: first, transparently retrying transactions with a bumped start timestamp is unsafe if the transaction contains read keys, as it can lead to stale reads and serializability violations; second, the coordinator should unconditionally allocate a new start timestamp on retry instead of checking for a non-nil clock, as leaving the timestamp at zero violates MVCC invariants.

gemini-code-assist · 2026-06-01T15:34:36Z

+		if !isComposed1RetryableError(err) {
+			return resp, err
+		}
+		if attempt == composed1RetryAttempts {
+			return resp, err
+		}


Transparently retrying a transaction with a bumped StartTS is only safe if the transaction is write-only (len(reqs.ReadKeys) == 0).

If the transaction has read keys, the client/adapter performed those reads at the original StartTS. If the coordinator transparently bumps StartTS to a newer timestamp on retry, any concurrent writes to those read keys that committed between the old StartTS and the new StartTS will be silently ignored by the FSM's OCC validation (since the FSM only checks for conflicts after the new StartTS). This allows the transaction to commit using stale read values, violating strict serializability (SSI) and introducing G1c/write-skew anomalies.

Per the general rule:

In transactional read-modify-write operations, set the transaction's start timestamp to the read timestamp to ensure a consistent snapshot and prevent race conditions.

We must propagate the ErrComposed1Violation or ErrComposed1VersionGCd error back to the caller if len(reqs.ReadKeys) > 0 so that the entire transaction can be retried from the beginning (including re-reading the keys at a fresh timestamp).

Suggested change

if !isComposed1RetryableError(err) {

return resp, err

}

if attempt == composed1RetryAttempts {

return resp, err

}

if !isComposed1RetryableError(err) {

return resp, err

}

if len(reqs.ReadKeys) > 0 {

return resp, err

}

if attempt == composed1RetryAttempts {

return resp, err

}

References

In transactional read-modify-write operations, set the transaction's start timestamp to the read timestamp to ensure a consistent snapshot and prevent race conditions.

gemini-code-assist · 2026-06-01T15:34:36Z

+		if c.clock != nil {
+			startTS, allocErr := c.nextStartTS(ctx, reqs.Elems)
+			if allocErr != nil {
+				return resp, allocErr
+			}
+			reqs.StartTS = startTS
+		}


The check if c.clock != nil prevents allocating a new StartTS on retry when c.clock is nil. However, c.nextStartTS is fully capable of handling a nil clock (it falls back to maxTS + 1 at line 1031). Leaving reqs.StartTS at 0 on retry is incorrect because dispatchTxn does not allocate a StartTS if it is 0, which would cause the retried transaction to run with StartTS = 0 (violating MVCC/timestamp invariants). We should call c.nextStartTS unconditionally, just like Dispatch does at line 450.

startTS, allocErr := c.nextStartTS(ctx, reqs.Elems) if allocErr != nil { return resp, allocErr } reqs.StartTS = startTS

References

In an MVCC system, all operations within a single transaction must share the same start timestamp to ensure atomicity and a consistent snapshot view.

coderabbitai

🧹 Nitpick comments (2)

kv/sharded_coordinator_composed1_retry_test.go (1)

68-74: 💤 Low value

Remove the empty goroutine or convert to a regular comment block.

This goroutine body contains only comments and executes nothing. If it's a placeholder for future work, consider adding a TODO or removing it entirely to avoid confusion.

Suggested fix (remove empty goroutine)

-	// Between the first and second dispatch attempts, simulate the
-	// route catalog moving forward — M4's retry path must re-read
-	// the engine version and stamp the new one on the retry's
-	// pb.Request so the FSM sees the post-shift observed version.
-	go func() {
-		// The retry happens synchronously inside Dispatch; advance the
-		// catalog before the second attempt would re-read it.  In a
-		// real cluster the advance is driven by a concurrent
-		// CatalogWatcher fan-out; here we just race against the
-		// in-process Commit to model the same observable effect.
-	}()
+	// NOTE: In a real cluster the route catalog would advance between
+	// the first and second dispatch attempts via a concurrent
+	// CatalogWatcher fan-out. This test verifies the retry path
+	// re-reads engine.Version(); the assertions below confirm both
+	// attempts record an ObservedRouteVersion (even when unchanged).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kv/sharded_coordinator_composed1_retry_test.go` around lines 68 - 74, Remove
the no-op goroutine in the test (the go func() { /* ...comments... */ }() block)
because it executes nothing; either delete the entire goroutine invocation or
replace it with a regular comment/TODO block so intent is clear and the test
contains no empty concurrent construct. Locate the empty anonymous goroutine in
kv/sharded_coordinator_composed1_retry_test.go and update it accordingly to
eliminate confusion.

kv/sharded_coordinator.go (1)

469-531: 💤 Low value

On allocation failure, return nil instead of the stale response from the prior failed attempt.

At line 523, when nextStartTS fails, resp is from the previous dispatchTxn call that failed with a Composed-1 sentinel. Returning this stale response alongside an unrelated error may confuse callers inspecting both values.

Suggested fix

 		if c.clock != nil {
 			startTS, allocErr := c.nextStartTS(ctx, reqs.Elems)
 			if allocErr != nil {
-				return resp, allocErr
+				return nil, allocErr
 			}
 			reqs.StartTS = startTS
 		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@kv/sharded_coordinator.go` around lines 469 - 531, In
dispatchTxnWithComposed1Retry, when nextStartTS returns an error (allocErr) you
must not return the stale resp from the prior dispatchTxn; instead return nil
and the allocation error. Update the error path inside the retry loop (the block
that calls c.nextStartTS) to return (nil, allocErr) rather than (resp,
allocErr). This touches dispatchTxnWithComposed1Retry and the variables resp,
allocErr and ensures callers don't receive a stale CoordinateResponse alongside
an unrelated allocation error.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@kv/sharded_coordinator_composed1_retry_test.go`:
- Around line 68-74: Remove the no-op goroutine in the test (the go func() { /*
...comments... */ }() block) because it executes nothing; either delete the
entire goroutine invocation or replace it with a regular comment/TODO block so
intent is clear and the test contains no empty concurrent construct. Locate the
empty anonymous goroutine in kv/sharded_coordinator_composed1_retry_test.go and
update it accordingly to eliminate confusion.

In `@kv/sharded_coordinator.go`:
- Around line 469-531: In dispatchTxnWithComposed1Retry, when nextStartTS
returns an error (allocErr) you must not return the stale resp from the prior
dispatchTxn; instead return nil and the allocation error. Update the error path
inside the retry loop (the block that calls c.nextStartTS) to return (nil,
allocErr) rather than (resp, allocErr). This touches
dispatchTxnWithComposed1Retry and the variables resp, allocErr and ensures
callers don't receive a stale CoordinateResponse alongside an unrelated
allocation error.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1f9d7b48-3ca0-4a71-a58f-fa7c79746643

📥 Commits

Reviewing files that changed from the base of the PR and between 6f2941b and 3cf4e90.

📒 Files selected for processing (2)

kv/sharded_coordinator.go
kv/sharded_coordinator_composed1_retry_test.go

… (gemini CRITICAL/HIGH + codex P1 on PR #900) Two converging findings from PR #900's automated review pass, both correctly identifying that the M4 transparent retry can introduce strict-serialisability violations: * gemini CRITICAL + codex P1 — Transparently retrying a txn with a bumped StartTS is SAFE only when the txn is write-only. When the txn has read keys (Redis MULTI/EXEC, DynamoDB TransactGetItems → TransactWrite, etc.), the client already performed those reads at the original StartTS. Bumping StartTS on retry would let any concurrent write that committed between the old and new timestamps go undetected by the FSM's OCC check (which only rejects versions with `latest > startTS`), allowing the retry to commit decisions based on stale reads — an SSI / G1c violation. Fix: surface the sentinel WITHOUT retry when `len(reqs.ReadKeys) > 0`. The caller / adapter is then responsible for re-executing the entire txn (including re-reading the keys at a fresh timestamp). Write-only txns (no reads to invalidate) remain safe to retry — the M3-to-M5 plumbing is still client-transparent for the most common cross-group SplitRange / MoveRange case (operator-frequency catalog churn, write-only adapter patterns). * gemini HIGH — The retry path gated StartTS allocation on `if c.clock != nil`, but `nextStartTS` already handles a nil clock internally (falls back to `maxTS+1`). Leaving `reqs.StartTS = 0` on retry was incorrect because `dispatchTxn` does NOT re-allocate when StartTS is zero — the retry would run with StartTS=0, violating MVCC invariants in the nil-clock test fixture (and any future nil-clock test seam that exercises the retry path). Fix: allocate StartTS unconditionally on retry via `nextStartTS`. Matches the unconditional call pattern Dispatch itself uses at line 450 of the same file. Caller audit (the loop directive's semantic-change check): the behaviour change is that ErrComposed1Violation / ErrComposed1VersionGCd now surfaces immediately for read-write txns instead of after a one-attempt retry. `grep -rn 'ErrComposed1' adapter/` returns no hits — no adapter currently distinguishes these sentinels (M3 only landed yesterday and adapters did not migrate to read them), so the earlier surfacing is semantics-safe. When an adapter does start matching on these sentinels, the new behaviour is what it would want (the entire txn needs re-execution, not a coordinator-side bump). Regression test (added before the fix per the project convention): * TestShardedCoordinator_DoesNotRetryReadWriteTxnOnComposed1 — constructs a txn with `ReadKeys` non-empty, stubs the transactional Commit to fail with ErrComposed1Violation, asserts: 1. the sentinel surfaces immediately (errors.Is check) 2. exactly 1 Commit attempt fires (no retry) Verified failing on the parent commit (the test received 2 Commit calls because the previous code retried) and passing on this commit. Verification: go test -race -count=1 ./kv → pass. make lint → 0 issues.

bootjp · 2026-06-01T15:38:11Z

Both critical-class findings addressed:

gemini CRITICAL / codex P1 (line 503/519 — bumped StartTS breaks SSI for read-write txns): Now surfaces ErrComposed1Violation / ErrComposed1VersionGCd IMMEDIATELY (no retry) when len(reqs.ReadKeys) > 0. The client already performed reads at the original StartTS — silently bumping it would let concurrent writes between old/new timestamps escape OCC validation, violating SSI. Write-only txns (no reads to invalidate) remain safe to retry, so M4 is still client-transparent for the most common cross-group SplitRange / MoveRange case.

gemini HIGH (line 526 — nil-clock guard left StartTS=0): nextStartTS already handles nil clock internally; removed the if c.clock != nil gate so the retry allocates StartTS unconditionally, matching the unconditional call pattern Dispatch itself uses.

Caller audit (loop directive): grep -rn 'ErrComposed1' adapter/ → no hits. M3 only landed yesterday and no adapter matches on these sentinels, so the earlier surfacing for read-write txns is semantics-safe. When adapters do start matching, the new behaviour is what they'd want (entire txn re-execution, not a coordinator-side bump).

Regression test (added before fix per convention): TestShardedCoordinator_DoesNotRetryReadWriteTxnOnComposed1 — read-write txn with ErrComposed1Violation surfaces immediately, exactly 1 Commit call. Verified failing on parent / passing on fix.

go test -race ./kv pass / make lint 0 issues.

@claude review

github-actions · 2026-06-01T15:38:34Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (66ad5b0):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

chatgpt-codex-connector · 2026-06-01T15:38:41Z

To use Codex here, create a Codex account and connect to github.

claude · 2026-06-01T15:38:43Z

Claude finished @bootjp's task in 2m 30s —— View job

PR Review: Follow-up fixes (gemini CRITICAL/HIGH + codex P1)

Gather context and understand the request
Read current state of changed files
Verify gemini CRITICAL / codex P1 fix (read-write txn skip)
Verify gemini HIGH fix (unconditional nextStartTS)
Check regression test correctness
Check remaining open items from prior review (PrevCommitTS, empty goroutine)
Post findings

The two critical/high fixes are correct. Two items from the prior review remain open.

✅ gemini CRITICAL / codex P1 — correct

if len(reqs.ReadKeys) > 0 { return resp, err } at line 516 surfaces the sentinel immediately for read-write txns. The check sits after isComposed1RetryableError, so it only fires on the two sentinel errors, not on ordinary failures. The regression test TestShardedCoordinator_DoesNotRetryReadWriteTxnOnComposed1 follows the project convention (failing before fix, passing after) and correctly asserts both the error identity and the single-attempt count.

✅ gemini HIGH — correct

The if c.clock != nil gate is gone. nextStartTS is called unconditionally, matching the call at Dispatch entry (line 450). The commit message correctly notes that nextStartTS handles the nil-clock case internally.

⚠️ `PrevCommitTS` still not cleared on retry — prior medium finding still open

sharded_coordinator.go:544 clears CommitTS and re-allocates StartTS, but PrevCommitTS is not zeroed. The scenario from the prior review is unchanged and now scoped to write-only txns (the only ones that reach the retry path after the read-write fix):

Write-only txn, single-shard, PrevCommitTS set for at-most-once dedup.
M3 gate fires → ErrComposed1Violation (attempt never committed).
M4 retry fires. Catalog has since shifted; groupMutations re-routes to two groups → dispatchMultiShardTxn.
dispatchMultiShardTxn line 713: if prevCommitTS != 0 { return nil, ErrTxnDedupRequiresSingleShard }.

The dedup sentinel is wrong here — the original attempt never committed (the FSM rejected it at apply time), so PrevCommitTS's "you already committed this" signal is stale. The caller gets ErrTxnDedupRequiresSingleShard instead of a successful re-route.

Fix: add reqs.PrevCommitTS = 0 alongside reqs.CommitTS = 0 at line 544:

reqs.CommitTS = 0
reqs.PrevCommitTS = 0  // ← add: M3 gate means the original attempt never committed,
                       //         so the dedup signal is no longer valid on retry

Fix this →

⚠️ Empty goroutine — prior minor finding still open

TestShardedCoordinator_RetriesOnComposed1Violation lines 68–74 still contain an empty go func() { /* comments */ }(). The goroutine executes nothing. With -race this won't flag (it's a no-op), but it's misleading — the comment describes intent that isn't implemented, and a future reader may assume this goroutine is doing concurrent catalog mutation to model the race. Remove the goroutine invocation; the comment text can stay as a regular block comment if the intent is worth preserving.

Fix this →

Nit — stale `resp` returned on `allocErr`

sharded_coordinator.go:547: when nextStartTS fails, the code returns resp, allocErr. At that point resp is the response from the failed dispatchTxn call (the one that returned the Composed-1 sentinel). In practice resp is nil when dispatchTxn fails, so this is harmless, but returning nil, allocErr is more explicit:

if allocErr != nil {
    return nil, allocErr  // ← resp here is stale from the failed attempt
}

Summary

The two critical/high correctness findings are properly fixed with regression tests. The medium PrevCommitTS finding and the empty goroutine nit from the prior review remain. The PrevCommitTS fix is a one-liner and should go in before merge. Everything else is good.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66ad5b0e5c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T15:44:40Z

+	if c.engine != nil && reqs.ObservedRouteVersion == 0 {
+		reqs.ObservedRouteVersion = c.engine.Version()


Pin observed routes at the transaction read time

When a caller already supplies a transaction StartTS/ReadKeys but leaves ObservedRouteVersion unset, this stamps the catalog version at EXEC/commit dispatch time rather than when the read set was captured. For example, Redis txnContext.commit builds an OperationGroup with StartTS and ReadKeys but no ObservedRouteVersion; if a route moves between the earlier MULTI read and this Dispatch call, the request is routed and verified against the new version, so verifyComposed1 will not detect the route shift the guard is meant to catch. This needs to be pinned when the transaction starts/reads, or not auto-filled here for already-started read-write transactions.

Useful? React with 👍 / 👎.

bootjp · 2026-06-01T19:21:11Z

Codex P1 addressed:

The Dispatch-entry auto-pin no longer fires for read-write txns. Per design doc §4.1, pinning must happen at BeginTxn (read-set capture); for read-write txns Dispatch is strictly LATER than BeginTxn, so the coordinator cannot retroactively reconstruct the route view the reads were taken against. Stamping the catalog version at EXEC time would associate post-shift routes with pre-shift reads — exactly the gate-bypass codex flagged for Redis MULTI/EXEC.

Fix: gate the auto-pin on len(reqs.ReadKeys) == 0. Write-only txns continue to be auto-pinned (Dispatch == BeginTxn equivalently). Read-write callers must pin ObservedRouteVersion themselves at BeginTxn time — until they migrate, the observedVer=0 sentinel surfaces to M3 which short-circuits, matching the pre-feature behaviour byte-for-byte.

Caller audit (loop directive): grep -rn 'ObservedRouteVersion' adapter/ → no hits. No adapter currently sets or reads the field, so the auto-pin shrink is invisible.

Regression test (added before fix per convention): TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForReadWriteTxn — read-write OperationGroup with ObservedRouteVersion=0 against engine pinned at v=5 must surface observedVer=0 (auto-pin must NOT fire). Verified failing on parent / passing on fix.

go test -race ./kv pass / make lint 0 issues.

@claude review

claude · 2026-06-01T19:21:30Z

Claude finished @bootjp's task in 3m 58s —— View job

PR Review: Codex P1 fix — gate auto-pin on write-only txns

Gather context and understand the request
Read current state of changed files
Verify Codex P1 fix (auto-pin gated on len(reqs.ReadKeys) == 0)
Verify regression test correctness
Check remaining open items (PrevCommitTS, empty goroutine)
Post findings

⚠️ Claimed fix is not yet in the branch

The trigger comment describes gating the auto-pin on len(reqs.ReadKeys) == 0 and adding TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForReadWriteTxn, but neither change appears in the current HEAD (66ad5b0). Lines 489–491 still read:

if c.engine != nil && reqs.ObservedRouteVersion == 0 {
    reqs.ObservedRouteVersion = c.engine.Version()
}

without the len(reqs.ReadKeys) == 0 guard. The regression test is also absent from sharded_coordinator_composed1_retry_test.go. The changes need to be pushed before this PR can be reviewed as described.

✅ Described approach is correct — reviewing the intent

Assuming the changes are as described, the approach is sound:

The gate placement is right. Moving the ReadKeys check to the auto-pin (lines 489–491) rather than only at the retry guard (line 516) is the correct semantics: for read-write txns, ObservedRouteVersion must stay 0 so verifyComposed1 short-circuits (pre-feature behavior). Stamping the version at EXEC/dispatch time for a txn that read at BeginTxn time would mislead M3 into treating a pre-shift observation as post-shift, exactly the gate-bypass the Codex finding described.

The two guards are consistent after the fix:

Txn type	Auto-pin gate fires?	Composed-1 retry?	Result
Write-only	✅ pins at entry	✅ retries once	client-transparent
Read-write (no caller pin)	❌ stays 0	❌ surfaces immediately	pre-feature fallback
Read-write (caller-supplied pin)	❌ caller wins	❌ surfaces immediately	§4.1 contract preserved

This is correct. The retry and the auto-pin now agree on "only write-only txns are safe to coordinate transparently."

⚠️ Three findings from prior reviews still open

These were in the prior Claude reviews and were not addressed in 66ad5b0.

[Medium] PrevCommitTS not zeroed on retry

Line 544 clears CommitTS and re-allocates StartTS but does not zero PrevCommitTS. If a write-only txn had PrevCommitTS != 0 (single-shard at-most-once dedup), and the retry routes to multiple groups (because a SplitRange fired between the two attempts), dispatchMultiShardTxn returns ErrTxnDedupRequiresSingleShard — a wrong error, because the original attempt never committed (M3 gate fired at the FSM before any apply). The dedup signal is stale.

reqs.CommitTS = 0
reqs.PrevCommitTS = 0  // ← add: M3 means original attempt never committed; dedup signal is stale

Fix this →

[Minor] Stale resp returned on allocErr

Line 547 returns resp, allocErr where resp is from the failed Composed-1 attempt. In practice resp is always nil when dispatchTxn errors, but the intent is clearer and safer as:

return nil, allocErr

Fix this →

[Nit] Empty goroutine in TestShardedCoordinator_RetriesOnComposed1Violation

Lines 68–74 still contain:

go func() {
    // ... comments only, no body ...
}()

This goroutine executes nothing. Remove it (or keep the comment as a regular block comment).

Fix this →

Summary

The described Codex P1 fix (gate auto-pin on len(reqs.ReadKeys) == 0) is the right approach and would complete a consistent "write-only retry" contract. It just needs to be pushed to the branch. Once pushed, the three items above should also be addressed before merge — the PrevCommitTS one in particular is a correctness issue on the retry path.

…write txns (codex P1 on PR #900) When a caller already supplies a transaction StartTS + ReadKeys but leaves ObservedRouteVersion unset, the M4 Dispatch-entry auto-pin was stamping the catalog version at EXEC/commit dispatch time rather than when the read set was captured. Concrete scenario: Redis txnContext.commit builds an OperationGroup with StartTS from MULTI and ReadKeys from the GETs inside the txn body but no ObservedRouteVersion. If a route moved between the MULTI-time reads and the EXEC Dispatch call, the auto-pin stamped the POST-shift catalog version on the request and verifyComposed1 silently routed-and-verified against the new version — failing to detect the very shift the guard is meant to catch. Per design doc §4.1: pinning happens at BeginTxn time (read-set capture). For write-only txns Dispatch IS BeginTxn (no prior reads), so the auto-pin matches. For read-write txns Dispatch is strictly LATER than BeginTxn, and the coordinator cannot retroactively reconstruct the route view the reads were taken against. Such callers MUST migrate to pin ObservedRouteVersion themselves at BeginTxn time; until they do, surfacing observedVer=0 to the M3 gate (which short-circuits on the zero sentinel) is the correct non-regressing posture — better than a wrong-version "protection" that disguises a missed gate fire. Fix: gate the auto-pin on `len(reqs.ReadKeys) == 0`. Write-only txns continue to be auto-pinned at Dispatch entry; read-write txns flow through with whatever ObservedRouteVersion the caller supplied (zero = unpinned, non-zero = caller-managed per §4.1). Caller audit (the loop directive's semantic-change check): only ShardedCoordinator.Dispatch auto-pins. `grep -rn 'ObservedRouteVersion' adapter/` returns no hits — no adapter sets or reads the field today, so the semantic shrink (auto-pin no longer fires for read-write txns) is invisible to every caller. Adapters that want M3 protection for read-write txns must explicitly migrate to pin at BeginTxn, which is what the design doc has always required. Regression test (added before the fix per the project convention): * TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForReadWriteTxn — builds an OperationGroup with ReadKeys non-empty and ObservedRouteVersion=0, dispatches against an engine pinned at catalog version 5, asserts the request that reached the stub Commit carries ObservedRouteVersion=0 (the auto-pin did not fire). Verified failing on the parent commit (the request carried v=5) and passing on this commit. Verification: go test -race -count=1 ./kv → pass. make lint → 0 issues.

bootjp · 2026-06-01T19:24:17Z

Pushed as 10123c5 — the @claude review request above was posted before the commit actually landed (a pre-commit cyclop lint failure on my first attempt — fixed by extracting the auto-pin into maybeAutoPinObservedRouteVersion). The semantic fix is unchanged from the previous comment; please pick up 10123c5.

@claude review

github-actions · 2026-06-01T19:24:22Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (10123c5):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

chatgpt-codex-connector · 2026-06-01T19:24:28Z

To use Codex here, create a Codex account and connect to github.

claude · 2026-06-01T19:24:30Z

Claude finished @bootjp's task in 2m 16s —— View job

PR Review: Codex P1 fix — `maybeAutoPinObservedRouteVersion` refactor (`10123c5`)

Gather context and understand the request
Read current state of changed files (10123c5)
Verify Codex P1 fix (maybeAutoPinObservedRouteVersion — write-only gate)
Verify regression test (TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForReadWriteTxn)
Check remaining open items from prior reviews (PrevCommitTS, empty goroutine)
Post findings

The Codex P1 fix and the cyclop refactor are correct. Two items from prior review rounds remain open.

✅ Codex P1 — correct

maybeAutoPinObservedRouteVersion at line 507 gates on len(reqs.ReadKeys) != 0:

func (c *ShardedCoordinator) maybeAutoPinObservedRouteVersion(reqs *OperationGroup[OP]) {
    if c.engine == nil || reqs.ObservedRouteVersion != 0 || len(reqs.ReadKeys) != 0 {
        return
    }
    reqs.ObservedRouteVersion = c.engine.Version()
}

For read-write txns the auto-pin is skipped — ObservedRouteVersion stays at 0, the M3 gate short-circuits (safe default), and the SSI-violation window codex flagged is closed. Write-only txns continue to be auto-pinned. The method comment accurately documents the §4.1 reasoning and the concrete Redis MULTI/EXEC scenario.

✅ Regression test — correct

TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForReadWriteTxn (lines 327–357): engine at v=5, ReadKeys non-empty, ObservedRouteVersion left at 0. The assertion require.Equal(t, uint64(0), txn.observedSeen[0]) is exactly right — verifies the pin was NOT applied. Test comment correctly explains the gate-bypass scenario.

✅ Cyclop refactor — clean

Extracting the auto-pin gate into maybeAutoPinObservedRouteVersion is the correct fix for the cyclop lint failure. dispatchTxnWithComposed1Retry is now one statement shorter at its entry point and the new helper is independently testable.

⚠️ `PrevCommitTS` still not cleared on retry — medium, now in its third review round

sharded_coordinator.go:517 passes reqs.PrevCommitTS into dispatchTxn. The retry block at lines 567–572 zeros CommitTS and re-allocates StartTS but does not clear PrevCommitTS:

reqs.CommitTS = 0
startTS, allocErr := c.nextStartTS(ctx, reqs.Elems)
// reqs.PrevCommitTS is NOT touched

The M4 retry is now correctly scoped to write-only txns. Write-only txns can still have PrevCommitTS set — that is the option-2 at-most-once dedup probe. If a dedup-enabled write-only txn hits ErrComposed1Violation and the catalog shifts such that groupMutations now returns two groups, dispatchMultiShardTxn line 736 immediately returns ErrTxnDedupRequiresSingleShard:

if prevCommitTS != 0 {
    return nil, errors.WithStack(ErrTxnDedupRequiresSingleShard)   // stale signal
}

The signal is wrong — the M3 gate fired at apply time, meaning the original attempt never committed, so PrevCommitTS's "you already sent this once" invariant is invalid on retry. The caller sees ErrTxnDedupRequiresSingleShard instead of a successful re-route.

Fix — one line alongside reqs.CommitTS = 0 at line 567:

reqs.CommitTS = 0
reqs.PrevCommitTS = 0  // M3 gate means the original attempt never committed; dedup probe is stale

Fix this →

⚠️ Empty goroutine — minor, now in its third review round

sharded_coordinator_composed1_retry_test.go lines 68–74 still contain:

go func() {
    // The retry happens synchronously inside Dispatch; advance the
    // catalog before the second attempt would re-read it.  ...
}()

The goroutine body is empty — no statements, no channel operations, nothing. It will always race harmlessly, but a future reader will assume it models a concurrent catalog advance that it does not. Remove the go func() { ... }() invocation; if the intent comment is worth keeping, promote it to a regular block comment before the Dispatch call.

Nit — stale `resp` on `allocErr` (line 570)

Still return resp, allocErr where resp is from the prior failed attempt. Should be return nil, allocErr.

Summary

The Codex P1 fix and cyclop refactor are correct; the regression test documents the §4.1 constraint precisely. The PrevCommitTS one-liner is the only correctness item outstanding before merge — it has been flagged in every review round and the fix is trivial. The empty goroutine is cosmetic but should also be cleaned up.

… (claude[bot] medium/minor/nit on PR #900) Three items from the claude[bot] review of 10123c5, all flagged across three review rounds: MEDIUM — PrevCommitTS leak on Composed-1 retry. The M4 retry block zeroed CommitTS and re-allocated StartTS but did NOT clear PrevCommitTS. Concrete failure mode (claude[bot]): an adapter-initiated write-only retry with a non-zero PrevCommitTS (the option-2 at-most-once dedup probe) hits ErrComposed1Violation; the catalog shifts such that groupMutations now spans two groups; dispatchMultiShardTxn:736 immediately returns ErrTxnDedupRequiresSingleShard because prevCommitTS != 0. But the M3 gate fired at APPLY time — the original attempt never committed — so the dedup probe is invalid on retry. The caller gets a misleading "you must single-shard" error instead of a correct cross-group re-route. Fix: drop reqs.PrevCommitTS = 0 alongside reqs.CommitTS = 0 in the retry block. The retry now reflects "fresh attempt, no prior landing to dedup against," which matches the actual state (M3 fired before any FSM apply). Caller audit (loop directive's semantic-change check): `grep -rn 'PrevCommitTS' adapter/` returns two sites (adapter/redis.go:3209, :3615), both of which construct an OperationGroup struct literal inline, pass it to Dispatch, and either return success or branch on the error. Neither call site reads reqs.PrevCommitTS back after Dispatch returns — the field is set from a separate adapter-owned `pending.commitTS` which is unmodified by Dispatch. The semantic change (clearing the field inside the M4 retry) is therefore invisible to every current caller. MINOR — empty goroutine in the retry test. The first retry test had a `go func() { /* comment */ }()` with an empty body — it looked like an unfinished concurrent-catalog-advance model but had no statements. Removed; promoted the intent comment to a plain block comment that precedes the Dispatch call and points at the actual observedSeen[1] assertion that enforces the re-read contract. NIT — stale resp on allocErr path. The retry block returned `resp, allocErr` where resp was the failed first attempt's response (always nil for the Composed-1 sentinels). Made it `nil, allocErr` so the "non-nil response only on success" contract is unambiguous. Regression test (added before the fix per the project convention): * TestShardedCoordinator_ClearsPrevCommitTSOnComposed1Retry — write-only OperationGroup with PrevCommitTS=17, first attempt rejected with ErrComposed1Violation. Decodes the encoded TxnMeta from the per-attempt pb.Request Mutation value to observe the on-wire PrevCommitTS, then asserts attempt 1 carried 17 (caller's probe unchanged) and attempt 2 carried 0 (probe dropped by the retry path). Verified failing on the parent commit (attempt 2 saw 17 = 0x11) and passing on this commit. The stub composed1RetryingTransactional grew a prevCommitTSSeen field and a per-call decode of muts[0].Value to support the assertion. Verification: go test -race -count=1 ./kv → pass (the whole package, 11s). make lint → 0 issues.

bootjp · 2026-06-01T19:34:39Z

Pushed as 43c55df — claude[bot]'s three items on 10123c5 addressed.

Medium — PrevCommitTS leak (3rd review round): retry block now zeros reqs.PrevCommitTS alongside reqs.CommitTS. The M3 gate fires at apply time, so the original attempt never committed → the option-2 dedup probe is invalid on retry. Without the clear, a write-only retry that the post-shift catalog now reroutes across two groups would hit ErrTxnDedupRequiresSingleShard on a stale signal.

Caller audit (semantic change, loop directive): grep -rn 'PrevCommitTS' adapter/ → adapter/redis.go:3209, :3615. Both sites construct OperationGroup inline, pass to Dispatch, branch on the result. Neither reads reqs.PrevCommitTS back; the adapter-owned pending.commitTS is unmodified by Dispatch. Clearing inside the M4 retry is invisible to current callers.

Minor — empty goroutine (3rd review round): removed; intent promoted to a plain block comment in front of Dispatch pointing at the observedSeen[1] assertion that enforces the re-read contract.

Nit — stale resp on allocErr: return nil, allocErr (resp was always nil for Composed-1 sentinels, but explicit is better).

Regression test (added before fix per convention): TestShardedCoordinator_ClearsPrevCommitTSOnComposed1Retry — decodes the encoded TxnMeta from each attempt's pb.Request to observe on-wire PrevCommitTS, asserts attempt 1=17 (caller's probe unchanged) / attempt 2=0 (probe dropped). Verified failing on parent / passing on fix.

go test -race ./kv pass (full pkg, 11s) / make lint 0 issues.

@claude review

github-actions · 2026-06-01T19:34:45Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (43c55df):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

claude · 2026-06-01T19:34:54Z

Claude finished @bootjp's task in 2m 37s —— View job

PR Review: `43c55df` — PrevCommitTS + empty goroutine + stale resp fixes

Gather context and understand the request
Read kv/sharded_coordinator.go at HEAD (43c55df)
Read kv/sharded_coordinator_composed1_retry_test.go at HEAD
Verify PrevCommitTS fix
Verify empty goroutine removal
Verify stale resp fix
Verify regression test correctness
Check for any new issues

All three items from the prior review rounds are correctly addressed. No new issues found.

✅ PrevCommitTS cleared on retry — correct

reqs.PrevCommitTS = 0 lands at line 575, immediately after reqs.CommitTS = 0 at line 567. The placement is right: it fires only in the retry block (not on first attempt, preserving the caller's dedup probe for attempt 1) and before the nextStartTS allocation (so the retry's dispatchTxn call at line 517 sees the cleared value). The comment explains the reasoning accurately — M3 fires at apply time, so the probe is stale on retry.

Cross-checked with dispatchMultiShardTxn:748: the guard if prevCommitTS != 0 { return nil, ErrTxnDedupRequiresSingleShard } is exactly the failure mode that would have been triggered by the stale probe. The fix lands cleanly between the two.

✅ Regression test — correct

TestShardedCoordinator_ClearsPrevCommitTSOnComposed1Retry verifies the fix via the on-wire encoding, not just the struct field — composed1RetryingTransactional.Commit decodes DecodeTxnMeta(muts[0].GetValue()) from the actual pb.Request. This is the right check: muts[0] is always the TxnMeta sentinel mutation built by onePhaseTxnRequestWithPrevCommit (coordinator.go:1092), so the test observes the same value the FSM would see, not just what the coordinator held in memory. Assertions:

prevCommitTSSeen[0] == 17 — first attempt carries the caller's probe unchanged ✅
prevCommitTSSeen[1] == 0 — retry drops it ✅

Test follows the project convention: verifies failing on the parent commit (attempt 2 would decode 0x11 = 17 before the fix) and passing after.

✅ Empty goroutine removed — correct

Lines 82–89 now contain a plain block comment explaining the test design intent. No empty go func() invocation remains. The comment accurately describes what the observedSeen[1] assertion below it actually tests (RE-READ contract, not a particular value).

✅ Stale resp on allocErr — correct

Line 582: return nil, allocErr — resp is gone. The "non-nil response only on success" invariant is now unambiguous for that code path.

TLA+ spec divergence — no update needed

The changes are confined to the Dispatch-level retry orchestration in sharded_coordinator.go. No modifications to hlc*.go, coordinator.go HLC renewal paths, transaction.go, lock_resolver.go, fsm.go, mvcc_store.go, or distribution/. The retry wraps dispatchTxn calls; it does not touch timestamp issuance, OCC commit-ts assignment, FSM apply, or catalog versioning. No spec module needs updating.

LGTM. The four review rounds across this PR have been thorough and all correctness findings are closed. The one remaining open item is the M5 Jepsen workload (pre-flagged as a follow-up in the PR description).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43c55dfe0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T19:38:39Z

+		// short-circuit with ErrTxnDedupRequiresSingleShard if the
+		// post-shift catalog now reroutes this txn to span two groups
+		// (claude[bot] medium, 3rd review round on PR #900).
+		reqs.PrevCommitTS = 0


Preserve PrevCommitTS when retrying deduped attempts

When the request is already an option-2 retry (PrevCommitTS != 0) and has no ReadKeys, this line drops the only probe that tells the FSM whether the earlier ambiguous attempt already landed. I checked the Redis list-push reuse path in adapter/redis.go: it sends PrevCommitTS: pending.commitTS, and its readKeys can be empty for the empty-list case, so a route shift that triggers ErrComposed1Violation here will reroute and reapply the write at a new timestamp instead of no-oping against the version that may have already committed. The fact that this M4 attempt failed the Composed-1 gate only proves this attempt did not apply; it does not prove the prior attempt named by PrevCommitTS did not land.

Useful? React with 👍 / 👎.

…st-once probe (codex P1 on 43c55df, PR #900) 43c55df took claude[bot]'s medium feedback at face value and zeroed reqs.PrevCommitTS in the M4 retry block, on the theory that "the M3 gate fires at apply time → the original attempt never committed → probe is stale." Codex P1 on 43c55df correctly identifies that this conflates two different "original attempts": * Within M4: attempt 1's M3 rejection does prove attempt 1 wrote nothing (the FSM rejects before any state change). * Across adapter retries: PrevCommitTS names the ADAPTER'S prior attempt, not M4's. adapter/redis.go:3209 and :3615 (Redis EXEC reuse / list-push reuse retry paths) send PrevCommitTS = pending.commitTS as the option-2 ambiguous-outcome probe. That earlier attempt may have actually LANDED at the named commitTS before the route shift that triggered this M4 attempt's M3 sentinel. Dropping the probe in the retry would let the FSM re-apply the writes against the post-shift route, double-writing the caller's at-most-once payload if the named prior attempt did land. The at-most-once contract is the whole point of option-2 dedup; the M4 retry must not break it. Trade-off (accepted): when the post-shift route now spans multiple groups, dispatchMultiShardTxn:749 surfaces ErrTxnDedupRequiresSingleShard. That error is the FAITHFUL signal that the adapter's at-most-once retry contract cannot be transparently honoured across a shard split — the caller must reckon with the ambiguity at a higher level (re-read, re-execute, abort). Surfacing it is strictly safer than silently losing dedup. The earlier (claude[bot]-flagged) "misleading error" concern is real but the only correct fix lives in the adapter retry loop, not by tampering with the probe at the coordinator layer. Fix: revert the `reqs.PrevCommitTS = 0` line that 43c55df added. Other changes from 43c55df (empty goroutine removal, stale-resp nit) are correct and stay. Caller audit (loop directive's semantic-change check): no caller reads reqs.PrevCommitTS back after Dispatch. Adapter sites (adapter/redis.go:3209, :3615) construct OperationGroup inline, pass it, and branch on the error. The semantic change here (stop clearing the probe inside the retry) restores the pre-43c55dfe wire contract and is invisible to those callers. Regression test (replaces 43c55df's misnamed ClearsPrevCommitTSOnComposed1Retry): * TestShardedCoordinator_PreservesPrevCommitTSOnComposed1Retry — write-only OperationGroup with PrevCommitTS=17, first attempt rejected with ErrComposed1Violation. Decodes the encoded TxnMeta from the per-attempt pb.Request mutation value (same bytes the FSM reads), asserts attempt 1 AND attempt 2 both carry PrevCommitTS=17 — the probe survives the retry intact. This is the inverse of the test 43c55df added; the new assertion encodes codex P1's at-most-once correctness contract. Verification: go test -race -count=1 ./kv → pass (full pkg, 11s). make lint → 0 issues.

github-actions · 2026-06-01T19:43:38Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (57da888):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

bootjp · 2026-06-01T19:43:40Z

Pushed as 57da888 — codex P1 on 43c55df addressed.

Codex P1 was correct, claude[bot]'s prior medium was wrong. 43c55df took claude[bot]'s reasoning at face value ("M3 fired at apply time → original attempt never committed → probe is stale") and zeroed reqs.PrevCommitTS on the M4 retry. That conflates two different "original attempts":

Within M4: attempt 1's M3 rejection proves attempt 1 wrote nothing. ✓
Across adapter retries: PrevCommitTS names the adapter's prior attempt (adapter/redis.go:3209, :3615 send PrevCommitTS = pending.commitTS as the option-2 ambiguous-outcome probe). That earlier attempt may have actually landed before the route shift that triggered this M4's M3 sentinel.

Dropping the probe would let the FSM re-apply against the post-shift route, double-writing the at-most-once payload if the prior attempt landed. The at-most-once contract is the whole point of option-2 dedup — M4 must not break it.

Trade-off accepted: post-shift-multi-shard case still surfaces ErrTxnDedupRequiresSingleShard from dispatchMultiShardTxn:749. That's the faithful signal — at-most-once retry cannot transparently cross a shard split; the caller must reckon with the ambiguity at a higher level. Surfacing is strictly safer than silent double-write.

Caller audit (loop directive): no caller reads reqs.PrevCommitTS back after Dispatch. Restoring the pre-43c55dfe wire contract is invisible to current callers.

Regression test replaces the misnamed prior one:

TestShardedCoordinator_PreservesPrevCommitTSOnComposed1Retry — decodes TxnMeta from each per-attempt pb.Request mutation (same bytes the FSM reads), asserts attempt 1 AND attempt 2 both carry PrevCommitTS=17. This is the inverse assertion of what 43c55df pinned, encoding codex P1's at-most-once correctness contract.

go test -race ./kv pass (full pkg, 11s) / make lint 0 issues.

@claude review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57da88864e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T19:45:39Z

+		startTS, allocErr := c.nextStartTS(ctx, reqs.Elems)
+		if allocErr != nil {
+			// resp here is the failed first attempt's response (always
+			// nil for the Composed-1 sentinels) — return nil
+			// explicitly so the contract "non-nil response on success
+			// only" is unambiguous (claude[bot] nit on PR #900).
+			return nil, allocErr
+		}
+		reqs.StartTS = startTS


Preserve caller StartTS on OCC retries

When a caller supplies StartTS from an earlier OCC read but leaves ReadKeys empty, this retry silently replaces the snapshot timestamp. Fresh evidence beyond the earlier ReadKeys case is S3 createBucket, which reads bucket metadata at readTS (adapter/s3.go:573) and dispatches a txn with that StartTS but no ReadKeys (adapter/s3.go:605-613); if a route shift triggers this path, a concurrent create that committed after the S3 read but before the retry is no longer rejected by the FSM's latest > startTS write-conflict check, so the retry can overwrite/accept a stale decision instead of surfacing a conflict. The retry should only advance StartTS for coordinator-created write-only snapshots, or otherwise force the adapter to re-read and rebuild the request.

Useful? React with 👍 / 👎.

…(codex P1 on 57da888, PR #900) Codex P1 on 57da888 identifies that the M4 retry path's StartTS bump is unsafe whenever the caller supplied StartTS — not only the already-gated ReadKeys-non-empty case. When the caller hands us StartTS=readTS (because it already read metadata at that snapshot and is now dispatching a write), bumping StartTS on retry lets any concurrent commit landed in (originalStartTS, newStartTS) bypass the FSM's `latest > startTS` write-conflict check. The intervening commit is now dominated by the new snapshot, so OCC silently accepts the retry's stale-snapshot decision. The pre-Dispatch read is INVISIBLE to the coordinator (it is not surfaced as a ReadKeys entry), so the only signal that the snapshot is caller-owned is reqs.StartTS being non-zero at Dispatch entry. Real exposure: a code-base grep for adapter sites that supply `StartTS = readTS` without ReadKeys turns up at least 13 hits across S3, SQS, and DynamoDB adapters: * adapter/s3.go:573 (createBucket) * adapter/s3_admin_objects.go:144, :348 * adapter/sqs_messages.go:541, :1113, :1332, :1487, :1614 * adapter/sqs_messages_batch.go:222 * adapter/sqs_fifo.go:241 * adapter/sqs_purge.go:146 * adapter/sqs_tags.go:179 * adapter/dynamodb.go:4366, :8274 All of these read metadata at readTS and dispatch a write with that StartTS but empty ReadKeys. Under the pre-fix retry path any Composed-1 sentinel they hit would have been silently absorbed by the StartTS-bumping retry, breaking OCC for concurrent writes in the bumped-by window. Fix: capture `callerSuppliedStartTS := reqs.IsTxn && reqs.StartTS != 0` at Dispatch entry (BEFORE the coordinator-allocates-on-zero branch that mutates the field), thread it into dispatchTxnWithComposed1Retry, and surface the sentinel without retry whenever the flag is set. Coordinator-allocated StartTS remains retryable — by definition no read has happened at that timestamp, so the bump is OCC-safe. Caller audit (loop directive's semantic-change check): the adapter sites listed above are the BENEFICIARIES of this fix. Pre-fix they were silently exposed to stale-snapshot OCC bypass under route shifts; post-fix they correctly surface ErrComposed1Violation, which propagates through their existing HTTP error path to the client. No caller depended on the silent retry succeeding — the M4 retry was added in this PR and no adapter is yet wired to take advantage of it in the caller-supplied-StartTS case. The DynamoDB adapter sites that already supply StartTS=0 (adapter/dynamodb.go:922, :981, etc.) continue to use the coordinator-allocated path, so the M4 retry remains available for those. Test changes: * NEW TestShardedCoordinator_SurfacesComposed1OnCallerSuppliedStartTS: write-only OperationGroup with StartTS=42, ReadKeys empty, firstErr=ErrComposed1Violation. Asserts the sentinel surfaces unchanged and exactly ONE Commit attempt fires. Verified failing on the parent commit (the retry absorbed the sentinel and the second attempt succeeded) and passing on this commit. * Existing M4-retry tests that previously set StartTS=10 (a test fixture, not a real "caller-supplied" intent) updated to omit StartTS so the coordinator allocates and the retry path is eligible to fire: - TestShardedCoordinator_RetriesOnComposed1Violation - TestShardedCoordinator_RetriesOnComposed1VersionGCd - TestShardedCoordinator_PersistentComposed1ViolationSurfaces - TestShardedCoordinator_PreservesPrevCommitTSOnComposed1Retry These tests use NewHLC() and the in-memory engine, so nextStartTS allocates cleanly. * Tests that exercise read-write surfacing (TestShardedCoordinator_DoesNotRetryReadWriteTxnOnComposed1, TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForReadWriteTxn) keep StartTS=10 — ReadKeys non-empty surfaces first, so the new callerSuppliedStartTS gate is never reached. Same for the pin tests that succeed on first attempt. Also fixes a doc-comment nit claude[bot] flagged on 57da888: the composed1RetryingTransactional stub's PrevCommitTS-decode comment still said "assert the field is cleared on the retry attempt" — that was the now-disproven claude[bot] medium reasoning from 43c55df. Updated to say "PRESERVED across retry attempts" with the codex P1 (43c55df) reference. Verification: go test -race -count=1 ./kv → pass (full pkg, 11s). make lint → 0 issues.

github-actions · 2026-06-01T19:55:19Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (144ec0c):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

bootjp · 2026-06-01T19:55:21Z

Pushed as 144ec0c — codex P1 on 57da888 addressed (plus claude[bot] stale-comment nit).

Codex P1 — caller-supplied StartTS surfaces. The M4 retry path's StartTS bump was unsafe for ALL caller-supplied StartTS values, not just the already-gated ReadKeys-non-empty case. A code-base grep found 13+ adapter sites that supply StartTS = readTS without populating ReadKeys (their pre-Dispatch metadata read is invisible to the coordinator):

adapter/s3.go:573 (createBucket), s3_admin_objects.go:144 / :348
adapter/sqs_messages.go:541 / :1113 / :1332 / :1487 / :1614
adapter/sqs_messages_batch.go:222, sqs_fifo.go:241, sqs_purge.go:146, sqs_tags.go:179
adapter/dynamodb.go:4366 / :8274

Under the pre-fix retry, any Composed-1 sentinel they hit would have been silently absorbed by the bumped-StartTS retry, letting any commit landed in (originalStartTS, newStartTS) bypass the FSM's latest > startTS OCC check.

Fix: capture callerSuppliedStartTS := reqs.IsTxn && reqs.StartTS != 0 at Dispatch entry (BEFORE the coordinator-allocates-on-zero branch mutates the field), thread it into dispatchTxnWithComposed1Retry, surface the sentinel without retry when set. Coordinator-allocated StartTS remains retryable — by definition no read has happened at that timestamp.

Caller audit (loop directive): the 13+ adapter sites are the BENEFICIARIES of this fix. Pre-fix they were silently exposed to stale-snapshot OCC bypass under route shifts; post-fix they correctly surface ErrComposed1Violation through their existing HTTP error path. No caller depended on the silent retry succeeding (M4 was added in this PR; no adapter is yet wired to take advantage). DynamoDB sites with StartTS: 0 (dynamodb.go:922, :981, etc.) continue to use the coordinator-allocated path — M4 retry remains available for those.

Regression test: TestShardedCoordinator_SurfacesComposed1OnCallerSuppliedStartTS — write-only OG with StartTS=42, ReadKeys empty, asserts the sentinel surfaces unchanged and exactly ONE Commit attempt fires. Failing on parent (retry absorbed); passing on fix.

Existing M4-retry tests updated to omit StartTS (was StartTS: 10 test fixture, now coordinator-allocated path so retry path is eligible). Tests that exercise read-write surfacing keep StartTS=10 — ReadKeys non-empty surfaces first, so the new gate is never reached.

claude[bot] nit on 57da888: stub's PrevCommitTS-decode comment still said "assert the field is cleared" (the now-disproven 43c55df-era claude[bot] medium reasoning). Updated to "PRESERVED across retry attempts" with codex P1 (43c55df) reference.

go test -race ./kv pass (full pkg, 11s) / make lint 0 issues.

@claude review

chatgpt-codex-connector · 2026-06-01T19:55:28Z

To use Codex here, create a Codex account and connect to github.

claude · 2026-06-01T19:55:39Z

Claude finished @bootjp's task in 3m 6s —— View job

PR Review: Codex P1 fix — caller-supplied StartTS surfaces (`144ec0c`)

Gather context and read changed files
Verify Codex P1 fix (callerSuppliedStartTS gate)
Verify regression test correctness
Check all prior review findings are resolved
Check TLA+ spec divergence

✅ Codex P1 — correct

callerSuppliedStartTS := reqs.IsTxn && reqs.StartTS != 0 at line 464 is captured before the coordinator-allocates-on-zero branch at line 466. This is the right placement: the flag records what the caller supplied, before the coordinator potentially overwrites reqs.StartTS. The two branches are clean:

Caller supplied StartTS	`callerSuppliedStartTS`	M4 retry	Reason
`StartTS != 0` (e.g. `readTS` from a prior metadata read)	`true`	❌ surfaces immediately	Coordinator cannot prove no one read at `originalStartTS`; bumping would open OCC blind spot in `(originalStartTS, newStartTS)`
`StartTS == 0` (coordinator allocates)	`false`	✅ retries once	Coordinator owns the timestamp; no external read has happened at it

The new guard at lines 572–574 is placed correctly in the loop body — after isComposed1RetryableError (so it only fires on the two Composed-1 sentinels) and before the attempt == composed1RetryAttempts budget check (so budget never even decrements for caller-owned snapshots).

✅ Regression test — correct

TestShardedCoordinator_SurfacesComposed1OnCallerSuppliedStartTS (lines 478–510):

Dispatches with StartTS: 42, ReadKeys empty
Asserts errors.Is(err, ErrComposed1Violation) — sentinel surfaces unchanged
Asserts txn.calls.Load() == 1 — exactly one attempt fires

The test comment documents the 13+ real adapter sites and explains why the pre-Dispatch read is invisible to the coordinator (no ReadKeys). The test correctly distinguishes the callerSuppliedStartTS gate from the earlier ReadKeys != 0 gate, which is the right decomposition.

✅ Existing tests updated correctly

TestShardedCoordinator_RetriesOnComposed1Violation, TestShardedCoordinator_RetriesOnComposed1VersionGCd, and TestShardedCoordinator_PersistentComposed1ViolationSurfaces now omit StartTS (coordinator-allocated path), with inline comments pointing back to the Codex P1 finding. This keeps those tests on the retry-eligible path without conflating the two gates.

TestShardedCoordinator_PinsObservedRouteVersionAtDispatchEntry and TestShardedCoordinator_HonorsCallerPinnedObservedRouteVersion retain StartTS: 10 because they test the auto-pin logic on the first (successful) attempt — the callerSuppliedStartTS gate is never reached since there is no Composed-1 error.

✅ Stale-comment nit from `57da888` — fixed

The composed1RetryingTransactional.Commit stub comment (lines 31–42) now correctly says "PRESERVED across retry attempts" and cites the codex P1 on 43c55df that disproved the original claude[bot] "clear it" advice. No longer misleads future readers.

✅ All prior review findings resolved

Finding	Round	Status
gemini CRITICAL / codex P1: SSI violation on read-write retry	R1	✅ `66ad5b0` — `ReadKeys != 0` surfaces immediately
gemini HIGH: nil-clock leaves StartTS=0	R1	✅ `66ad5b0` — unconditional `nextStartTS`
claude medium: PrevCommitTS not cleared on retry	R1–R3	✅ `57da888` — correctly NOT cleared (codex P1 disproved the clearing)
Minor: empty goroutine in test	R1	✅ `43c55df` — removed; converted to block comment
Nit: stale `resp` on `allocErr`	R2	✅ `43c55df` — `return nil, allocErr`
codex P1: auto-pin stamps post-shift version for read-write txns	R3	✅ `10123c5` — `maybeAutoPinObservedRouteVersion` gated on `ReadKeys empty`
codex P1: callerSuppliedStartTS OCC bypass	R4	✅ `144ec0c` — this commit

⚠️ Design note (not blocking)

maybeAutoPinObservedRouteVersion gates on len(reqs.ReadKeys) != 0 but not on callerSuppliedStartTS. For adapter sites that supply StartTS = readTS with empty ReadKeys (the same 13+ sites codex identified), the auto-pin fires and stamps the dispatch-time catalog version. This means M3 covers the dispatch→apply window for these callers but NOT the read→dispatch window.

This is not a regression: pre-M4 these callers had ObservedRouteVersion = 0, so the M3 gate always short-circuited (zero coverage). Post-M4 the gate covers dispatch→apply (partial coverage). Full coverage requires these adapters to migrate to pin ObservedRouteVersion at BeginTxn time per §4.1. The staged-migration posture (observedVer=0 → gate short-circuits) is documented in the maybeAutoPinObservedRouteVersion comment.

If the read→dispatch window is deemed in-scope for M4, the fix would be to also skip the auto-pin when callerSuppliedStartTS is true — but that would reduce M3 gate coverage for these callers compared to the current state, which seems wrong. The right answer is adapter migration to §4.1, which is the pre-flagged M5+ work. Not blocking here.

TLA+ spec divergence — no update needed

144ec0c is confined to dispatchTxnWithComposed1Retry and the Dispatch method preamble in sharded_coordinator.go. No changes to hlc*.go, coordinator.go HLC lease-renewal paths, transaction.go, lock_resolver.go, fsm.go, mvcc_store.go, or distribution/. The retry orchestration wraps dispatchTxn calls; it does not touch timestamp issuance, OCC commit-ts assignment, FSM apply, or catalog versioning. No spec module needs updating.

LGTM. Five review rounds have been thorough. All correctness findings are closed. The callerSuppliedStartTS fix correctly extends the write-only-is-retryable contract from ReadKeys-based detection to StartTS-origin detection, closing the class of adapter sites (S3/SQS/DynamoDB) that supply an owned snapshot without populating ReadKeys.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 144ec0ca87

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T19:56:40Z

+	if c.engine == nil || reqs.ObservedRouteVersion != 0 || len(reqs.ReadKeys) != 0 {
+		return
+	}
+	reqs.ObservedRouteVersion = c.engine.Version()


Avoid auto-pinning caller-owned snapshots

When a transaction arrives with a caller-supplied StartTS but empty ReadKeys (for example adapter/s3.go:573-614 reads bucket metadata at readTS and dispatches the write with StartTS: startTS and no ObservedRouteVersion), this still stamps the catalog version at Dispatch time. If a route moved between that adapter-layer read and Dispatch, the request is pinned to the post-shift route version, so verifyComposed1 will not report the route shift even though the write decision was based on the old snapshot; the later callerSuppliedStartTS retry guard never helps because the sentinel was masked. The auto-pin needs to be limited to coordinator-owned snapshots as well as empty ReadKeys, or these adapters need to pin the version when they take the read snapshot.

Useful? React with 👍 / 👎.

…alse-confidence gate signal (codex P1 on 144ec0c, PR #900) Codex P1 on 144ec0c identifies a remaining hole in the M3 auto-pin coverage: when an adapter supplies StartTS=readTS with ReadKeys empty (the 13+ S3/SQS/DynamoDB sites enumerated in 144ec0c's commit message), the Dispatch-entry auto-pin stamps reqs.ObservedRouteVersion = engine.Version() — the CURRENT catalog version, not the version the read was actually captured at. If a route shift landed between the adapter-layer read and this Dispatch call, the auto-pin records the POST-shift version and verifyComposed1 finds no mismatch at apply time. The route shift becomes INVISIBLE to the gate — masking exactly the failure mode the gate exists to catch. This is strictly worse than ObservedRouteVersion=0: * observedVer=0 → M3 gate short-circuits → operator knows no protection is in place (correct signal). * spurious auto-pin → M3 gate runs and passes → operator gets false confidence (corrupt signal). The "partial coverage" framing from claude[bot]'s 144ec0c review ("dispatch→apply window is still covered for these callers") is illusory: the auto-pin's claim of catalog version on the request LIES about when the read happened. A false claim that the gate verifies as true is a hostile signal, not a partial one. The correct fix lives in the adapters (pin ObservedRouteVersion at read time, design doc §4.1), which is M5+ migration work. Until then, withholding the auto-pin is the only honest posture. Fix: extend maybeAutoPinObservedRouteVersion's skip conditions to include callerSuppliedStartTS, mirroring the M4-retry gate added in 144ec0c. The condition now reads: if engine == nil || ObservedRouteVersion != 0 || len(ReadKeys) != 0 || callerSuppliedStartTS { return // do NOT auto-pin } reqs.ObservedRouteVersion = engine.Version() The flag is already captured at Dispatch entry (before the allocate-on-zero branch) by 144ec0c; this commit just threads it one more level down. Caller audit (loop directive's semantic-change check): the 13+ adapter sites that supply StartTS=readTS without ReadKeys (adapter/s3.go:573, sqs_messages.go:541 etc., dynamodb.go:4366 etc.) currently get auto-pinned and the gate verifies with a potentially-false claim. Post-fix they get ObservedRouteVersion=0 and the gate short-circuits. No caller loses real protection — the auto-pin's "protection" was false-positive coverage that masked actual shifts. Adapters that want genuine M3 protection must pin ObservedRouteVersion at read time per §4.1, which is the pre-flagged M5+ migration. Coordinator-allocated StartTS callers (DynamoDB sites with StartTS=0 — adapter/dynamodb.go:922, :981, :1760, :4686, :4703, :4722, :4735) are unchanged: callerSuppliedStartTS=false, auto-pin fires as before. The coordinator's allocation IS the read-time-equivalent moment for those callers (no read against a different catalog version has happened earlier), so the auto-pin's claim is honest. Regression test: * TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForCallerSuppliedStartTS — caller-supplied StartTS=42, ReadKeys empty, ObservedRouteVersion=0, engine pinned at v=9. Asserts the request that reached the stub Commit carries ObservedRouteVersion=0 (auto-pin did NOT fire). Verified failing on the parent (the request carried v=9) and passing on this commit. Existing test TestShardedCoordinator_PinsObservedRouteVersionAtDispatchEntry updated to omit StartTS so the auto-pin path remains exercised on the coordinator-allocated branch. TestShardedCoordinator_HonorsCallerPinnedObservedRouteVersion is unchanged (its ObservedRouteVersion is non-zero so the auto-pin short-circuits on the existing condition before the new gate even applies). TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForReadWriteTxn is unchanged (ReadKeys non-empty short-circuits first). Verification: go test -race -count=1 ./kv → pass (full pkg, 10s). make lint → 0 issues.

github-actions · 2026-06-01T20:04:12Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (6202b96):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

bootjp · 2026-06-01T20:04:15Z

Pushed as 6202b96 — codex P1 on 144ec0c addressed.

Codex P1 — skip auto-pin for caller-supplied StartTS. The auto-pin at Dispatch entry stamped reqs.ObservedRouteVersion = engine.Version() even for callers that supplied StartTS=readTS with empty ReadKeys (the same 13+ S3/SQS/DynamoDB sites from 144ec0c). That stamps the CURRENT catalog version on a request whose read happened at an EARLIER catalog version — a route shift in (readTime, dispatchTime) becomes invisible to verifyComposed1, masking the failure mode the gate exists to catch.

Why this is worse than no auto-pin: ObservedRouteVersion=0 → gate short-circuits → operator knows protection is not in place. Spurious auto-pin → gate verifies a FALSE claim → false confidence. claude[bot]'s 144ec0c framing of "partial coverage" is illusory because the auto-pin LIES about when the read happened — it's a corrupt signal, not a partial one.

Fix: thread the already-captured callerSuppliedStartTS one level down into maybeAutoPinObservedRouteVersion. The skip condition now reads:

if engine == nil || ObservedRouteVersion != 0 || len(ReadKeys) != 0 || callerSuppliedStartTS { return }

Caller audit (loop directive): the 13+ caller-supplied-StartTS adapter sites lose the auto-pin's false-positive coverage but no real protection — they were never honestly covered. DynamoDB sites with StartTS: 0 (dynamodb.go:922 / :981 / :1760 / :4686 / :4703 / :4722 / :4735) are unaffected — coordinator-allocated path still auto-pins, which is honest because the allocation IS the read-time-equivalent moment. Adapters that want genuine M3 protection must pin at read time per §4.1 (M5+ migration).

Regression test: TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForCallerSuppliedStartTS — StartTS=42, ReadKeys empty, engine at v=9, asserts observed=0. Failing on parent (was v=9) / passing on fix.

Existing test PinsObservedRouteVersionAtDispatchEntry updated to omit StartTS so it stays on the coordinator-allocated branch (still auto-pins). HonorsCallerPinnedObservedRouteVersion unchanged (its non-zero ObservedRouteVersion short-circuits the auto-pin on the existing condition). DoesNotAutoPinForReadWriteTxn unchanged (ReadKeys non-empty short-circuits first).

go test -race ./kv pass (full pkg, 10s) / make lint 0 issues.

@claude review

chatgpt-codex-connector · 2026-06-01T20:04:19Z

To use Codex here, create a Codex account and connect to github.

claude · 2026-06-01T20:04:32Z

Claude finished @bootjp's task in 2m 35s —— View job

PR Review: Codex P1 fix — skip auto-pin for caller-supplied StartTS (`6202b96`)

Gather context and understand the request
Read current state of changed files (6202b96)
Verify Codex P1 fix (callerSuppliedStartTS threading into maybeAutoPinObservedRouteVersion)
Verify regression test correctness
Check all prior review findings are still resolved
Check TLA+ spec divergence
Post findings

✅ Codex P1 — correct

The callerSuppliedStartTS flag is now threaded from Dispatch entry through dispatchTxnWithComposed1Retry into maybeAutoPinObservedRouteVersion. The updated skip condition at line 544:

if c.engine == nil || reqs.ObservedRouteVersion != 0 || len(reqs.ReadKeys) != 0 || callerSuppliedStartTS {
    return
}

The flag is correctly captured at line 464 before the coordinator-allocates-on-zero branch at line 466 can mutate reqs.StartTS. This ordering is critical — the flag records what the caller supplied, not what the coordinator later assigns. Both uses of the flag are internally consistent:

Caller situation	Auto-pin	Retry on Composed-1	Reason
Coordinator-allocated StartTS (caller passed 0)	✅ fires	✅ retries once	Coordinator owns the snapshot; no prior read exists
Caller-supplied StartTS, ReadKeys empty	❌ skipped	❌ surfaces immediately	Adapter read metadata at this snapshot before Dispatch (S3/SQS/DynamoDB pattern); coordinator cannot observe the read
ReadKeys non-empty (read-write txn)	❌ skipped	❌ surfaces immediately	Explicit read set at earlier StartTS; SSI violation on retry
Caller-pinned ObservedRouteVersion (non-zero)	❌ skipped (condition 2)	depends on StartTS	Caller already knows the version

The "false confidence is strictly worse than honest no-coverage" reasoning from the trigger comment is correct. ObservedRouteVersion=0 lets the M3 gate short-circuit cleanly (operator knows protection is not in place); a spuriously auto-stamped post-shift version makes the gate appear to verify a claim it cannot actually make.

✅ Regression test — correct

TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForCallerSuppliedStartTS (lines 513–577): StartTS: 42, ReadKeys empty, ObservedRouteVersion at 0, engine at v=9. Asserts observedSeen[0] == 0. This directly verifies the auto-pin was suppressed.

✅ `TestShardedCoordinator_PinsObservedRouteVersionAtDispatchEntry` update — correct

Removing StartTS: 10 from the test fixture keeps it on the coordinator-allocated path where the auto-pin IS expected to fire. With StartTS: 10 present, callerSuppliedStartTS would be true, suppressing the auto-pin and causing the assertion observedSeen[0] == 7 to fail post-fix (it would be 0). The comment in the test explains the rationale correctly.

✅ All prior review findings resolved

Finding	Round	Status
gemini CRITICAL / codex P1: SSI violation on read-write retry	R1	✅ `66ad5b0` — `ReadKeys != 0` surfaces immediately
gemini HIGH: nil-clock leaves StartTS=0	R1	✅ `66ad5b0` — unconditional `nextStartTS`
codex P1: auto-pin stamps post-shift version for read-write txns	R2	✅ `10123c5` — `len(ReadKeys) != 0` guard
Minor: empty goroutine in test	R2	✅ `43c55df` — removed; converted to block comment
Nit: stale `resp` on `allocErr`	R2	✅ `43c55df` — `return nil, allocErr`
claude medium: PrevCommitTS clearing (was wrongly added, then correctly reverted)	R2–R3	✅ `57da888` — NOT cleared; preserves adapter's at-most-once probe
codex P1: callerSuppliedStartTS OCC bypass on retry	R4	✅ `144ec0c` — surfaces without retry
codex P1: auto-pin for caller-supplied StartTS	R5	✅ `6202b96` — this commit

TLA+ spec divergence — no update needed

6202b96 is confined to maybeAutoPinObservedRouteVersion and dispatchTxnWithComposed1Retry in sharded_coordinator.go. No changes to hlc*.go, coordinator.go HLC lease-renewal paths, transaction.go, lock_resolver.go, fsm.go, mvcc_store.go, or distribution/. The auto-pin decision does not touch timestamp issuance, OCC commit-ts assignment, FSM apply, or catalog versioning. No spec module needs updating.

LGTM. Six review rounds across this PR have been thorough. All correctness findings are closed. The final shape correctly distinguishes four cases (coordinator-allocated / caller-supplied-StartTS-only / read-write / caller-pinned-route-version) and is honest about which ones M3/M4 can and cannot protect. The one remaining open item is the M5 Jepsen workload, pre-flagged in the PR description.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6202b9648a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T20:10:32Z

+	if c.engine == nil || reqs.ObservedRouteVersion != 0 || len(reqs.ReadKeys) != 0 || callerSuppliedStartTS {
+		return
+	}
+	reqs.ObservedRouteVersion = c.engine.Version()


Don't auto-pin 2PC writes until secondary failures are handled

When this auto-pins a coordinator-owned write-only transaction, ordinary multi-shard requests become subject to the Composed-1 gate even though the 2PC secondary path still treats secondary Commit errors as best-effort: dispatchMultiShardTxn calls commitSecondaryTxns and then returns success, while commitSecondaryTxns only logs errors from commitSecondaryWithRetry. If a route moves after PREPARE/primary COMMIT but before a secondary COMMIT, that secondary can now return ErrComposed1Violation; Dispatch still acknowledges success, leaving the secondary write uncommitted/invisible to readers on the new owner. Before this auto-pin these requests carried ObservedRouteVersion=0 and the gate short-circuited, so this change can turn route churn into an acknowledged partial commit unless secondary Composed-1 errors are made fatal/recoverable before success is returned.

Useful? React with 👍 / 👎.

…partial commits (codex P1 on 6202b96, PR #900) Codex P1 on 6202b96 identifies a structural conflict between M3's fail-closed semantics and 2PC's best-effort secondary semantic. When dispatchMultiShardTxn propagates a non-zero ObservedRouteVersion to secondary commit Requests, a route shift that lands AFTER primary-COMMIT but BEFORE a secondary-COMMIT makes the secondary FSM return ErrComposed1Violation. commitSecondaryTxns swallows the error (logs, then continues), and dispatchMultiShardTxn returns success to the caller — the secondary's write is uncommitted, invisible to readers on the new owner, and the txn is acknowledged as durable. This is a silent partial commit hazard. The auto-pin added in 66ad5b0 (and the explicit caller-pin path from PR #881) inadvertently extended the M3 gate to a step where its fail-closed semantic conflicts with the 2PC contract. The fundamental tension: * Prewrite: gate-active. A failed gate aborts before any locks are placed (groupReadKeys / prewriteTxn return the error). Clean recovery, no partial state. * Primary commit: gate-active. A failed gate triggers abortPreparedTxn (the existing line 849 path), which releases the prewrite intents on every prepared shard. Clean recovery, no partial state. * Secondary commit: gate-active is HAZARDOUS. The primary has already durably committed; there is no rollback path. commitSecondaryTxns is best-effort by design (its docstring says lock-resolution backstops unavailable secondaries), but lock resolution itself would fail the same gate on the same shifted route. Net effect: silent partial commit. Fix: drop ObservedRouteVersion to 0 on every per-secondary commit Request. fsm.go:609 short-circuits verifyComposed1 on observedVer==0, so the gate never fires on secondaries. This is NOT a regression of PR #881's "propagate the pin everywhere" intent — it is a refinement that recognizes secondary commit as the step where propagation cannot compose with the 2PC contract. PR #881's underlying concern (prewrite and primary commit silently dropping the version) remains addressed; only the secondary commit's propagation is reversed. Caller audit (loop directive's semantic-change check): the change is entirely internal to commitSecondaryTxns. The observedRouteVersion parameter is removed from the function signature; dispatchMultiShardTxn no longer passes it. No external caller is affected. Effect on production callers: * Auto-pinned write-only coordinator-allocated multi-shard txns (e.g. DynamoDB BatchWriteItem with StartTS=0): secondary commits no longer trigger Composed-1 on cross-COMMIT route shifts. Pre-fix exposure: silent partial commit under operator-triggered SplitRange. Post-fix: secondary commit succeeds (or surfaces a non-Composed-1 error from the underlying Raft layer). Lock-resolution backstop unaffected. * Caller-pinned write-only multi-shard txns: same as above — the pin still applies to prewrite and primary commit but not to secondary commits. Callers that want secondary-side M3 protection need a fatal-secondary 2PC variant (M5+). Regression test: * TestShardedCoordinator_DropsObservedRouteVersionOn2PCSecondaryCommit — 2-shard fixture (primary "b" → g1, secondary "x" → g2), coordinator-allocated StartTS so auto-pin stamps ObservedRouteVersion=7 on the OperationGroup. Asserts the coverage matrix: * prewrite on both shards: observedVer=7 * primary commit on g1: observedVer=7 * secondary commit on g2: observedVer=0 Verified failing on the parent (g2 commit carried v=7) and passing on this commit. Existing test TestShardedCoordinatorDispatchTxn_CrossShardPropagatesObservedRouteVersion (PR #881 contract) updated to reflect the codex P1 refinement. It now asserts the same matrix above for the caller-pinned case (pin=42). The original assertion (every 2PC envelope carries the pin) was the regression codex P1 identifies; the updated assertion encodes the corrected contract. Verification: go test -race -count=1 ./kv → pass (full pkg, 8s). make lint → 0 issues.

github-actions · 2026-06-01T20:24:30Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (d848767):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

bootjp · 2026-06-01T20:24:35Z

Pushed as d848767 — codex P1 on 6202b96 addressed.

Note on flaky CI: test (ubuntu-latest) failed on 6202b96 in TestHotKeysAggregatorRaceFree (keyviz pkg, unrelated to this PR — I haven't touched keyviz). Reproduced locally 5/5 PASS. Likely a known flake. The other test job on 6202b96 passed.

Codex P1 — drop M3 on secondary commits. commitSecondaryTxns is best-effort by design (logs per-secondary errors and continues, then dispatchMultiShardTxn returns success while lock resolution backstops unavailable secondaries). Propagating ObservedRouteVersion to secondary commit Requests composes with M3's fail-closed semantic to yield a silent partial commit: a route shift between primary-COMMIT and secondary-COMMIT makes the secondary FSM return ErrComposed1Violation; the error is swallowed; the caller-visible Dispatch ACK still says success; the secondary's write is invisible to readers on the new owner.

Coverage matrix the post-fix code satisfies:

Step	observedVer	Why
Prewrite (both shards)	pinned	gate-active before lock placement; clean abort path
Primary commit	pinned	linearization point; abortPreparedTxn cleans up on failure
Secondary commit	0	best-effort step has no rollback; gate would silently mask

Fix: drop the observedRouteVersion parameter from commitSecondaryTxns entirely; per-secondary Requests are built with ObservedRouteVersion unset (zero). FSM short-circuits verifyComposed1 on observedVer==0 (fsm.go:609).

Caller audit (loop directive): change is entirely internal to commitSecondaryTxns. No external callers affected. Auto-pinned write-only multi-shard txns (DynamoDB BatchWriteItem etc.) lose the silent-partial-commit hazard; explicit-caller-pin callers also lose secondary-side coverage (the fundamental hazard is structural to 2PC + fail-closed gate; M5+ work for a fatal-secondary 2PC variant or lock-resolution-bypass).

Regression test: TestShardedCoordinator_DropsObservedRouteVersionOn2PCSecondaryCommit — 2-shard fixture, asserts the coverage matrix above. Failing on parent (g2 commit carried v=7) / passing on fix.

Existing test TestShardedCoordinatorDispatchTxn_CrossShardPropagatesObservedRouteVersion (PR #881 contract) updated to encode the refined contract — prewrite + primary commit carry pin, secondary commit drops it. The original "propagate everywhere" assertion was exactly what codex P1 identifies as the bug.

go test -race ./kv pass (full pkg, 8s) / make lint 0 issues.

@claude review

chatgpt-codex-connector · 2026-06-01T20:24:38Z

To use Codex here, create a Codex account and connect to github.

claude · 2026-06-01T20:24:51Z

Claude finished @bootjp's task in 3m 8s —— View job

PR Review: `d848767` — drop ObservedRouteVersion on 2PC secondary commits

Gather context and read changed files
Verify Codex P1 fix (drop ObservedRouteVersion from commitSecondaryTxns)
Verify regression test correctness
Check updated TestShardedCoordinatorDispatchTxn_CrossShardPropagatesObservedRouteVersion
Check all prior review findings still resolved
Check TLA+ spec divergence
Post findings

✅ Codex P1 — correct

commitSecondaryTxns no longer accepts observedRouteVersion as a parameter. The secondary commit pb.Request at line 1014 is built without the field (zero):

req := &pb.Request{
    IsTxn:     true,
    Phase:     pb.Phase_COMMIT,
    Ts:        startTS,
    Mutations: append([]*pb.Mutation{meta}, keyMutations(grouped[gid])...),
    // ObservedRouteVersion intentionally omitted (zero) ...
}

The comment block (lines 1019–1038) is one of the most thorough explanations in this PR: it names the exact failure mode (route shift between primary-COMMIT and secondary-COMMIT → ErrComposed1Violation → swallowed → silent partial commit), explains why verifyComposed1 short-circuits on observedVer==0 (fsm.go:609), and names what M5+ would need to fix (fatal-secondary 2PC or lock-resolution-bypass). The reasoning is correct.

The call-site comment in dispatchMultiShardTxn (lines 854–859) correctly cross-references the inner comment, so both the orchestration layer and the implementation layer document the same invariant.

✅ Coverage matrix satisfied

The two-phase flow for a 2-shard txn after d848767:

Step	ObservedRouteVersion	Gate active?	Abort path
Prewrite (both shards, Phase_PREPARE)	pinned (v=7)	✅	`abortPreparedTxn`
Primary commit (Phase_COMMIT)	pinned (v=7)	✅	`abortPreparedTxn` cleans up
Secondary commit (Phase_COMMIT)	0	❌ gate short-circuits	lock-resolver backstop

This matches the PR description's coverage matrix exactly.

✅ Regression test — correct

TestShardedCoordinator_DropsObservedRouteVersionOn2PCSecondaryCommit (lines 611–661):

2-shard fixture: key "b" → g1 (primary), key "x" → g2 (secondary)
Coordinator-allocated StartTS → callerSuppliedStartTS=false, ReadKeys empty → auto-pin fires → ObservedRouteVersion=7
Each shard receives 2 requests (PREPARE + COMMIT), recorded in order by recordingTransactional
g1Txn.requests[0] (prewrite) asserts observedVer=7 ✓
g2Txn.requests[0] (prewrite) asserts observedVer=7 ✓
g1Txn.requests[1] (primary commit) asserts observedVer=7 ✓
g2Txn.requests[1] (secondary commit) asserts observedVer=0 ✓ — this is the regression

The request ordering is correct because prewriteTxn fires PREPARE on all shards first (both g1 and g2 get request[0]), then commitPrimaryTxn fires COMMIT on g1 only (g1.request[1]), then commitSecondaryTxns fires COMMIT on g2 only (g2.request[1]).

✅ Updated `TestShardedCoordinatorDispatchTxn_CrossShardPropagatesObservedRouteVersion` — correct

The test comment (lines 579–606) accurately describes the evolution: PR #881 established "propagate everywhere", d848767 refines it to "propagate to prewrite + primary commit, drop on secondary commit". The updated assertion at line 661 (require.Equal(t, uint64(0), g2Txn.requests[1].ObservedRouteVersion)) encodes the refined contract and will catch any regression that re-introduces propagation to the secondary path.

✅ All prior review findings resolved

Finding	Round	Commit	Status
gemini CRITICAL / codex P1: SSI violation on read-write retry	R1	`66ad5b0`	✅
gemini HIGH: nil-clock leaves StartTS=0	R1	`66ad5b0`	✅
codex P1: auto-pin for read-write txns	R2	`10123c5`	✅
Minor: empty goroutine in test	R2	`43c55df`	✅
Nit: stale `resp` on `allocErr`	R2	`43c55df`	✅
claude medium: PrevCommitTS clearing (correctly NOT cleared)	R2–R3	`57da888`	✅
codex P1: callerSuppliedStartTS OCC bypass on retry	R4	`144ec0c`	✅
codex P1: auto-pin for caller-supplied StartTS	R5	`6202b96`	✅
codex P1: 2PC secondary ObservedRouteVersion silent partial commit	R6	`d848767`	✅ this commit

TLA+ spec divergence — no update needed

Only kv/sharded_coordinator.go, kv/sharded_coordinator_composed1_retry_test.go, and kv/sharded_coordinator_txn_test.go changed in this PR. No modifications to kv/hlc*.go, kv/coordinator.go (HLC lease renewal), kv/transaction.go, kv/lock_resolver.go, kv/fsm.go, store/mvcc_store.go, or distribution/. The 2PC secondary commit change is purely in the Request construction layer — it does not touch timestamp issuance, OCC commit-ts assignment, FSM apply, or catalog versioning. No spec module needs updating.

LGTM. Seven review rounds across this PR have systematically closed every correctness finding. The M1+M2+M3+M4 stack is now complete for write-only txns with coordinator-allocated timestamps — route shifts are transparent to the client for that class. The M5 follow-ups (Jepsen workload, fatal-secondary 2PC, adapter §4.1 migration) are correctly pre-flagged in the PR description.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8487672d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T20:30:35Z

 		req := &pb.Request{
-			IsTxn:                true,
-			Phase:                pb.Phase_COMMIT,
-			Ts:                   startTS,
-			Mutations:            append([]*pb.Mutation{meta}, keyMutations(grouped[gid])...),
-			ObservedRouteVersion: observedRouteVersion,
+			IsTxn:     true,
+			Phase:     pb.Phase_COMMIT,
+			Ts:        startTS,
+			Mutations: append([]*pb.Mutation{meta}, keyMutations(grouped[gid])...),


Preserve the secondary commit route pin

When a multi-shard transaction is caller-pinned or auto-pinned, this request literal now always sends ObservedRouteVersion=0 for secondary commits. If the secondary key's route moves after PREPARE/primary COMMIT but before this best-effort secondary COMMIT, the old secondary FSM short-circuits verifyComposed1 and applies the write on a group that is no longer the current owner; dispatchMultiShardTxn then acknowledges success. The M3 current-version fence was added specifically to reject these late commits to old owners, so secondary Composed-1 failures need to be made fatal/recoverable rather than disabling the gate for all secondaries.

Useful? React with 👍 / 👎.

…e propagation (codex P1 on d848767, PR #900) d848767 dropped ObservedRouteVersion from secondary commit Requests to avoid the silent-partial-commit hazard codex P1 on 6202b96 flagged: "fail-closed gate + best-effort swallow = uncommitted secondary, missing-write silent partial commit." Codex P1 on d848767 correctly identifies that dropping the gate trades one silent partial commit for another: with the gate off, the secondary FSM short-circuits verifyComposed1 and APPLIES the write on a group that is no longer the catalog's owner. Readers on the new owner never see it. Both codex reviews actually agreed on the right fix — make secondary Composed-1 errors FATAL — but d848767 picked the wrong interpretation ("disable the gate") instead of the explicit recommendation ("make fatal/recoverable"). This commit applies the codex-correct fix: 1. Restore ObservedRouteVersion propagation to secondary commit Requests (revert d848767's literal change). 2. Add ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel (kv/txn_errors.go). Distinct from ErrComposed1Violation by design: the M4 retry path in dispatchTxnWithComposed1Retry matches ErrComposed1Violation / ErrComposed1VersionGCd; reusing those here would loop the retry against a primary that is already durable, re-prewriting into existing locks at the old gid — pure wasted work since the route catalog will not move backward. 3. Change commitSecondaryTxns to return (uint64, error). On Composed-1 sentinel from any per-secondary commitSecondaryWithRetry, surface ErrTxnSecondaryRouteShiftedAfterPrimaryCommit. Non-Composed-1 errors stay best-effort (logged and continued) to preserve the original lock-resolution-backstop semantic. 4. dispatchMultiShardTxn propagates the sentinel to its caller. Trade-off accepted: the caller now sees a fatal error when a multi-shard 2PC race lands. The primary's write is durable; the secondary's is missing. Application-level recovery (re-read, re-execute, abort, or compensating write) is required. This is strictly better than either silent direction — at least the caller knows the txn is half-broken and can act on it. Caller audit (loop directive's semantic-change check): commitSecondaryTxns's only caller is dispatchMultiShardTxn, updated in this commit. The new error type ErrTxnSecondaryRouteShiftedAfterPrimaryCommit is NOT in the M4 retry's retryable set (isComposed1RetryableError matches ErrComposed1Violation / ErrComposed1VersionGCd only), so the M4 retry loop in dispatchTxnWithComposed1Retry surfaces it directly without re-attempting. Production callers (adapter/redis.go, adapter/dynamodb.go, etc.) see the new error type at the Dispatch boundary and propagate it as their HTTP/RPC error. No adapter explicitly matches Composed-1 sentinels today, so the semantic shift (success-on-partial-commit → error-on-partial-commit) is a pure correctness improvement — pre-fix exposure was silent data loss under cross-COMMIT route shifts; post-fix is visible error. Test changes: * NEW TestShardedCoordinator_SurfacesFatalErrorOn2PCSecondaryComposed1 — 2-shard fixture, primary "b" → g1 succeeds (prewrite + commit), secondary "x" → g2 prewrite succeeds, commit persistently fails with ErrComposed1Violation (every retry of commitSecondaryWithRetry). Asserts the Dispatch error is ErrTxnSecondaryRouteShiftedAfterPrimaryCommit and NOT ErrComposed1Violation (so M4 retry does not match). Verified failing on the parent (Dispatch returned nil error because commitSecondaryTxns swallowed) and passing on this commit. * RENAMED TestShardedCoordinator_DropsObservedRouteVersionOn2PCSecondaryCommit → TestShardedCoordinator_PropagatesObservedRouteVersionOn2PCSecondaryCommit. Inverted the assertion: secondary commit's ObservedRouteVersion MUST equal the auto-pinned engine version, not zero. The d848767 contract was the temporary "drop-the-gate" posture; this commit restores propagation. * REVERTED TestShardedCoordinatorDispatchTxn_CrossShardPropagatesObservedRouteVersion to the original PR #881 contract: every 2PC envelope on every shard carries the pinned version. The d848767 refinement ("drop on secondary commit") is undone in favour of the new fatal-sentinel approach. Verification: go test -race -count=1 ./kv → pass (full pkg, 9s). make lint → 0 issues. Note on the d848767 CI failure: TestRedis_StreamXReadLatencyIsConstant in adapter/ failed at 48.35s on the loaded ubuntu-latest runner. That is a latency-sensitive perf test unrelated to this PR (no Redis stream code touched).

github-actions · 2026-06-01T20:41:12Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (6a458a2):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

chatgpt-codex-connector · 2026-06-01T20:41:19Z

To use Codex here, create a Codex account and connect to github.

bootjp · 2026-06-01T20:41:22Z

Pushed as 6a458a2 — codex P1 on d848767 addressed.

The d848767 fix was the wrong interpretation. Codex P1 on 6202b96 flagged "fail-closed gate + best-effort swallow = silent partial commit (missing-write direction)" and recommended "make secondary errors fatal/recoverable." I picked the wrong option ("disable the gate") instead of the explicit recommendation ("make fatal"). Codex P1 on d848767 correctly showed that the disabled-gate posture trades one silent partial commit for another (stale-owner direction): the secondary FSM short-circuits verifyComposed1 and applies the write on a group that's no longer the catalog's owner.

Correct fix — make secondary Composed-1 fatal with a distinct sentinel.

Restored ObservedRouteVersion propagation to secondary commits.
New ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel — distinct from ErrComposed1Violation so the M4 retry path doesn't loop (the primary is durable; re-prewriting would conflict).
commitSecondaryTxns now returns (uint64, error). Composed-1 from a per-secondary commit surfaces as the new sentinel; non-Composed-1 errors stay best-effort (logged+continued) preserving lock-resolution-backstop semantic.
dispatchMultiShardTxn propagates the sentinel to the caller.

Coverage matrix (now restored to PR #881's original):

Step	ObservedRouteVersion	Composed-1 outcome
Prewrite (both shards)	pinned	abort cleanly
Primary commit	pinned	abortPreparedTxn cleans up
Secondary commit	pinned	surface ErrTxnSecondaryRouteShiftedAfterPrimaryCommit (fatal, no retry)

Caller audit (loop directive): commitSecondaryTxns has only one caller (dispatchMultiShardTxn). The new sentinel is NOT in isComposed1RetryableError's set, so M4 retry doesn't match. Adapters propagate the error through their HTTP/RPC error path — semantic change is strict correctness improvement: pre-fix silent data loss → post-fix visible error.

Tests:

NEW TestShardedCoordinator_SurfacesFatalErrorOn2PCSecondaryComposed1 — 2-shard fixture, secondary persistently fails Composed-1 on commit; asserts Dispatch returns the new sentinel and NOT ErrComposed1Violation. Failing on parent / passing on fix.
INVERTED TestShardedCoordinator_PropagatesObservedRouteVersionOn2PCSecondaryCommit (was Drops… in d848767) — secondary commit's observedVer=7 (restored, not 0).
REVERTED TestShardedCoordinatorDispatchTxn_CrossShardPropagatesObservedRouteVersion to original PR kv(composed1): M1 plumbing — ObservedRouteVersion in OperationGroup → pb.Request #881 contract — every 2PC envelope on every shard carries pinned version.

go test -race ./kv pass (full pkg, 9s) / make lint 0 issues.

Note on d848767 CI failure: TestRedis_StreamXReadLatencyIsConstant (adapter/redis, 48.35s) — latency-sensitive perf test on loaded ubuntu-latest runner, unrelated to this PR.

@claude review

claude · 2026-06-01T20:41:34Z

Claude finished @bootjp's task in 3m 12s —— View job

PR Review: `6a458a2` — surface secondary Composed-1 as fatal sentinel, restore propagation

Gather context and understand the request
Read current state of changed files (6a458a2)
Verify the new ErrTxnSecondaryRouteShiftedAfterPrimaryCommit sentinel approach
Verify commitSecondaryTxns return type change and propagation
Verify ObservedRouteVersion propagation restored correctly
Verify new and updated tests
Check TLA+ spec divergence
Post findings

The revert of d848767 is correct, and the approach here (make secondary Composed-1 fatal with a distinct sentinel rather than disabling the gate) is the right resolution of the two competing codex P1 findings.

✅ Sentinel design — correct

ErrTxnSecondaryRouteShiftedAfterPrimaryCommit is the right answer to the tension between the two codex P1 findings:

Option	What happens	Problem
Swallow error (pre-66ad5b0)	Best-effort secondary logs + success	Missing-write silent partial commit (codex P1 on `6202b96`)
Drop gate on secondary (`d848767`)	Writes land on stale owner	Stale-owner silent partial commit (codex P1 on `d848767`)
Fatal sentinel (`6a458a2`)	Caller-visible error, lock-resolver backstop	Honest: txn state is uncertain, caller can recover

The new sentinel is correctly NOT matched by isComposed1RetryableError (line 669 only tests ErrComposed1Violation | ErrComposed1VersionGCd), so dispatchTxnWithComposed1Retry surfaces it without looping. The M4 loop termination path is clean:

dispatchTxnWithComposed1Retry attempt 0 → dispatchTxn → dispatchMultiShardTxn
Primary commits (durable).
commitSecondaryTxns → commitSecondaryWithRetry returns Composed-1.
isComposed1RetryableError → true → returns ErrTxnSecondaryRouteShiftedAfterPrimaryCommit.
dispatchMultiShardTxn propagates → dispatchTxn returns the sentinel.
isComposed1RetryableError(ErrTxnSecondaryRouteShiftedAfterPrimaryCommit) → false → surfaces, no retry.

✅ `commitSecondaryTxns` signature change — correct

The (uint64, error) return type is right. The early-return on Composed-1 (return maxIndex, errors.WithStack(ErrTxnSecondaryRouteShiftedAfterPrimaryCommit) at line 1062) correctly abandons remaining secondaries in the group list — once one secondary has a route shift that prevents commit, the txn state is already half-committed and application-level recovery is needed regardless of the others.

✅ `ObservedRouteVersion` propagation restored — correct

commitSecondaryTxns receives observedRouteVersion uint64 and passes it through to pb.Request.ObservedRouteVersion at line 1046. dispatchMultiShardTxn correctly passes the parameter (line 867). The coverage matrix is cleanly restored to PR #881's original posture:

Step	ObservedRouteVersion	Composed-1 outcome
Prewrite (both shards)	pinned	abort cleanly via `abortPreparedTxn`
Primary commit	pinned	`abortPreparedTxn` on failure
Secondary commit	pinned	`ErrTxnSecondaryRouteShiftedAfterPrimaryCommit` (fatal)

✅ Regression test — correct

TestShardedCoordinator_SurfacesFatalErrorOn2PCSecondaryComposed1 (lines 715–661):

g2Errs = [nil, ErrComposed1Violation × txnSecondaryCommitRetryAttempts] — correctly accounts for the inner commitSecondaryWithRetry retry loop (3 attempts before returning lastErr = ErrComposed1Violation).
require.ErrorIs(t, err, ErrTxnSecondaryRouteShiftedAfterPrimaryCommit) — asserts the new sentinel ✅
require.NotErrorIs(t, err, ErrComposed1Violation) — asserts M4 retry does not have a false surface ✅

TestShardedCoordinator_PropagatesObservedRouteVersionOn2PCSecondaryCommit correctly reverts the d848767 assertion back to observedVer=7 for all four envelopes.

TestShardedCoordinatorDispatchTxn_CrossShardPropagatesObservedRouteVersion correctly loops over all requests and asserts pinnedVer everywhere, eliminating the special-cased observedVer=0 assertion from d848767.

✅ All prior review findings resolved

Finding	Round	Status
gemini CRITICAL / codex P1: SSI on read-write retry	R1	✅ `66ad5b0`
gemini HIGH: nil-clock StartTS=0	R1	✅ `66ad5b0`
codex P1: auto-pin for read-write txns	R2	✅ `10123c5`
Minor: empty goroutine	R2	✅ `43c55df`
Nit: stale `resp` on `allocErr`	R2	✅ `43c55df`
claude medium: PrevCommitTS (correctly preserved)	R2–R3	✅ `57da888`
codex P1: callerSuppliedStartTS OCC bypass on retry	R4	✅ `144ec0c`
codex P1: auto-pin for caller-supplied StartTS	R5	✅ `6202b96`
codex P1: 2PC secondary gate — silent partial commit (both directions)	R6–R7	✅ `6a458a2` (this commit)

⚠️ Minor — `commitSecondaryWithRetry` sleeps between Composed-1 retries

commitSecondaryWithRetry sleeps 20ms + 40ms = 60ms before returning the final error on 3 Composed-1 attempts. These sleeps are useful for transient leader-election drops but wasteful for Composed-1 (the catalog won't move back during the retry window). The caller already correctly classifies the final error as fatal, so the semantics are correct — the only cost is latency on the error path. Not blocking; flagging for M5+ or a separate refactor to add a shouldRetry(err) bool predicate in commitSecondaryWithRetry.

TLA+ spec divergence — no update needed

6a458a2 is confined to kv/sharded_coordinator.go, kv/sharded_coordinator_composed1_retry_test.go, kv/sharded_coordinator_txn_test.go, and kv/txn_errors.go. No changes to kv/hlc*.go, kv/coordinator.go HLC renewal paths, kv/transaction.go, kv/lock_resolver.go, kv/fsm.go, store/mvcc_store.go, or distribution/. The sentinel classification and secondary-commit error propagation don't touch timestamp issuance, OCC commit-ts assignment, FSM apply, or catalog versioning. No spec module needs updating.

LGTM. Eight review rounds have been thorough. The final shape correctly handles all four classes of Dispatch caller, correctly gates the M3 sentinel at every 2PC step, and surfaces secondary Composed-1 failures as a distinct honest error rather than either swallowing or disabling the gate. The one remaining open item is the M5 Jepsen workload, pre-flagged in the PR description.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6a458a2840

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T20:45:07Z

+	if c.engine == nil || reqs.ObservedRouteVersion != 0 || len(reqs.ReadKeys) != 0 || callerSuppliedStartTS {
+		return
+	}
+	reqs.ObservedRouteVersion = c.engine.Version()


Avoid auto-pinning resolver-routed transactions

When a partition resolver routes a write-only transaction to a group that differs from the byte-range engine (e.g. SQS partition keys), this auto-pin makes the FSM enforce the engine snapshot rather than the resolver's ownership: ShardRouter.ResolveGroup sends the raw key to the resolver first, but verifyComposed1 later calls the route-history OwnerOf on the raw mutation key, whose snapshots come only from distribution.Engine. With production FSMs wired via WithRouteHistory, a coordinator-owned txn with no StartTS/ReadKeys that used to commit unpinned can now be rejected with ErrComposed1Violation on both M4 attempts even though the resolver selected the correct group. The auto-pin needs to be skipped or made resolver-aware for resolver-claimed keys.

Useful? React with 👍 / 👎.

…a458a2, PR #900) Codex P1 on 6a458a2 identifies a final auto-pin skip case the prior rounds missed: when a PartitionResolver is wired (SQS HT-FIFO partition keys, etc.), ShardRouter routes via the resolver but verifyComposed1 calls route-history OwnerOf on the raw mutation key against the BYTE-RANGE engine snapshot. The engine has no knowledge of the resolver's mapping, so the gate spuriously rejects resolver-routed commits even when the resolver picked the correct gid. Concrete failure mode (codex): 1. ShardRouter.ResolveGroup → resolver returns gid R. 2. Dispatch-entry auto-pin stamps ObservedRouteVersion = engine.Version() on the request. 3. FSM at gid R applies; verifyComposed1 calls OwnerOf on the raw mutation key against the engine's snapshot. 4. Engine's snapshot says gid B (default byte-range owner), not gid R. 5. Mismatch → ErrComposed1Violation. 6. M4 retry re-reads engine.Version() (unchanged), retries, fails again. Spurious double failure. With production FSMs wired via WithRouteHistory, a coordinator-owned txn with no StartTS/ReadKeys that used to commit unpinned now gets rejected. The auto-pin's "blanket protection" framing breaks for resolver-routed keys — this is the FIFTH distinct scenario where the auto-pin overreaches: R1 (66ad5b0): ReadKeys non-empty (pre-shift reads) R2 (10123c5): same — gated R3 (144ec0c): callerSuppliedStartTS (adapter-layer pre-reads) R4 (6202b96): same — gated R5 (6a458a2): 2PC secondary commits — surfaced fatally R6 (this): resolver-claimed keys — gated Fix: extend maybeAutoPinObservedRouteVersion's skip conditions to include "any element's key is recognised by the partition resolver". Extracted anyResolverClaimedKey helper that: - returns false fast when no resolver is wired (the byte-range-only deployment — most common case); - iterates Elems and calls resolver.RecognisesPartitionedKey on each non-empty key; - returns true on first claim. Adapter for SQS HT-FIFO (the main resolver consumer) now correctly bypasses the M3 gate. Resolver-aware M3 is M5+ work. Caller audit (loop directive's semantic-change check): the change is internal to maybeAutoPinObservedRouteVersion. The auto-pin already had skip conditions for engine==nil, ObservedRouteVersion!=0, ReadKeys non-empty, and callerSuppliedStartTS; adding the resolver-claim skip is a straightforward extension. No external caller cares (the only caller is dispatchTxnWithComposed1Retry which threads the same flag through). Production callers using the partition resolver (SQS HT-FIFO via adapter/sqs_messages.go, adapter/sqs_messages_batch.go, etc.) see resolver-routed txns commit instead of being spuriously rejected — strict correctness improvement. Coordinator-routed (byte-range) txns continue to auto-pin as before; they were never affected by this hazard. Regression test: * TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForResolverRoutedTxn — installs a stubPartitionResolver claiming prefix "part:". Dispatches a write-only txn with key "part:q1:1234" and no StartTS/ReadKeys/ObservedRouteVersion. Engine pinned at version 9. Asserts the request that reaches Commit carries ObservedRouteVersion=0 (auto-pin did NOT fire). Verified failing on parent (request carried v=9) and passing here. Stub stubPartitionResolver is a minimal in-test PartitionResolver: it claims every key whose prefix matches a configurable byte sequence and routes them to a fixed gid. Sufficient for exercising the codex P1 skip path without pulling in the SQS-specific partition routing wiring. Verification: go test -race -count=1 ./kv → pass (full pkg, 10s). make lint → 0 issues.

bootjp · 2026-06-01T20:50:07Z

Pushed as e08d44f — codex P1 on 6a458a2 addressed.

Codex P1 — skip auto-pin for resolver-claimed keys. When a PartitionResolver is wired (SQS HT-FIFO partition keys etc.), ShardRouter routes via the resolver but verifyComposed1 calls route-history OwnerOf against the BYTE-RANGE engine snapshot. The engine has no knowledge of the resolver's mapping, so the gate spuriously rejects resolver-routed commits even when the resolver picked the correct gid. M4 retry re-reads engine.Version() (unchanged) and fails the same way — spurious double failure.

Fix: extend maybeAutoPinObservedRouteVersion's skip conditions with a fifth case: "any element's key is recognised by the partition resolver." Extracted anyResolverClaimedKey helper that returns false fast when no resolver is wired (the byte-range-only deployment), iterates Elems otherwise, and calls resolver.RecognisesPartitionedKey per non-empty key.

Skip-condition summary:

#	Skip when	Reason
1	engine == nil	no catalog to pin against
2	ObservedRouteVersion != 0	caller already pinned
3	len(ReadKeys) != 0	caller has pre-shift reads
4	callerSuppliedStartTS	caller-owned snapshot via readTS
5	resolver-claimed key	byte-range engine doesn't reflect resolver routing

Caller audit (loop directive): change is internal to maybeAutoPinObservedRouteVersion. Only caller is dispatchTxnWithComposed1Retry (unchanged). Production resolver consumers (adapter/sqs_messages.go, adapter/sqs_messages_batch.go, etc.) now correctly commit instead of being spuriously rejected — strict correctness improvement. Byte-range-only callers unaffected.

Regression test: TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForResolverRoutedTxn — installs stubPartitionResolver claiming prefix "part:", dispatches a write-only txn with key "part:q1:1234", asserts observedVer=0 (auto-pin must NOT fire). Failing on parent / passing on fix.

go test -race ./kv pass (full pkg, 10s) / make lint 0 issues.

@claude review

github-actions · 2026-06-01T20:50:08Z

TLA+ spec divergence review (auto-triggered)

This PR touches files that the TLA+ safety spec has an anchor on (per
docs/design/2026_05_28_partial_tla_safety_spec.md §3),
so an AI review is requested below to verify the implementation has not drifted
from the model.

Anchored files changed in this PR head (e08d44f):

kv/sharded_coordinator.go

What to check, by subsystem:

kv/hlc*.go — Next() must respect the HLC-4 preconditions (i)/(ii)/(iii) from the design doc: bounded skew, logical-counter handoff on leader change (strategy (c) Observe(MaxAppliedHLC)), and the commit-time ceiling fence (fail-closed when wall_now >= physicalCeiling). Any change to the bit layout (48/16), the CAS loop, or the ceiling getter/setter is in scope.
kv/coordinator.go, kv/sharded_coordinator.go — RunHLCLeaseRenewal, hlcRenewalInterval, hlcPhysicalWindowMs constants, and the new-term detection that calls Observe(fsm.MaxAppliedHLC()) (strategy (c)). Any change to renewal cadence, group selection, or fail-closed behaviour is in scope.
kv/transaction.go, kv/lock_resolver.go — OCC commit-ts assignment, lock-map encoding (key, lock_ts) -> start_ts, and the LockResolver action OCC-3 depends on. (M2 spec will land OCC-1..OCC-5; until then the spec doc §5.2 is the contract.)
kv/fsm.go — FSM apply of HLC lease entries (SetPhysicalCeiling), and any future MaxAppliedHLC() accessor that strategy (c) needs.
store/mvcc_store.go — version visibility, snapshot install, and the MVCC-1..MVCC-4 invariants (M3 scope).
distribution/** — route catalog versioning, SplitRange atomicity, and CatalogWatcher async fan-out (M4 scope).

If the change is correct but requires a spec update, edit tla/hlc/HLC.tla (or the corresponding M2..M5 module once landed) and the design doc in the same PR. The tla-check workflow runs the TLC model check on the same paths.

@claude review please verify TLA+ spec divergence per the checklist above.

@codex review please verify TLA+ spec divergence per the checklist above.

chatgpt-codex-connector · 2026-06-01T20:50:15Z

To use Codex here, create a Codex account and connect to github.

claude · 2026-06-01T20:50:22Z

Claude finished @bootjp's task in 3m 22s —— View job

PR Review: `e08d44f` — skip auto-pin for resolver-claimed keys (codex P1 on `6a458a2`)

Gather context and read changed files
Verify codex P1 fix (anyResolverClaimedKey helper + skip condition 5)
Verify regression test correctness
Check all prior review findings still resolved
Check TLA+ spec divergence

✅ Codex P1 — correct

anyResolverClaimedKey (lines 574–587) is clean:

c.router == nil || c.router.partitionResolver == nil fast-paths the common byte-range-only deployment with zero overhead — router is always set in NewShardedCoordinator, but the defensive nil guard is harmless.
Iterates elems, skips nil/empty-key entries, returns true on the first resolver-claimed key. The early-return semantics are right: if ANY key in the batch is resolver-owned, the whole auto-pin should be skipped (a mixed batch would fail anyway, since the resolver-claimed key's FSM would call verifyComposed1 against the byte-range engine snapshot which does not know the resolver's mapping).

maybeAutoPinObservedRouteVersion (line 559) now reads:

if c.engine == nil || reqs.ObservedRouteVersion != 0 || len(reqs.ReadKeys) != 0 || callerSuppliedStartTS {
    return
}
if c.anyResolverClaimedKey(reqs.Elems) {
    return
}
reqs.ObservedRouteVersion = c.engine.Version()

The five skip conditions are now complete and coherent:

#	Skip when	Reason
1	`engine == nil`	no catalog to pin against
2	`ObservedRouteVersion != 0`	caller already pinned
3	`len(ReadKeys) != 0`	caller has pre-shift reads (Redis MULTI/EXEC)
4	`callerSuppliedStartTS`	adapter-layer read invisible to coordinator (S3/SQS/DynamoDB pattern)
5	resolver-claimed key	byte-range engine snapshot doesn't reflect resolver routing; gate would spuriously reject every resolver-routed txn

Retry path is correct without a resolver check. The retry fires only when isComposed1RetryableError is true, which requires the M3 gate to have returned one of the two sentinels. But with observedRouteVersion=0 (what skip-5 leaves on the request), the FSM's verifyComposed1 always returns nil early — the M3 gate cannot return a Composed-1 sentinel for zero-versioned requests. So the retry loop's missing resolver-key guard is not a gap: the code path cannot be reached for resolver-claimed keys.

✅ Regression test — correct

TestShardedCoordinator_DoesNotAutoPinObservedRouteVersionForResolverRoutedTxn (lines 781–814):

Engine at v=9, resolver claiming prefix "part:", dispatches "part:q1:1234" with StartTS=0, ReadKeys=nil, ObservedRouteVersion=0 — none of the existing four skip gates would fire.
composed1RetryingTransactional with firstErr: nil → first attempt succeeds, exactly one Commit call.
Assertion: observedSeen[0] == 0 — auto-pin must NOT fire. If skip-5 were missing, observedSeen[0] would be 9 and the test would fail.

The stubPartitionResolver prefix-match loop (lines 734–743) is correct — it matches any key whose first len(claimedPrefix) bytes equal "part:". No use of bytes.HasPrefix (from Go 1.20+) for compat, which is fine.

✅ All prior review findings resolved

Finding	Round	Commit	Status
gemini CRITICAL / codex P1: SSI on read-write retry	R1	`66ad5b0`	✅
gemini HIGH: nil-clock leaves StartTS=0	R1	`66ad5b0`	✅
codex P1: auto-pin for read-write txns	R2	`10123c5`	✅
Minor: empty goroutine in test	R2	`43c55df`	✅
Nit: stale `resp` on `allocErr`	R2	`43c55df`	✅
claude medium: PrevCommitTS clearing (correctly preserved)	R2–R3	`57da888`	✅
codex P1: callerSuppliedStartTS OCC bypass on retry	R4	`144ec0c`	✅
codex P1: auto-pin for caller-supplied StartTS	R5	`6202b96`	✅
codex P1: 2PC secondary Composed-1 → silent partial commit (both directions)	R6–R7	`6a458a2`	✅
codex P1: resolver-claimed key auto-pin	R8	`e08d44f`	✅ this commit

TLA+ spec divergence — no update needed

e08d44f is confined to anyResolverClaimedKey and maybeAutoPinObservedRouteVersion in sharded_coordinator.go. No changes to kv/hlc*.go, kv/coordinator.go HLC lease-renewal paths, kv/transaction.go, kv/lock_resolver.go, kv/fsm.go, store/mvcc_store.go, or distribution/. The resolver check is purely in the Dispatch-entry orchestration — it does not touch timestamp issuance, OCC commit-ts assignment, FSM apply, or catalog versioning. No spec module needs updating.

LGTM. Nine review rounds across this PR have been thorough. All correctness findings are closed. The final skip-condition table in maybeAutoPinObservedRouteVersion's comment is now complete and accurate. The one remaining open item is the M5 Jepsen workload (route-shuffle generator for the DynamoDB suite), pre-flagged in the PR description.

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 1, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

		if c.engine != nil && reqs.ObservedRouteVersion == 0 {
		reqs.ObservedRouteVersion = c.engine.Version()

Conversation

bootjp commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changes

ShardedCoordinator.Dispatch — pin ObservedRouteVersion at entry

New dispatchTxnWithComposed1Retry — retry-once on Composed-1 sentinels

Caller audit (loop directive's semantic-change check)

Tests (5 new)

Verification

Self-review (5 lenses)

Test plan

Resolves

Summary by CodeRabbit

Uh oh!

bootjp commented Jun 1, 2026

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

github-actions Bot commented Jun 1, 2026

TLA+ spec divergence review (auto-triggered)

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: kv(composed1): M4 — Dispatch retry on ErrComposed1Violation / ErrComposed1VersionGCd

kv/sharded_coordinator.go

kv/sharded_coordinator_composed1_retry_test.go

Follow-up concern (not blocking M4)

Summary

Uh oh!

chatgpt-codex-connector Bot commented Jun 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

bootjp commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

TLA+ spec divergence review (auto-triggered)

Uh oh!

chatgpt-codex-connector Bot commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Follow-up fixes (gemini CRITICAL/HIGH + codex P1)

✅ gemini CRITICAL / codex P1 — correct

✅ gemini HIGH — correct

⚠️ PrevCommitTS still not cleared on retry — prior medium finding still open

⚠️ Empty goroutine — prior minor finding still open

Nit — stale resp returned on allocErr

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

bootjp commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

`ShardedCoordinator.Dispatch` — pin ObservedRouteVersion at entry

New `dispatchTxnWithComposed1Retry` — retry-once on Composed-1 sentinels

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

`kv/sharded_coordinator.go`

`kv/sharded_coordinator_composed1_retry_test.go`

claude Bot commented Jun 1, 2026 •

edited

Loading

⚠️ `PrevCommitTS` still not cleared on retry — prior medium finding still open

Nit — stale `resp` returned on `allocErr`

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

PR Review: Codex P1 fix — `maybeAutoPinObservedRouteVersion` refactor (`10123c5`)

⚠️ `PrevCommitTS` still not cleared on retry — medium, now in its third review round

Nit — stale `resp` on `allocErr` (line 570)

claude Bot commented Jun 1, 2026 •

edited

Loading

PR Review: `43c55df` — PrevCommitTS + empty goroutine + stale resp fixes