feat(encryption): Stage 6E-1a — FSM apply machinery for enable-raft-envelope cutover#899
feat(encryption): Stage 6E-1a — FSM apply machinery for enable-raft-envelope cutover#899bootjp wants to merge 1 commit into
Conversation
…nvelope cutover Adds the FSM-side machinery for the Stage 6E Phase-2 raft-envelope cutover (per the design merged in #893). The admin RPC + server + CLI ship in a separate PR (6E-1b) so this slice does not pull in a proto regen. Wire layer (internal/encryption/fsmwire/wire.go) - New constant RotateSubEnableRaftEnvelope byte = 0x05 (the reserved value already documented in the comment block; this slice promotes it from "reserved" to live). - Added to the readRotationSubTag whitelist so the wire decoder accepts the sub-tag (previously rejected as ErrFSMWireSubtag). - The 0x02 / 0x03 reserved slots stay reserved for Stage 9. Applier (internal/encryption/applier.go) - validateEnableRaftEnvelopePayload — mirrors the storage variant but targets PurposeRaft + empty-Wrapped + proposer-DEK pinning. - applyEnableRaftEnvelope — structural mirror of applyEnableStorageEnvelope with the raft-specific differences: * compares DEKID against sc.Active.Raft (not Storage) * records the cutover via sc.RaftEnvelopeCutoverIndex (a uint64 index — non-zero means "Phase-2 active"; doubles as the idempotency token, no separate bool flag) * registers proposer against the active raft DEK (PurposeRaft) - ApplyRotation dispatch picks up the new sub-tag; the "unrecognised sub-tag" fallback continues to halt-apply for Stage 9 sub-tags. Outcomes (mirror the storage variant) - Malformed payload → halt-apply via ErrEncryptionApply. - Stale DEKID (RotateDEK raced between propose and apply) → benign no-op; advance RaftAppliedIndex only. - Already active (duplicate cutover entry) → idempotent; preserve the original RaftEnvelopeCutoverIndex, advance RaftAppliedIndex only. - Fresh success → register proposer FIRST, then set RaftEnvelopeCutoverIndex + advance RaftAppliedIndex inside one WriteSidecar fsync. The registration-before-sidecar ordering is the same crash-recovery invariant the storage variant uses (§4.1 case 2-idempotent re-runs are safe; the sidecar flip is the last observable side-effect). Operator-inertness in 6E-1a Stage 6E-1a deliberately does NOT activate the §6.3 engine apply-hook unwrap or the coordinator wrap-on-propose switch — those land atomically in 6E-2 (the design notes 6E-2's three pieces cannot ship separately because either ordering produces cluster-wide halt-apply at cutover). With only 6E-1a deployed, the sidecar cutover field advances on apply but no entry is wrapped or unwrapped, so this slice is operator-inert (matches the design's "no behavior change" guarantee for 6E-1). Tests (internal/encryption/applier_test.go) — 6 new tests: - TestApplyRotation_EnableRaftEnvelope_FreshSuccess (2 sub-cases: nil_wrapped + empty_wrapped — pins the length-based empty check against both Go construction paths) - TestApplyRotation_EnableRaftEnvelope_RejectsNonRaftPurpose - TestApplyRotation_EnableRaftEnvelope_RejectsNonEmptyWrapped - TestApplyRotation_EnableRaftEnvelope_RejectsProposerDEKMismatch - TestApplyRotation_EnableRaftEnvelope_StaleDEKID_BenignNoOp (uses RotateDEK to actually advance Active.Raft + the keys[] map, so the sidecar validator accepts the new state — mirrors the storage stale-DEKID test setup) - TestApplyRotation_EnableRaftEnvelope_AlreadyActive_IdempotentNoOp All 6 pass under -race. Full `internal/encryption/...` -race clean (32s). Lint clean. Self-review (5-lens): 1. Data loss — fresh-success is registration-before-sidecar so a crash between the two re-runs both safely on FSM replay. Stale + idempotent branches preserve the original cutover index (the §6.4-equivalent atomicity invariant for raft). 2. Concurrency — FSM apply is serial per Raft group; no race surface introduced. The atomic Mirror in StateCache (RaftEnvelopeCutoverIndex) is NOT extended in this slice because the engine apply-hook (6E-2) is the only consumer and reads sidecar directly. 3. Performance — one extra sub-tag case in the dispatch; hot-path cost unchanged for non-raft-cutover applies. 4. Data consistency — strict-greater-than dispatch invariant (`entry.Index > RaftEnvelopeCutoverIndex`) is a 6E-2 concern; 6E-1a only writes the field. No FSM determinism hazard because the apply path is deterministic across replicas (each replica's local sidecar reaches the same value at the same apply index). 5. Test coverage — every branch of the validate function and the apply state machine has a dedicated test; the 6D storage-cutover regression tests stay green.
📝 WalkthroughWalkthroughThis PR implements raft-layer envelope cutover (Stage 6E-1) by adding a new rotation subtag, validation and apply handler, and comprehensive test coverage for fresh apply, validation failures, stale DEK race conditions, and idempotent replay scenarios. ChangesRaft Envelope Cutover Implementation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements Stage 6E-1 of the encryption rotation design, introducing the one-shot raft-layer cutover (RotateSubEnableRaftEnvelope). It adds payload validation and application logic in internal/encryption/applier.go, updates the wire protocol in internal/encryption/fsmwire/wire.go, and includes comprehensive unit tests in internal/encryption/applier_test.go. Feedback was provided to explicitly validate that the raftIdx is non-zero in applyEnableRaftEnvelope to prevent silent activation failures.
| func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error { | ||
| if err := validateEnableRaftEnvelopePayload(p); err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
In applyEnableRaftEnvelope, sc.RaftEnvelopeCutoverIndex is set to raftIdx and a non-zero value is used as the sole indicator that the raft envelope cutover is active (Phase-2 active). If raftIdx is 0 (which can happen in tests or if the index-aware apply seam is misconfigured), the cutover will silently fail to activate because sc.RaftEnvelopeCutoverIndex will remain 0, yet the function will return success. We should explicitly validate that raftIdx is non-zero at the start of the function and return an error if it is 0.
func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error {
if raftIdx == 0 {
return errors.Wrap(ErrEncryptionApply, "applier: enable-raft-envelope requires a non-zero raft index")
}
if err := validateEnableRaftEnvelopePayload(p); err != nil {
return err
}There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/encryption/applier.go`:
- Around line 1127-1133: Reject zero raftIdx up-front in
applyEnableRaftEnvelope: before reading or mutating sidecar state, check if the
incoming raftIdx == 0 and return an error so we never persist a registration
with a sentinel index; specifically, in Applier.applyEnableRaftEnvelope (and the
analogous path around lines 1169-1170) validate raftIdx != 0 and return an error
describing "zero raft index" before calling ReadSidecar or ApplyRegistration,
ensuring RaftEnvelopeCutoverIndex is only set with a non-zero index.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e163f3ef-611e-4e84-b8f1-87d634b3e4f0
📒 Files selected for processing (3)
internal/encryption/applier.gointernal/encryption/applier_test.gointernal/encryption/fsmwire/wire.go
| func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error { | ||
| if err := validateEnableRaftEnvelopePayload(p); err != nil { | ||
| return err | ||
| } | ||
| sc, err := ReadSidecar(a.sidecarPath) | ||
| if err != nil { | ||
| return errors.Wrap(err, "applier: read sidecar for enable-raft-envelope") |
There was a problem hiding this comment.
Reject zero raftIdx before marking raft cutover.
RaftEnvelopeCutoverIndex is the only activation bit here, but the fresh-success path stores raftIdx verbatim. If this path is called with raftIdx == 0—the same "index not supplied" sentinel used elsewhere—the proposer registration is persisted while Phase 2 stays logically inactive and the entry is no longer idempotent on replay. Fail closed before ApplyRegistration when the index is zero.
Suggested fix
func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error {
if err := validateEnableRaftEnvelopePayload(p); err != nil {
return err
}
+ if raftIdx == 0 {
+ return errors.Wrap(ErrEncryptionApply,
+ "applier: enable-raft-envelope requires a non-zero raft index")
+ }
sc, err := ReadSidecar(a.sidecarPath)
if err != nil {
return errors.Wrap(err, "applier: read sidecar for enable-raft-envelope")
}Also applies to: 1169-1170
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/encryption/applier.go` around lines 1127 - 1133, Reject zero raftIdx
up-front in applyEnableRaftEnvelope: before reading or mutating sidecar state,
check if the incoming raftIdx == 0 and return an error so we never persist a
registration with a sentinel index; specifically, in
Applier.applyEnableRaftEnvelope (and the analogous path around lines 1169-1170)
validate raftIdx != 0 and return an error describing "zero raft index" before
calling ReadSidecar or ApplyRegistration, ensuring RaftEnvelopeCutoverIndex is
only set with a non-zero index.
Summary
Stage 6E-1a — FSM apply machinery for the §7.1 Phase-2 raft-envelope cutover. Builds on the design merged in #893.
Scope: FSM-side only. The admin RPC + server + CLI ship in 6E-1b (separate PR, deferred to keep this slice off the protoc toolchain version pin in CLAUDE.md).
What lands
RotateSubEnableRaftEnvelope byte = 0x05— the previously-reserved sub-tag is promoted to live; wire decoder whitelist updated.validateEnableRaftEnvelopePayload— mirrors the storage variant for the raft slot (Purpose=PurposeRaft, empty-Wrapped length check, proposer-DEK pinning).applyEnableRaftEnvelope— structural mirror ofapplyEnableStorageEnvelope. Records the cutover viasc.RaftEnvelopeCutoverIndex(a uint64 index — non-zero ⇒ "Phase-2 active"; doubles as the idempotency token, no separate bool flag). Registration-before-sidecar ordering matches the storage variant's crash-recovery invariant.ApplyRotationdispatch case for the new sub-tag.Outcomes (mirror the storage variant)
ErrEncryptionApplyRaftAppliedIndexonlyRaftEnvelopeCutoverIndexOperator-inertness in 6E-1a
Stage 6E-1a deliberately does NOT activate the §6.3 engine apply-hook unwrap or the coordinator wrap-on-propose switch — those land atomically in 6E-2 (the design notes 6E-2's three pieces cannot ship separately because either ordering produces cluster-wide halt-apply at cutover). With only 6E-1a deployed, the sidecar cutover field advances on apply but no entry is wrapped or unwrapped, matching the design's "no behavior change" guarantee for 6E-1.
Verification
TestApplyRotation_EnableRaftEnvelope_*. All pass under-race.internal/encryption/...-raceclean (32s).golangci-lint --config=.golangci.yaml run ./internal/encryption/...0 issues.Self-review (5-lens) — full breakdown in commit body
>is a 6E-2 concern; 6E-1a only writes the field. FSM determinism unaffected (each replica reaches the same value at the same apply index).Test plan
go test -race ./internal/encryption/...cleangolangci-lint --config=.golangci.yaml run ./internal/encryption/...cleanSummary by CodeRabbit
New Features
Tests