Skip to content

feat(encryption): Stage 6E-1a — FSM apply machinery for enable-raft-envelope cutover#899

Open
bootjp wants to merge 1 commit into
mainfrom
feat/encryption-6e-1a-fsm-cutover
Open

feat(encryption): Stage 6E-1a — FSM apply machinery for enable-raft-envelope cutover#899
bootjp wants to merge 1 commit into
mainfrom
feat/encryption-6e-1a-fsm-cutover

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 1, 2026

Summary

Stage 6E-1a — FSM apply machinery for the §7.1 Phase-2 raft-envelope cutover. Builds on the design merged in #893.

Scope: FSM-side only. The admin RPC + server + CLI ship in 6E-1b (separate PR, deferred to keep this slice off the protoc toolchain version pin in CLAUDE.md).

What lands

  • RotateSubEnableRaftEnvelope byte = 0x05 — the previously-reserved sub-tag is promoted to live; wire decoder whitelist updated.
  • validateEnableRaftEnvelopePayload — mirrors the storage variant for the raft slot (Purpose=PurposeRaft, empty-Wrapped length check, proposer-DEK pinning).
  • applyEnableRaftEnvelope — structural mirror of applyEnableStorageEnvelope. Records the cutover via sc.RaftEnvelopeCutoverIndex (a uint64 index — non-zero ⇒ "Phase-2 active"; doubles as the idempotency token, no separate bool flag). Registration-before-sidecar ordering matches the storage variant's crash-recovery invariant.
  • ApplyRotation dispatch case for the new sub-tag.

Outcomes (mirror the storage variant)

Case Treatment
Malformed (purpose / Wrapped / proposer-DEK) Halt-apply via ErrEncryptionApply
Stale DEKID (RotateDEK race) Benign no-op; advance RaftAppliedIndex only
Already active (duplicate cutover) Idempotent; preserve original RaftEnvelopeCutoverIndex
Fresh success Register proposer → set cutover + advance applied inside one fsync

Operator-inertness in 6E-1a

Stage 6E-1a deliberately does NOT activate the §6.3 engine apply-hook unwrap or the coordinator wrap-on-propose switch — those land atomically in 6E-2 (the design notes 6E-2's three pieces cannot ship separately because either ordering produces cluster-wide halt-apply at cutover). With only 6E-1a deployed, the sidecar cutover field advances on apply but no entry is wrapped or unwrapped, matching the design's "no behavior change" guarantee for 6E-1.

Verification

  • 6 new tests under TestApplyRotation_EnableRaftEnvelope_*. All pass under -race.
  • Full internal/encryption/... -race clean (32s).
  • golangci-lint --config=.golangci.yaml run ./internal/encryption/... 0 issues.

Self-review (5-lens) — full breakdown in commit body

  1. Data loss — registration-before-sidecar ordering preserves crash-recovery invariant.
  2. Concurrency — FSM apply is serial per Raft group; no race surface added.
  3. Performance — one extra sub-tag dispatch case; hot-path unchanged.
  4. Data consistency — strict-> is a 6E-2 concern; 6E-1a only writes the field. FSM determinism unaffected (each replica reaches the same value at the same apply index).
  5. Test coverage — every validate branch + every apply state-machine branch has a dedicated test; 6D regression tests stay green.

Test plan

  • go test -race ./internal/encryption/... clean
  • golangci-lint --config=.golangci.yaml run ./internal/encryption/... clean
  • 6 new tests + all pre-6E-1a tests green
  • @claude review for the apply state machine, the registration-before-sidecar ordering rationale, and the "operator-inertness in 6E-1a" guarantee

Summary by CodeRabbit

  • New Features

    • Added support for raft-layer encryption envelope cutover with validation and proposer registration.
  • Tests

    • Added comprehensive test coverage for raft cutover functionality including success cases, validation failures, and idempotency scenarios.

…nvelope cutover

Adds the FSM-side machinery for the Stage 6E Phase-2 raft-envelope
cutover (per the design merged in #893). The admin RPC + server +
CLI ship in a separate PR (6E-1b) so this slice does not pull in
a proto regen.

Wire layer (internal/encryption/fsmwire/wire.go)
  - New constant RotateSubEnableRaftEnvelope byte = 0x05 (the
    reserved value already documented in the comment block; this
    slice promotes it from "reserved" to live).
  - Added to the readRotationSubTag whitelist so the wire decoder
    accepts the sub-tag (previously rejected as ErrFSMWireSubtag).
  - The 0x02 / 0x03 reserved slots stay reserved for Stage 9.

Applier (internal/encryption/applier.go)
  - validateEnableRaftEnvelopePayload — mirrors the storage variant
    but targets PurposeRaft + empty-Wrapped + proposer-DEK pinning.
  - applyEnableRaftEnvelope — structural mirror of
    applyEnableStorageEnvelope with the raft-specific differences:
      * compares DEKID against sc.Active.Raft (not Storage)
      * records the cutover via sc.RaftEnvelopeCutoverIndex (a
        uint64 index — non-zero means "Phase-2 active"; doubles
        as the idempotency token, no separate bool flag)
      * registers proposer against the active raft DEK (PurposeRaft)
  - ApplyRotation dispatch picks up the new sub-tag; the
    "unrecognised sub-tag" fallback continues to halt-apply for
    Stage 9 sub-tags.

Outcomes (mirror the storage variant)
  - Malformed payload → halt-apply via ErrEncryptionApply.
  - Stale DEKID (RotateDEK raced between propose and apply) →
    benign no-op; advance RaftAppliedIndex only.
  - Already active (duplicate cutover entry) → idempotent; preserve
    the original RaftEnvelopeCutoverIndex, advance RaftAppliedIndex
    only.
  - Fresh success → register proposer FIRST, then set
    RaftEnvelopeCutoverIndex + advance RaftAppliedIndex inside one
    WriteSidecar fsync. The registration-before-sidecar ordering
    is the same crash-recovery invariant the storage variant uses
    (§4.1 case 2-idempotent re-runs are safe; the sidecar flip is
    the last observable side-effect).

Operator-inertness in 6E-1a
  Stage 6E-1a deliberately does NOT activate the §6.3 engine
  apply-hook unwrap or the coordinator wrap-on-propose switch —
  those land atomically in 6E-2 (the design notes 6E-2's three
  pieces cannot ship separately because either ordering produces
  cluster-wide halt-apply at cutover). With only 6E-1a deployed,
  the sidecar cutover field advances on apply but no entry is
  wrapped or unwrapped, so this slice is operator-inert (matches
  the design's "no behavior change" guarantee for 6E-1).

Tests (internal/encryption/applier_test.go) — 6 new tests:
  - TestApplyRotation_EnableRaftEnvelope_FreshSuccess (2 sub-cases:
    nil_wrapped + empty_wrapped — pins the length-based empty
    check against both Go construction paths)
  - TestApplyRotation_EnableRaftEnvelope_RejectsNonRaftPurpose
  - TestApplyRotation_EnableRaftEnvelope_RejectsNonEmptyWrapped
  - TestApplyRotation_EnableRaftEnvelope_RejectsProposerDEKMismatch
  - TestApplyRotation_EnableRaftEnvelope_StaleDEKID_BenignNoOp
    (uses RotateDEK to actually advance Active.Raft + the keys[]
    map, so the sidecar validator accepts the new state — mirrors
    the storage stale-DEKID test setup)
  - TestApplyRotation_EnableRaftEnvelope_AlreadyActive_IdempotentNoOp

All 6 pass under -race. Full `internal/encryption/...` -race
clean (32s). Lint clean.

Self-review (5-lens):
  1. Data loss — fresh-success is registration-before-sidecar so
     a crash between the two re-runs both safely on FSM replay.
     Stale + idempotent branches preserve the original cutover
     index (the §6.4-equivalent atomicity invariant for raft).
  2. Concurrency — FSM apply is serial per Raft group; no
     race surface introduced. The atomic Mirror in StateCache
     (RaftEnvelopeCutoverIndex) is NOT extended in this slice
     because the engine apply-hook (6E-2) is the only consumer
     and reads sidecar directly.
  3. Performance — one extra sub-tag case in the dispatch;
     hot-path cost unchanged for non-raft-cutover applies.
  4. Data consistency — strict-greater-than dispatch invariant
     (`entry.Index > RaftEnvelopeCutoverIndex`) is a 6E-2
     concern; 6E-1a only writes the field. No FSM determinism
     hazard because the apply path is deterministic across
     replicas (each replica's local sidecar reaches the same
     value at the same apply index).
  5. Test coverage — every branch of the validate function and
     the apply state machine has a dedicated test; the 6D
     storage-cutover regression tests stay green.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements raft-layer envelope cutover (Stage 6E-1) by adding a new rotation subtag, validation and apply handler, and comprehensive test coverage for fresh apply, validation failures, stale DEK race conditions, and idempotent replay scenarios.

Changes

Raft Envelope Cutover Implementation

Layer / File(s) Summary
Wire protocol for raft envelope cutover
internal/encryption/fsmwire/wire.go
Adds RotateSubEnableRaftEnvelope (0x05) rotation subtag constant with documentation describing wire layout and validation semantics; updates readRotationSubTag decoder to accept this new subtag.
Raft envelope cutover apply path
internal/encryption/applier.go
Updates ApplyRotation documentation and dispatch switch to route RotateSubEnableRaftEnvelope to a new handler. Implements validateEnableRaftEnvelopePayload to enforce raft purpose, empty wrapped field, and proposer-registration DEK equality. Implements applyEnableRaftEnvelope to handle stale-DEK benign no-op (consume entry without setting cutover index), already-active idempotent replay (preserve original cutover index while advancing applied index), and fresh-success path (apply proposer registration, set cutover index, advance applied index, persist sidecar, refresh state cache).
Raft envelope cutover test suite
internal/encryption/applier_test.go
Adds six tests covering fresh-success with nil and empty-byte wrapped payloads, validation failures (non-raft purpose, non-empty wrapped, DEK mismatch with assertions that state remains unchanged), stale-DEKID benign no-op, and already-active idempotent replay. Includes runRaftCutoverFreshSuccess helper for orchestrating cutover setup and assertions, and assertRaftCutoverNotApplied helper for validating rejection cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • bootjp/elastickv#768: Introduces Stage 6B KEK-backed ApplyRotation dispatcher that this PR directly extends with the new raft envelope cutover case.
  • bootjp/elastickv#748: Establishes Stage-4 encryption wire rotation-subtag decoder and format that this PR extends with the new 0x05 raft envelope cutover subtag.
  • bootjp/elastickv#765: Scaffolds the ApplyRotation implementation that this PR enhances with raft envelope cutover routing and apply semantics.

🐰 A hop and a skip through the raft,
Stage 6E-1 cuts the envelope in half,
Fresh or stale, no duplicates snag,
The proposer gets pinned in the bag! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: implementing FSM apply machinery for enable-raft-envelope cutover as part of Stage 6E-1a. It directly corresponds to the changeset's primary objective.
Docstring Coverage ✅ Passed Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/encryption-6e-1a-fsm-cutover

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements Stage 6E-1 of the encryption rotation design, introducing the one-shot raft-layer cutover (RotateSubEnableRaftEnvelope). It adds payload validation and application logic in internal/encryption/applier.go, updates the wire protocol in internal/encryption/fsmwire/wire.go, and includes comprehensive unit tests in internal/encryption/applier_test.go. Feedback was provided to explicitly validate that the raftIdx is non-zero in applyEnableRaftEnvelope to prevent silent activation failures.

Comment on lines +1127 to +1130
func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error {
if err := validateEnableRaftEnvelopePayload(p); err != nil {
return err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In applyEnableRaftEnvelope, sc.RaftEnvelopeCutoverIndex is set to raftIdx and a non-zero value is used as the sole indicator that the raft envelope cutover is active (Phase-2 active). If raftIdx is 0 (which can happen in tests or if the index-aware apply seam is misconfigured), the cutover will silently fail to activate because sc.RaftEnvelopeCutoverIndex will remain 0, yet the function will return success. We should explicitly validate that raftIdx is non-zero at the start of the function and return an error if it is 0.

func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error {
	if raftIdx == 0 {
		return errors.Wrap(ErrEncryptionApply, "applier: enable-raft-envelope requires a non-zero raft index")
	}
	if err := validateEnableRaftEnvelopePayload(p); err != nil {
		return err
	}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/encryption/applier.go`:
- Around line 1127-1133: Reject zero raftIdx up-front in
applyEnableRaftEnvelope: before reading or mutating sidecar state, check if the
incoming raftIdx == 0 and return an error so we never persist a registration
with a sentinel index; specifically, in Applier.applyEnableRaftEnvelope (and the
analogous path around lines 1169-1170) validate raftIdx != 0 and return an error
describing "zero raft index" before calling ReadSidecar or ApplyRegistration,
ensuring RaftEnvelopeCutoverIndex is only set with a non-zero index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e163f3ef-611e-4e84-b8f1-87d634b3e4f0

📥 Commits

Reviewing files that changed from the base of the PR and between ed575bf and ff0eb90.

📒 Files selected for processing (3)
  • internal/encryption/applier.go
  • internal/encryption/applier_test.go
  • internal/encryption/fsmwire/wire.go

Comment on lines +1127 to +1133
func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error {
if err := validateEnableRaftEnvelopePayload(p); err != nil {
return err
}
sc, err := ReadSidecar(a.sidecarPath)
if err != nil {
return errors.Wrap(err, "applier: read sidecar for enable-raft-envelope")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject zero raftIdx before marking raft cutover.

RaftEnvelopeCutoverIndex is the only activation bit here, but the fresh-success path stores raftIdx verbatim. If this path is called with raftIdx == 0—the same "index not supplied" sentinel used elsewhere—the proposer registration is persisted while Phase 2 stays logically inactive and the entry is no longer idempotent on replay. Fail closed before ApplyRegistration when the index is zero.

Suggested fix
 func (a *Applier) applyEnableRaftEnvelope(raftIdx uint64, p fsmwire.RotationPayload) error {
 	if err := validateEnableRaftEnvelopePayload(p); err != nil {
 		return err
 	}
+	if raftIdx == 0 {
+		return errors.Wrap(ErrEncryptionApply,
+			"applier: enable-raft-envelope requires a non-zero raft index")
+	}
 	sc, err := ReadSidecar(a.sidecarPath)
 	if err != nil {
 		return errors.Wrap(err, "applier: read sidecar for enable-raft-envelope")
 	}

Also applies to: 1169-1170

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/encryption/applier.go` around lines 1127 - 1133, Reject zero raftIdx
up-front in applyEnableRaftEnvelope: before reading or mutating sidecar state,
check if the incoming raftIdx == 0 and return an error so we never persist a
registration with a sentinel index; specifically, in
Applier.applyEnableRaftEnvelope (and the analogous path around lines 1169-1170)
validate raftIdx != 0 and return an error describing "zero raft index" before
calling ReadSidecar or ApplyRegistration, ensuring RaftEnvelopeCutoverIndex is
only set with a non-zero index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant