
backup: Phase 0b M6 implementation - cmd/elastickv-snapshot-encode + library #904

Merged: bootjp merged 37 commits into main from backup/m6-cli-design on Jun 3, 2026

@bootjp commented Jun 1, 2026

Summary

Phase 0b M6 implementation per the merged design doc docs/design/2026_06_01_proposed_snapshot_encode_cli.md (#896, merged at fe9e941).

Wires the merged M1–M5 encoder slices into a user-facing CLI plus a library entrypoint mirroring the decoder's DecodeSnapshot. Implements the round-trip self-test the parent doc mandates, with write-then-rename atomic publish so a self-test failure never reaches the restore path.

What lands

Library (internal/backup/):

  • encode_snapshot.go: EncodeSnapshot(opts, out io.Writer) (EncodeResult, error), a high-level wrapper that dispatches the per-adapter encoders in canonical fan-out order (redis → dynamodb → s3 → sqs). It implements two-mode buffering (stream when SelfTest=false, buffer when SelfTest=true), runs the structural self-test against the in-memory buffer, and copies to out only on a match. An unexported corruptBufferForTest hook lets same-package tests inject buffer corruption that reaches the self-test decode but never out (codex P2 v6 on #896: write-then-rename atomicity).
  • encode_info.go: EncodeInfo schema, WriteEncodeInfo / ReadEncodeInfo helpers, and EncodeInfoSidecarPath (path-derived <output>.encode_info.json, so there are no static-name collisions; gemini medium v2 on #896). A format-version gate makes a future bump surface as ErrUnsupportedEncodeInfoFormatVersion.
  • manifest.go: adds Exclusions.RenameS3Collisions bool with JSON tag rename_s3_collisions. It is intentionally NOT in exclusionsRequiredFields, so legacy manifests decode safely with the zero value false (claude v5 medium on #896).

Decoder CLI update (cmd/elastickv-snapshot-decode/main.go):

  • emitManifest now populates the new RenameS3Collisions field from cfg.renameCollisions, completing the decoder→encoder round-trip for --rename-collisions dumps.

Encoder CLI (cmd/elastickv-snapshot-encode/main.go):

  • Flags: --input, --output, --adapter (decoder-parity CSV parser), --last-commit-ts, --self-test, --scratch-root.
  • Atomic publish: write to <output>.tmp-<random>, fsync and close, then rename. On self-test failure, mismatch.txt is written next to where the .fsm would have been, the temp file is removed, and the CLI exits 2.
  • Fail-closed HLC ceiling: a --last-commit-ts T lower than the manifest's value → exit 2 with the typed ErrSelfTestLowerLastCommitTS.
  • Exit codes: 0 success / 1 user-input error / 2 data-correctness failure (decoder-parity).
  • encoder_version is stamped at build time via -ldflags "-X main.version=..." (mirrors the decoder's pattern at main.go:45).

Test plan

go test -race -count=1 — all green:

  • internal/backup/: 3 new tests (encode_info schema, format gate, legacy manifest forward-compat)
  • internal/backup/encode_snapshot_test.go: 5 tests (library round-trip, self-test match against canonicalized input, corruption never reaches out, missing-input guard, sidecar path derivation)
  • cmd/elastickv-snapshot-encode/main_test.go: 9 tests (missing manifest, unknown adapter, lower-TS fail-closed, equal+higher TS accept, path-derived sidecar, two-files-no-collision, full round-trip with --self-test, atomic-publish never leaves bad .fsm, --last-commit-ts parser)

make lint: clean.

Caller audit per CLAUDE.md semantic-change rule

  • Exclusions struct gained a field. Existing callers either build via field-tagged literals (decoder CLI's emitManifest — updated to populate the new field) or read it (encoder's buildSelfTestDecodeOptions — new code). No silent semantic change.
  • DecodeOptions.RenameS3Collisions was already a public field used by the decoder; the encoder now also reads it via the manifest. No caller-side change needed.
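Why the new field is forward-compatible can be shown with encoding/json directly: a key missing from the input leaves the Go field at its zero value. The exclusions type and helper functions below are hypothetical, trimmed stand-ins for the manifest's Exclusions struct; only the field name and JSON tag come from the PR.

```go
package main

import "encoding/json"

// exclusions is a hypothetical stand-in for the manifest's Exclusions
// struct; only the newly added field is modeled.
type exclusions struct {
	RenameS3Collisions bool `json:"rename_s3_collisions"`
}

// decodeExclusions leaves RenameS3Collisions=false when the key is absent,
// which is what a legacy manifest produces.
func decodeExclusions(raw []byte) (exclusions, error) {
	var e exclusions
	err := json.Unmarshal(raw, &e)
	return e, err
}

// encodeExclusions builds the value with a field-tagged literal, as the
// decoder CLI's emitManifest is described as doing.
func encodeExclusions(renameCollisions bool) ([]byte, error) {
	return json.Marshal(exclusions{RenameS3Collisions: renameCollisions})
}
```

Adding the key to a required-fields list would turn that silent zero value into a decode error for every pre-existing manifest, which is why the PR keeps it out of exclusionsRequiredFields.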

Self-review (5 passes)

  1. Data loss: write-then-rename atomic publish — self-test failure never publishes the .fsm. Corrupted buffer never reaches the io.Writer (pinned by TestEncodeSnapshotSelfTestDetectsCorruption asserting out.Len()==0). All b.Add errors propagate through EncodeSnapshot.
  2. Concurrency: pure offline. CLI is single-shot; library takes a caller-owned io.Writer. Temp-file suffix uses crypto/rand so concurrent encodes against the same --output cannot collide.
  3. Performance: SelfTest=false streams with a single sha256 hash tee'd onto the output, with no extra buffer allocation. SelfTest=true allocates one FSM-sized *bytes.Buffer plus the scratch decode tree (documented memory cost).
  4. Data consistency: a --last-commit-ts T lower than the manifest's value fails closed with a typed error; the self-test threads the MANIFEST's DecodeOptions (Exclusions.* + DynamoDBLayout → DynamoDBBundleJSONL) so trees produced with non-default decoder flags round-trip cleanly.
  5. Test coverage: 17 new tests cover library entrypoint, CLI flag parsing, atomic publish discipline, sidecar path-derivation, corruption detection, forward-compat for legacy manifests, and the four user-visible behaviors (round-trip / override / scope / missing-manifest).

Risk

Low. The encoder is offline; restoration is non-destructive (new keyspace on a fresh cluster via stop-replace-restart). The only public-API change is the Exclusions.RenameS3Collisions field, which is forward-compat for older manifests. All existing M1–M5 tests continue to pass.

Summary by CodeRabbit

  • New Features

    • New snapshot encode CLI with atomic publish, multi-adapter encoding, optional self-test verification, and per-output encode sidecar (.encode_info.json).
    • Sidecar safe-open helper exported; sidecar file permissions tightened to owner-only mode.
  • Bug Fixes

    • Manifests now include/preserve an S3 rename-collision flag while remaining compatible with older manifests.
    • Self-test failures avoid publishing bad outputs, preserve prior artifacts appropriately, and reject unexpected non-directory S3 entries.
  • Tests

    • Extensive unit and end-to-end tests covering CLI, library, self-test, manifest compatibility, sidecar handling, and filesystem safety.
