
spec-6.3a: GRD/GES entry lifecycle safe reclaim #13

Merged
sqlrush merged 7 commits into main from spec-6.3a-grd-ges-lifecycle
Jul 1, 2026

sqlrush commented Jun 30, 2026

Owner

Scope

Implements spec-6.3a GRD/GES resource-entry lifecycle management and safe cold reclaim.

  • Adds lookup pins to ClusterGrdEntry and requires every successful lookup to release exactly once.
  • Reclaims only cold holderless entries after pin == 0, with holder/waiter/convert/reservation and RECLAIMING rechecks.
  • Removes the draft global GRD table lock from create/reclaim; scan safety now uses shard-local intrusive entry lists and one shard LWLock at a time.
  • Audits GRD lookup call sites and converts raw HTAB walks to key snapshots plus pin/release relookup.
  • Wraps pinned starvation-barrier LMD submit/cancel paths in PG_TRY cleanup so ERROR cannot leak a GRD entry pin.
  • Adds reclaim GUCs and pg_cluster_state counters: entries reclaimed, pinned skips, pin high-water, sweep runs.
  • Wires LMON periodic cold reclaim sweep and honors the configured sweep batch size without the old 256 hard cap.
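The PG_TRY cleanup item above can be illustrated with a small standalone model. PostgreSQL's PG_TRY/PG_CATCH macros are built on sigsetjmp/siglongjmp, so this sketch uses plain setjmp/longjmp to show the shape of the guarantee: the pin taken before the risky call is released on both the normal and the ERROR path. All names here (`demo_pin_count`, `demo_submit`, `demo_pinned_submit`) are hypothetical and are not taken from the PR.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Models the error-recovery context that PG_TRY would establish. */
static jmp_buf demo_error_ctx;

/* Models the pin count of one GRD entry (hypothetical, file-scope for brevity). */
static int demo_pin_count = 0;

/* Stand-in for an LMD submit/cancel call that may raise ERROR. */
static void
demo_submit(bool inject_error)
{
    if (inject_error)
        longjmp(demo_error_ctx, 1);     /* models ereport(ERROR, ...) */
}

/*
 * Pin, run the risky call, and release the pin on every exit path.
 * Returns true on success and false if the injected error was caught.
 */
static bool
demo_pinned_submit(bool inject_error)
{
    demo_pin_count++;                   /* successful lookup takes a pin */

    if (setjmp(demo_error_ctx) == 0)
    {
        demo_submit(inject_error);      /* body of the PG_TRY block */
    }
    else
    {
        demo_pin_count--;               /* PG_CATCH-style cleanup */
        return false;                   /* real code would re-throw here */
    }

    demo_pin_count--;                   /* normal-path release */
    return true;
}
```

In the real code the catch branch would presumably release the pin and then re-throw so the ERROR still propagates; the sketch returns `false` instead only to stay self-contained.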

Tests

Local macOS validation for head a92aa81877:

  • git diff --check
  • ./scripts/ci/check-comment-headers.sh
  • make -j8
  • make -C src/test/cluster_unit check (135 binaries passed)
  • src/test/cluster_unit/test_cluster_grd (80 tests passed)
  • src/test/cluster_unit/test_cluster_grd_starvation (23 tests passed)
  • make -C src/test/cluster_tap check PROVE_TESTS='t/104_grd_entry_shmem_alloc_smoke.pl t/296_grd_entry_lifecycle_reclaim_2node.pl' (22 TAP tests passed)

Additional coverage added:

  • Unit: paired pin/release, cold last-unpin reclaim, sweep reclaim, live-state exclusion, large sweep batch, over-release fail-safe, pinned LMD ERROR cleanup.
  • TAP: two-node cross-node advisory-lock churn with cluster.grd_max_entries = 32, 48 distinct resources, bounded empty GRD views, reclaim counter increase, and no entry FULL counter growth.

GitHub status:

  • Fast PR checks passed for head a92aa81877: run 28481126782.
  • Nightly full CI passed for head a92aa81877: workflow_dispatch run 28482149062.

Local notes:

  • make headerscheck is blocked in this local macOS environment by missing Python/Perl/LLVM development headers plus existing full-tree header issues outside this PR surface.
  • No catversion bump: pg_proc.dat tuple content is unchanged.

Risk Boundaries

  • No storage backend, O_DIRECT, or fence-driver surface changes.
  • No persistent format or protocol ABI changes.
  • Does not implement the 6.3 DRM body; this is only the entry lifecycle / safe reclaim prerequisite.
  • Fail-safe behavior favors refusing lookup/reclaim over silent corruption on pin overflow, reclaiming state, over-release, or non-cold entries.
  • Intended to be parallel-safe with 6.0a work because this stays inside GRD/GES lifecycle and observability surfaces.
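The fail-safe rule above (refuse rather than corrupt) can be sketched as a handful of predicates. This is a single-threaded model under stated assumptions: `DemoGrdEntry` and its field names are hypothetical stand-ins for ClusterGrdEntry, and the real code would perform these checks under the entry or shard lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ClusterGrdEntry; field names are assumptions. */
typedef struct DemoGrdEntry
{
    uint32_t pin_count;
    uint32_t holders;
    uint32_t waiters;
    uint32_t converts;
    uint32_t reservations;
    bool     reclaiming;
} DemoGrdEntry;

/* Pin on lookup; refuse if the entry is being reclaimed or the pin would overflow. */
static bool
demo_entry_pin(DemoGrdEntry *e)
{
    if (e->reclaiming || e->pin_count == UINT32_MAX)
        return false;
    e->pin_count++;
    return true;
}

/* Release one pin; refuse an over-release instead of wrapping below zero. */
static bool
demo_entry_release(DemoGrdEntry *e)
{
    if (e->pin_count == 0)
        return false;
    e->pin_count--;
    return true;
}

/* Cold means: unpinned, no holder/waiter/convert/reservation, not already reclaiming. */
static bool
demo_entry_is_cold(const DemoGrdEntry *e)
{
    return e->pin_count == 0 && e->holders == 0 && e->waiters == 0 &&
           e->converts == 0 && e->reservations == 0 && !e->reclaiming;
}

/* Reclaim only cold entries; mark RECLAIMING so later lookups are refused. */
static bool
demo_entry_try_reclaim(DemoGrdEntry *e)
{
    if (!demo_entry_is_cold(e))
        return false;
    e->reclaiming = true;
    return true;
}
```

Each function reports refusal through its return value, so a caller can count the event (for example as a pinned skip) instead of proceeding on a bad state.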

Review Fixes Addressed

  • H1: removed off-spec global table lock from hot create/reclaim; replaced live HTAB seq scans with shard-local list snapshots.
  • H2: documented per-site F/T/R ERROR cleanup classification and added a real pinned LMD ERROR injection unit test.
  • H3: added two-node cold churn TAP coverage for bounded reclaim and no FULL/53R71-style exhaustion under churn.
  • M4: sweep batch now honors cluster.grd_entry_reclaim_max_per_sweep.
  • L6/L8: removed the dead pinned cleanup reclaim branch and kept explicit SpinLockInit() on slot reuse.
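The H1 fix (key snapshots plus pin/release relookup instead of walking live entries) can be sketched as a two-phase scan. This is a simplified model, not the PR's code: `DemoShard`, `DemoSlot`, and the helper names are hypothetical, a fixed array stands in for the shard-local intrusive list, and the shard LWLock is only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SHARD_SLOTS 8

/* Hypothetical entry slot: a key plus a pin count. */
typedef struct DemoSlot
{
    bool     in_use;
    uint64_t key;
    uint32_t pin_count;
} DemoSlot;

/* Hypothetical shard; a fixed array stands in for the shard-local list. */
typedef struct DemoShard
{
    DemoSlot slots[DEMO_SHARD_SLOTS];
} DemoShard;

/*
 * Phase 1: copy keys only. Real code would hold the shard lock for the
 * duration of this loop and drop it before touching any entry.
 */
static size_t
demo_snapshot_keys(const DemoShard *s, uint64_t *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < DEMO_SHARD_SLOTS && n < max; i++)
        if (s->slots[i].in_use)
            out[n++] = s->slots[i].key;
    return n;
}

/*
 * Phase 2: look each key up again. The entry may have been reclaimed since
 * the snapshot, in which case NULL is returned and the key is skipped.
 */
static DemoSlot *
demo_lookup_pin(DemoShard *s, uint64_t key)
{
    for (size_t i = 0; i < DEMO_SHARD_SLOTS; i++)
    {
        if (s->slots[i].in_use && s->slots[i].key == key)
        {
            s->slots[i].pin_count++;    /* successful lookup takes a pin */
            return &s->slots[i];
        }
    }
    return NULL;
}

/* Every successful lookup is paired with exactly one release. */
static void
demo_release(DemoSlot *slot)
{
    if (slot->pin_count > 0)
        slot->pin_count--;
}
```

The point of the design is that the scanner never holds a raw pointer across a lock release: between the two phases it holds only copied keys, so a concurrently reclaimed entry shows up as a failed relookup rather than a dangling reference.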

sqlrush force-pushed the spec-6.3a-grd-ges-lifecycle branch from f28a5eb to 2b828ec on June 30, 2026 14:19

sqlrush commented Jun 30, 2026

Owner Author

Follow-up note from spec-6.3a review, not a new blocker unless the merge owner chooses to tighten the bar before merge:

  1. Before merging PR #13, the only minor item worth doing now is M5: add a focused comment or unit test around release_and_drain -> possible cold reclaim -> WFG resync, confirming that a missing or cold entry is treated as having no remaining edge. This is low risk and documents the cold => no-WFG-edge invariant.
  2. After PR #13 lands and before starting the 6.3 DRM body, add deterministic concurrent UAF/ABA stress or fault-injection coverage for same-shard create/reclaim races. Avoid a broad, flaky TAP test if possible.
  3. After PR #13 lands, consider hardening sweep allocation: chunk large max_per_sweep values or bound the allocation by the observed entry count. The current 65536 setting is acceptable but allocates about 1MB per sweep.
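Item 3 can be made concrete with a little arithmetic. About 1MB for 65536 snapshot slots works out to roughly 16 bytes per slot; that per-slot size is an inference from the figures above, not a number stated in the PR. The sketch below shows one way to bound the per-pass allocation by both the observed entry count and a fixed chunk size. The names and the chunk size of 1024 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_KEY_BYTES  16      /* assumption: 65536 keys * 16 B = 1 MiB */
#define DEMO_CHUNK_KEYS 1024    /* hypothetical per-pass allocation cap */

static size_t
demo_min(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Keys to allocate for one pass: never more than configured, observed, or the chunk cap. */
static size_t
demo_sweep_alloc_keys(size_t max_per_sweep, size_t observed_entries)
{
    return demo_min(demo_min(max_per_sweep, observed_entries), DEMO_CHUNK_KEYS);
}

/* Number of chunked passes needed to cover the whole sweep budget. */
static size_t
demo_sweep_chunks(size_t max_per_sweep, size_t observed_entries)
{
    size_t total = demo_min(max_per_sweep, observed_entries);

    return (total + DEMO_CHUNK_KEYS - 1) / DEMO_CHUNK_KEYS;
}
```

Under these assumptions a sweep configured for 65536 entries allocates at most 16KB per pass instead of about 1MB up front, at the cost of taking the shard lock once per chunk rather than once per sweep.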

Current CI state when recorded: fast PR checks green and nightly run 28482149062 green at a92aa81. t/296 is included in the nightly 276-326 shard.

sqlrush marked this pull request as ready for review on July 1, 2026 00:01
sqlrush merged commit 4a6bcc7 into main on Jul 1, 2026
32 of 33 checks passed

sqlrush commented Jul 1, 2026

Owner Author

Final merge closure for spec-6.3a:

  • PR #13 (spec-6.3a: GRD/GES entry lifecycle safe reclaim) is merged to main.
  • Merge commit: 4a6bcc7.
  • Spec branch head included: a92aa81.
  • Post-merge main fast-gate: success, run 28483886997.
  • Post-merge main nightly full CI: success after rerunning failed jobs, run 28484675454. Initial failures were macOS t/234 and Linux stage4-hardgate t/274; both passed on failed-job rerun and were not treated as 6.3a regressions.
  • Release state: merged-to-main / not shipped.
  • No tag, no SHIPPED marker, no release three-sync performed.
