
PLT-463: schedule_lag gate + run verdict (stacked on #51) #53

Closed
bdchatham wants to merge 5 commits into main from brandon2/plt-463-m15-schedule_lag-gate-run-verdict

@bdchatham

Contributor

Implements PLT-463 (M1.5) — the self-check that proves the open-loop fix is actually open-loop.

⚠️ Stacked on #51 (PLT-459): base is the 459 branch so the diff stays clean. Retarget to main once #51 merges.

What

  • schedule_lag = AttemptedSendTime − IntendedSendTime per open-loop tx, recorded into a bounded reservoir (Algorithm R, cap 16384). p99 reported on every run.
  • Verdict: VOID if schedule_lag_p99 > threshold × (1/λ) — named const scheduleLagVoidThreshold = 0.10 ('tune from first calibration run'), overridable via config. VALID otherwise; N/A for closed-loop or ramped-λ (p99 still reported). Logged loudly + run-summary fields (ScheduleLagP99, Verdict, VoidReason).
  • Prewarm / zero-IntendedSendTime excluded; gated on the real ArrivalModel.
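
A minimal sketch of the recording path described above, for reviewers who want the sampling rule spelled out. `lagReservoir`, `recordLag`, and the nearest-rank percentile are hypothetical stand-ins, not the PR's actual collector; the zero-IntendedSendTime skip and the negative-lag clamp follow the description, but how the PR implements them is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"time"
)

// lagReservoir keeps a uniform random sample of at most capacity lags
// (Algorithm R). Hypothetical type, not the PR's collector.
type lagReservoir struct {
	capacity int
	seen     int64 // lags offered so far, whether kept or not
	samples  []time.Duration
	rng      *rand.Rand
}

func newLagReservoir(capacity int, seed int64) *lagReservoir {
	return &lagReservoir{
		capacity: capacity,
		samples:  make([]time.Duration, 0, capacity),
		rng:      rand.New(rand.NewSource(seed)),
	}
}

// Record computes schedule_lag = attempted - intended and offers it to the
// reservoir. A zero intended time (for example a prewarm tx) is skipped.
func (r *lagReservoir) Record(intended, attempted time.Time) {
	if intended.IsZero() {
		return
	}
	r.recordLag(attempted.Sub(intended))
}

// recordLag applies Algorithm R: fill the first capacity slots, then keep
// the i-th offered lag with probability capacity/i.
func (r *lagReservoir) recordLag(lag time.Duration) {
	if lag < 0 {
		lag = 0 // assumed clamp: a send cannot be early in a meaningful way
	}
	r.seen++
	if len(r.samples) < r.capacity {
		r.samples = append(r.samples, lag)
		return
	}
	if j := r.rng.Int63n(r.seen); j < int64(r.capacity) {
		r.samples[j] = lag
	}
}

// P99 returns the nearest-rank 99th percentile of the sample, or false if
// nothing was recorded.
func (r *lagReservoir) P99() (time.Duration, bool) {
	n := len(r.samples)
	if n == 0 {
		return 0, false
	}
	sorted := append([]time.Duration(nil), r.samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (99*n + 99) / 100 // ceil(0.99 * n), 1-based
	return sorted[rank-1], true
}

func main() {
	r := newLagReservoir(16384, 1)
	start := time.Now()
	for i := 0; i < 100000; i++ {
		intended := start.Add(time.Duration(i) * time.Millisecond)
		r.Record(intended, intended.Add(time.Duration(i%50)*time.Microsecond))
	}
	p99, _ := r.P99()
	fmt.Println("samples:", len(r.samples), "seen:", r.seen, "p99:", p99)
}
```

Memory stays bounded at the cap regardless of run length, which is the trade-off against the tail-trim idiom listed under the design forks.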

Design forks (flagged for review)

  • p99 via bounded reservoir (whole-run representative) vs the tail-trim idiom.
  • λ from cfg.Settings.TPS; ramped-λ runs → N/A (not gated).
  • threshold as config field, not a CLI flag.

Verify

make lint 0 issues · go build · go test -race ./... green.

🤖 Generated with Claude Code

@cursor

cursor Bot commented Jun 15, 2026


PR Summary

Medium Risk
Changes the hot send path (collector lock per tx) and defines pass/fail semantics for benchmark runs; mis-tuned thresholds or wiring bugs could void valid runs or miss generator-bound runs.

Overview
Adds an open-loop self-check that measures whether the load generator kept its own arrival schedule, and labels each run VALID, VOID, or N/A.

Workers record schedule_lag (AttemptedSendTime − IntendedSendTime) whenever IntendedSendTime is set, into a bounded reservoir sample (Algorithm R) plus exact max and over-bound counters. At shutdown, EvaluateScheduleLag compares p99 to threshold × (1/λ) (default 10%, overridable via scheduleLagVoidThreshold in config). It can also VOID when more than 0.5% of recorded sends exceed the bound; that counter is unsampled, so it catches late-run degradation that the reservoir p99 would miss. N/A applies for closed-loop, ramped λ, non-fixed TPS, or zero samples; a run with admitted txs but no samples raises an anomaly log.
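
To illustrate why the unsampled counter matters, here is a rough sketch with a hypothetical `lagTracker` (names and locking are assumptions, not the PR's code). It counts every recorded send against the bound, so a late burst affecting 0.7% of sends trips the 0.5% criterion even though it sits below the 99th percentile of the whole run.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lagTracker keeps exact (unsampled) counters next to whatever percentile
// sample is in use. Hypothetical type for illustration only.
type lagTracker struct {
	mu        sync.Mutex
	bound     time.Duration // 0 means the bound is not armed
	recorded  int64
	overBound int64
	maxLag    time.Duration
}

// record updates the counters for one send. Every send is counted.
func (t *lagTracker) record(lag time.Duration) {
	if lag < 0 {
		lag = 0
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	t.recorded++
	if lag > t.maxLag {
		t.maxLag = lag
	}
	if t.bound > 0 && lag > t.bound {
		t.overBound++
	}
}

// overBoundFraction is the share of recorded sends whose lag exceeded the
// bound; it is 0 when nothing was recorded.
func (t *lagTracker) overBoundFraction() float64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.recorded == 0 {
		return 0
	}
	return float64(t.overBound) / float64(t.recorded)
}

func main() {
	tr := &lagTracker{bound: 100 * time.Microsecond}
	// 99,300 on-schedule sends, then a late-run burst of 700 slow sends.
	for i := 0; i < 99300; i++ {
		tr.record(10 * time.Microsecond)
	}
	for i := 0; i < 700; i++ {
		tr.record(10 * time.Millisecond)
	}
	fmt.Println("max:", tr.maxLag, "over-bound fraction:", tr.overBoundFraction())
}
```

In this made-up run the slow sends are 700 of 100,000, so an exact whole-run p99 would still be 10µs and pass the p99 check, while the over-bound fraction of 0.7% exceeds 0.5%.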

main arms the VOID bound on fixed-λ open-loop runs, computes the verdict from the actual arrival model, logs it, and extends run summary + OTel gauges (run_schedule_lag_p99, max, over-bound fraction) tagged with verdict.
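
The arming and verdict logic can be sketched roughly as below. `Inputs`, `Result`, `scheduleLagBound`, and `evaluate` are hypothetical names standing in for the PR's ScheduleLagInputs, ScheduleLagBound, and EvaluateScheduleLag, whose real signatures are not shown here. The order of the N/A checks (RampUp before TPS) and the two VOID criteria follow the description; everything else is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

type Verdict string

const (
	VerdictValid Verdict = "VALID"
	VerdictVoid  Verdict = "VOID"
	VerdictNA    Verdict = "N/A"
)

// Inputs is a hypothetical stand-in for the PR's ScheduleLagInputs.
type Inputs struct {
	OpenLoop          bool
	RampUp            bool
	TPS               float64       // λ, sends per second
	Threshold         float64       // fraction of 1/λ, default 0.10 in the PR
	OverBoundFraction float64       // default 0.005 (0.5%) in the PR
	P99               time.Duration // from the reservoir
	Admitted          int64         // txs admitted by the dispatcher
	Recorded          int64         // sends that recorded a lag
	OverBound         int64         // recorded sends whose lag exceeded the bound
}

type Result struct {
	Verdict Verdict
	Reason  string
	Anomaly bool
}

// scheduleLagBound is the single source of the bound threshold × (1/λ).
// It reports false where the verdict is N/A, so arming and evaluation agree.
func scheduleLagBound(in Inputs) (time.Duration, bool) {
	if !in.OpenLoop || in.RampUp || in.TPS <= 0 {
		return 0, false
	}
	return time.Duration(in.Threshold * float64(time.Second) / in.TPS), true
}

func evaluate(in Inputs) Result {
	switch {
	case !in.OpenLoop:
		return Result{Verdict: VerdictNA, Reason: "closed-loop"}
	case in.RampUp: // checked before TPS: the configured TPS is stale on ramped runs
		return Result{Verdict: VerdictNA, Reason: "ramped λ"}
	case in.TPS <= 0:
		return Result{Verdict: VerdictNA, Reason: "non-fixed TPS"}
	case in.Recorded == 0:
		// No data is never VALID; admitted txs without samples is suspicious.
		return Result{Verdict: VerdictNA, Reason: "no samples", Anomaly: in.Admitted > 0}
	}
	bound, _ := scheduleLagBound(in)
	if in.P99 > bound {
		return Result{Verdict: VerdictVoid,
			Reason: fmt.Sprintf("schedule_lag_p99 %v > bound %v", in.P99, bound)}
	}
	if frac := float64(in.OverBound) / float64(in.Recorded); frac > in.OverBoundFraction {
		return Result{Verdict: VerdictVoid,
			Reason: fmt.Sprintf("%.2f%% of sends over bound %v", 100*frac, bound)}
	}
	return Result{Verdict: VerdictValid}
}

func main() {
	in := Inputs{OpenLoop: true, TPS: 1000, Threshold: 0.10, OverBoundFraction: 0.005,
		P99: 150 * time.Microsecond, Admitted: 1000, Recorded: 1000}
	res := evaluate(in)
	fmt.Println(res.Verdict, res.Reason)
}
```

With λ = 1000 TPS the inter-arrival time 1/λ is 1ms, so the default 10% threshold gives a bound of roughly 100µs; a p99 of 150µs voids the run.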

Reviewed by Cursor Bugbot for commit 7caba67.

Base automatically changed from brandon2/plt-459-m13-block-indexed-txinclusion-tracker to main June 15, 2026 23:26
bdchatham and others added 4 commits June 15, 2026 16:27
Compute schedule_lag = AttemptedSendTime - IntendedSendTime per open-loop
tx (bounded reservoir, Algorithm R), expose p99 every run, and render a run
VERDICT: VOID when schedule_lag_p99 > threshold x (1/lambda) — a
generator-bound run is void, not a footnote. Threshold is a named const
(0.10, 'tune from first calibration run'), overridable via config. Gated on
the actual arrival model (closed-loop / ramped-lambda => N/A); prewarm and
zero-IntendedSendTime txs excluded.

Stacked on PLT-459 (#51): needs the inclusion run-summary surface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- RampUp => N/A (checked before TPS): the ramper drives the live limit via
  SetLimit so cfg.Settings.TPS is stale; gating against 1/TPS is wrong.
- Zero samples on a fixed-λ open-loop run => N/A, never VALID (a trust gate
  must not bless 'no data' as a clean run). Thread the admitted count from the
  dispatcher conservation counters; if admitted>0 yet samples==0, log loudly
  (recorder may be mis-wired) and flag Anomaly.
- Drop the redundant inline comment on ScheduleLagVoidThreshold (go-doc keeps
  the rationale).

Cohort: security (false-VALID F1/F2), systems (F2 confirm), idiom (doc dup).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The whole-run p99 (from a uniform reservoir) can dilute a sub-percentile
late-run tail blowup → false VALID. Add an UNSAMPLED over-bound counter
(incremented per recorded send, not sampled) + max lag: VOID when p99 > bound
OR > scheduleLagOverBoundFraction (0.5%, provisional) of sends exceed the
bound, with a distinct reason per criterion. Bound is single-sourced
(ScheduleLagBound) so run-start arming and verdict-time can't drift; armed
only on fixed-λ open-loop runs (inert elsewhere, matching the N/A set).
EvaluateScheduleLag now takes ScheduleLagInputs (kills the adjacent-bool
positional trap). VOID stays advisory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Both re-reviewers flagged that the over-bound counter was armed whenever
TPS>0, including ramped open-loop runs (RampUp+TPS>0 is a valid config) — the
verdict is N/A there so it was never a false-VOID, but it emitted a
meaningless over_bound_fraction and contradicted the 'inert on ramped runs'
comment. Gate arming on !RampUp so the counter stays inert exactly where the
verdict is N/A.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@bdchatham bdchatham force-pushed the brandon2/plt-463-m15-schedule_lag-gate-run-verdict branch from 8beecca to c1da595 June 15, 2026 23:28

Strip bare (PLT-463) self-labels and a was-X-now-Y changelog line, drop a
standalone TODO and a few what-comments. Keep load-bearing why/invariant
comments (reservoir-dilution rationale, Little's-law sizing, registered ⊆
succeeded, negative-lag clamp) and forward-pointing cross-refs. Comment-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham bdchatham closed this Jun 16, 2026