PLT-463: schedule_lag gate + run verdict (stacked on #51) by bdchatham · Pull Request #53 · sei-protocol/sei-load

bdchatham · 2026-06-15T22:36:56Z

Implements PLT-463 (M1.5) — the self-check that proves the open-loop fix is actually open-loop.

⚠️ Stacked on #51 (PLT-459): base is the 459 branch so the diff stays clean. Retarget to main once #51 merges.

What

schedule_lag = AttemptedSendTime − IntendedSendTime per open-loop tx, recorded into a bounded reservoir (Algorithm R, cap 16384). p99 reported on every run.
Verdict: VOID if schedule_lag_p99 > threshold × (1/λ). The threshold is the named const scheduleLagVoidThreshold = 0.10 ('tune from first calibration run'), overridable via config. VALID otherwise; N/A for closed-loop or ramped-λ runs (p99 is still reported). The verdict is logged loudly and added to the run summary (ScheduleLagP99, Verdict, VoidReason).
Prewarm / zero-IntendedSendTime excluded; gated on the real ArrivalModel.

Design forks (flagged for review)

p99 via bounded reservoir (whole-run representative) vs the tail-trim idiom.
λ from cfg.Settings.TPS; ramped-λ runs → N/A (not gated).
threshold as config field, not a CLI flag.

Verify

make lint 0 issues · go build · go test -race ./... green.

🤖 Generated with Claude Code

cursor · 2026-06-15T22:37:00Z

PR Summary

Medium Risk
Changes the hot send path (collector lock per tx) and defines pass/fail semantics for benchmark runs; mis-tuned thresholds or wiring bugs could void valid runs or miss generator-bound runs.

Overview
Adds an open-loop self-check that measures whether the load generator kept its own arrival schedule, and labels each run VALID, VOID, or N/A.

Workers record schedule_lag (AttemptedSendTime − IntendedSendTime) when IntendedSendTime is set, into a bounded reservoir sample (Algorithm R) plus exact max and over-bound counters. At shutdown, EvaluateScheduleLag compares p99 to threshold × (1/λ) (default 10%, overridable via scheduleLagVoidThreshold in config). It can also VOID the run when more than 0.5% of sends, counted unsampled, exceed the bound, which catches late-run degradation that the reservoir p99 would miss. N/A applies to closed-loop, ramped-λ, non-fixed-TPS, or zero-sample runs; admitted txs with no samples raise an anomaly log.

main arms the VOID bound on fixed-λ open-loop runs, computes the verdict from the actual arrival model, logs it, and extends run summary + OTel gauges (run_schedule_lag_p99, max, over-bound fraction) tagged with verdict.

Reviewed by Cursor Bugbot for commit 7caba67.

Compute schedule_lag = AttemptedSendTime - IntendedSendTime per open-loop tx (bounded reservoir, Algorithm R), expose p99 every run, and render a run VERDICT: VOID when schedule_lag_p99 > threshold x (1/lambda). A generator-bound run is void, not a footnote. Threshold is a named const (0.10, 'tune from first calibration run'), overridable via config. Gated on the actual arrival model (closed-loop / ramped-lambda => N/A); prewarm and zero-IntendedSendTime txs excluded. Stacked on PLT-459 (#51): needs the inclusion run-summary surface.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- RampUp => N/A (checked before TPS): the ramper drives the live limit via SetLimit so cfg.Settings.TPS is stale; gating against 1/TPS is wrong.
- Zero samples on a fixed-λ open-loop run => N/A, never VALID (a trust gate must not bless 'no data' as a clean run). Thread the admitted count from the dispatcher conservation counters; if admitted>0 yet samples==0, log loudly (recorder may be mis-wired) and flag Anomaly.
- Drop the redundant inline comment on ScheduleLagVoidThreshold (go-doc keeps the rationale).

Cohort: security (false-VALID F1/F2), systems (F2 confirm), idiom (doc dup).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The whole-run p99 (from a uniform reservoir) can dilute a sub-percentile late-run tail blowup → false VALID. Add an UNSAMPLED over-bound counter (incremented per recorded send, not sampled) + max lag: VOID when p99 > bound OR > scheduleLagOverBoundFraction (0.5%, provisional) of sends exceed the bound, with a distinct reason per criterion. Bound is single-sourced (ScheduleLagBound) so run-start arming and verdict-time can't drift; armed only on fixed-λ open-loop runs (inert elsewhere, matching the N/A set). EvaluateScheduleLag now takes ScheduleLagInputs (kills the adjacent-bool positional trap). VOID stays advisory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Both re-reviewers flagged that the over-bound counter was armed whenever TPS>0, including ramped open-loop runs (RampUp+TPS>0 is a valid config). The verdict is N/A there so it was never a false-VOID, but it emitted a meaningless over_bound_fraction and contradicted the 'inert on ramped runs' comment. Gate arming on !RampUp so the counter stays inert exactly where the verdict is N/A.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Strip bare (PLT-463) self-labels and a was-X-now-Y changelog line, drop a standalone TODO and a few what-comments. Keep load-bearing why/invariant comments (reservoir-dilution rationale, Little's-law sizing, registered ⊆ succeeded, negative-lag clamp) and forward-pointing cross-refs. Comment-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Base automatically changed from brandon2/plt-459-m13-block-indexed-txinclusion-tracker to main June 15, 2026 23:26

bdchatham and others added 4 commits June 15, 2026 16:27

bdchatham force-pushed the brandon2/plt-463-m15-schedule_lag-gate-run-verdict branch from 8beecca to c1da595 Compare June 15, 2026 23:28

bdchatham closed this Jun 16, 2026

bdchatham wants to merge 5 commits into main from brandon2/plt-463-m15-schedule_lag-gate-run-verdict
