diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..23a668a --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,43 @@ +# sei-load — agent guide + +sei-load drives synthetic transaction load at a Sei EVM endpoint and **emits measurements** (counters, histograms, a run summary) about how the system under test (SUT) responds. It is a load generator and a measurement instrument — **not** a judge: it computes no pass/fail verdicts, percentiles, or SLO compliance; you derive those externally from the signals it emits. These docs are the operating manual for an agent that designs, runs, and interprets sei-load experiments and acts on the results. + +## Start here (reading order for a new agent) + +Read linearly the first time: + +1. [docs/01-mental-model.md](docs/01-mental-model.md) — the pipeline, open- vs closed-loop, coordinated omission, and the measure-not-judge philosophy. **Read this first.** +2. [docs/02-running.md](docs/02-running.md) — build the binary, every CLI flag, the run lifecycle. +3. [docs/03-config-reference.md](docs/03-config-reference.md) — the JSON config schema behind those flags. +4. [docs/04-workload-model.md](docs/04-workload-model.md) — scenarios and the StorageRW contention/size/op axes. +5. [docs/06-measurement-metrics.md](docs/06-measurement-metrics.md) — the authoritative metric catalog and the conservation model; the PromQL you query. +6. [docs/05-reproducibility.md](docs/05-reproducibility.md) — seed → sub-stream determinism and fair A/B setup. **Read before 07:** the playbook's cardinal rule depends on the seed/fair-A/B mechanics defined here. +7. [docs/07-experiment-playbook.md](docs/07-experiment-playbook.md) — objective → knobs → read → interpret recipes. + +Keep [docs/08-limits-boundaries.md](docs/08-limits-boundaries.md) as a reference — pull it in when a non-zero boundary counter forces you to discount a result. 
+ +## Table of contents + +| Doc | Covers / when you need it | +|-----|---------------------------| +| [01-mental-model.md](docs/01-mental-model.md) | The send pipeline, open-loop vs closed-loop arrival, coordinated omission, conservation identities, and why the tool emits signal not verdicts. The conceptual floor — read before anything else. | +| [02-running.md](docs/02-running.md) | Building/invoking `seiload`, every CLI flag, settings precedence, the metrics endpoint, copy-pasteable invocations, and the run lifecycle. Need it when starting/stopping/reproducing a run. | +| [03-config-reference.md](docs/03-config-reference.md) | The complete JSON config schema — `LoadConfig`, `settings`, `scenarios`, `accounts`, `funding`, gotchas. Need it when authoring or editing a config. | +| [04-workload-model.md](docs/04-workload-model.md) | The scenario set, what each stresses, and the StorageRW key-contention / tx-size / op-mix axes plus what they probe on Sei's parallel executor. Need it when choosing a scenario and shaping load. | +| [05-reproducibility.md](docs/05-reproducibility.md) | Seed → sub-stream derivation, the exact determinism guarantee, fair A/B setup, open-loop determinism under drops. Need it before comparing two runs. | +| [06-measurement-metrics.md](docs/06-measurement-metrics.md) | The authoritative 19-instrument catalog, the conservation model, and the PromQL recipes for rates/percentiles/goodput/validity. Need it before writing any query or trusting a number. | +| [07-experiment-playbook.md](docs/07-experiment-playbook.md) | The reasoning layer: objective → knobs → validity → read → interpret → next move, with recipes for contention, size, and tail-latency experiments. Need it when designing a run. | +| [08-limits-boundaries.md](docs/08-limits-boundaries.md) | The accepted measurement boundaries (WS gaps, reorgs, single fetch endpoint, header-arrival clock, cap drops) and the counter to check for each. 
Need it when deciding whether a non-zero counter invalidates a conclusion. | + +## Fastest path to a first experiment + +1. Build and validate offline: `make build`, then a `--dry-run` invocation — see [docs/02-running.md](docs/02-running.md#common-invocations). +2. Run an open-loop, fixed-λ measurement with receipt tracking and follow the trustworthy-tail-latency recipe — see [docs/07-experiment-playbook.md](docs/07-experiment-playbook.md) §4, then validity-gate it with §5 before quoting any number. + +## Standing caveats (true on `main` today) + +- **StorageRW distribution/size/op axes require PLT-465 (#54, unmerged).** `keyDistribution`, `sizeDistribution`, `sizeBuckets`, `recordCount`, and `operations` parse but **do not affect generated transactions** on main — StorageRW emits a fixed scaffold (slot 0, empty pad, all-`rmw`). See [docs/04-workload-model.md](docs/04-workload-model.md). +- **`schedule_lag` is a concept, not a queryable metric** (emitter punted as PLT-463). Judge generator validity externally via the [06 §3.4](docs/06-measurement-metrics.md) heuristics. +- **`--report-path` writes a formatted text dump, not JSON** (schema-versioned JSON is PLT-467). The seed is **config-file-only** (no `--seed` flag). +- **Exported series carry a `seiload_` prefix and unit suffixes.** The Prometheus exporter sets `WithNamespace("seiload")` (configurable), so every series is prefixed `seiload_`, and OTel appends unit suffixes (`s`-unit → `_seconds`, etc.); histograms expose `_bucket`/`_sum`/`_count` and counters end `_total`. The wire names — not the instrument base names — are what you query; [docs/06](docs/06-measurement-metrics.md) §2 lists them. +- **The tool emits signal, not verdicts.** Every rate, percentile, and pass/fail is computed by you. 
diff --git a/README.md b/README.md index 78c97fd..e169c14 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,5 @@ +> 🤖 For agent-driven experiments, start at [AGENTS.md](AGENTS.md) and docs/ — the authoritative, current operating docs. This README is a human quick-start and may lag. + # sei-load [![Tests](https://github.com/sei-protocol/sei-load/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/sei-protocol/sei-load/actions/workflows/build-and-test.yml) diff --git a/docs/01-mental-model.md b/docs/01-mental-model.md new file mode 100644 index 0000000..701f373 --- /dev/null +++ b/docs/01-mental-model.md @@ -0,0 +1,178 @@ +# Mental Model + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers / when an agent needs it. The conceptual foundation you must +> hold before designing, running, or interpreting a sei-load experiment: the send +> pipeline, the open-loop arrival model and the coordinated-omission problem it +> solves, where verdicts come from (not the tool), and the load-bearing +> vocabulary. Read this first; the config and metric specifics are in the sibling +> docs linked at the end. + +## What sei-load is + +sei-load drives synthetic transaction load at a Sei EVM endpoint and **emits +measurements** about how the system under test (SUT) responds. It is a load +generator and a measurement instrument. It is **not** a judge: it does not +compute pass/fail verdicts or SLO compliance (see [Measurement philosophy](#measurement-philosophy)). + +## The send pipeline + +A transaction flows through a fixed pipeline: + +``` +generator → dispatcher → sharded sender → per-endpoint workers → Sei RPC +``` + +- **Generator** (`generator.Generator`) produces `*types.LoadTx` values. Each + `Generate()` call draws from the seeded PRNG sub-streams (accounts, gas, key/ + size distributions) — this is the only place workload randomness is consumed. +- **Dispatcher** (`sender.Dispatcher`) owns the arrival timing. 
It runs in one of + two arrival models (below) and hands each tx to the sender. +- **Sharded sender** (`sender.ShardedSender`, satisfies `sender.TxSender`) routes + each tx to one of N per-endpoint workers by shard. `Send` enqueues into the + worker's channel and returns immediately — it is asynchronous. +- **Workers** (`sender.Worker`) each own one RPC client to one endpoint and run + `Tasks` send goroutines over a shared channel. The send goroutine stamps + `AttemptedSendTime`, then calls go-ethereum `eth_sendRawTransaction` + **synchronously**. +- **Sei RPC** is the SUT. The send returns nil (accepted) or an error (rejected). + +A single shared `golang.org/x/time/rate.Limiter` is the one rate authority for +the whole pipeline. In closed-loop the worker gates on it; in open-loop the +scheduler reads it as a clock source (see below). When ramping is enabled, a +`Ramper` drives the limiter's limit up or down via `SetLimit`. + +Optionally, when `--track-receipts` is set, successful sends are handed to a +block-indexed `stats.InclusionTracker` that observes on-chain inclusion by +scanning arriving blocks (O(blocks), not per-tx receipt polling). See +[06-measurement-metrics.md](06-measurement-metrics.md). + +## The arrival model: why open-loop exists + +The dispatcher supports two arrival models, selected by `arrivalModel` +(`sender.ArrivalModel`, values `"closed_loop"` / `"open_loop"`). + +### Coordinated omission (the problem) + +In the legacy **closed-loop** model the dispatcher generates the next tx only +once a sender is free (`runClosedLoop`: generate-then-send in lockstep). The +dequeue clock is therefore the SUT's clock: **when the SUT slows, the generator +slows with it and simply stops issuing the requests that would have observed the +slowdown.** The latency histogram under-reports, because the worst-affected +requests were never sent. 
This is **coordinated omission** — the closed-loop +model lies about latency precisely when the answer matters most (under stress). + +### Open-loop (the fix) + +The **open-loop** model decouples the arrival clock from sender availability +(`sender.openLoopScheduler`). Transaction `i` is scheduled at a fixed instant +**`t₀ + i/λ`**, where `t₀` is the run start and `λ` is the target rate, regardless +of whether any sender is free. + +Properties that make it honest: + +- **Absolute-instant scheduling.** The scheduler sleeps until each absolute + instant (`SleepUntil(nextSend)`), not for a relative gap, so per-tx scheduling + slop cannot accumulate into clock drift over a long run. +- **λ as a clock, not a gate.** λ is sampled from the shared limiter on each step + (`limiter.Limit()`), so a ramping rate is honored; at fixed λ the running sum + telescopes to exactly `t₀ + i/λ`. The limiter is read here as a clock source — + the schedule advances whether or not the SUT keeps up. +- **Bounded in-flight + drop-and-count.** The arrival clock is **never throttled + by backpressure** (throttling would reintroduce coordinated omission). Instead + a counting semaphore bounds true in-flight sends to `maxInFlight`. At each + scheduled instant the scheduler does a non-blocking `TryAcquire`: if senders are + saturated the tick is **dropped and counted** and the clock moves on. The permit + is held across the full unacked-in-flight window (enqueue + RPC round-trip) and + released only after the synchronous send returns (via `tx.OnComplete`), so + `maxInFlight` bounds real in-flight work and the drop count measures genuine + load shed, not buffer geometry. +- **Admit before generate.** The permit is acquired **before** the generator is + drawn. A dropped tick draws no tx (no seeded-stream consumption, no signer CPU), + which makes admitted txs a deterministic prefix of the seeded sequence — see + [05-reproducibility.md](05-reproducibility.md). 
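The open-loop properties above can be illustrated with a small, self-contained sketch. It is not the real `sender.openLoopScheduler`: `intendedOffset`, `simulate`, `tickStats`, and `holdTicks` are hypothetical names invented for illustration, the clock is a simulated tick counter rather than wall time, and λ is assumed fixed so the closed form `i/λ` applies. The sketch shows a non-blocking acquire on a counting semaphore, drop-and-count when senders are saturated, and the `scheduled = dropped + admitted` identity.

```go
package main

import (
	"fmt"
	"time"
)

// intendedOffset returns i/lambda as a duration: the offset of tick i from t0
// at a fixed rate lambda (tx/s). Hypothetical helper, not part of sei-load.
func intendedOffset(i int, lambda float64) time.Duration {
	return time.Duration(float64(i) / lambda * float64(time.Second))
}

// tickStats tallies the terminal state of every scheduled tick.
type tickStats struct {
	Scheduled, Admitted, Dropped int
}

// simulate walks n arrival ticks on a simulated clock. Each admitted tx holds
// its in-flight permit for holdTicks ticks (assumed >= 1), standing in for
// enqueue + RPC round-trip. The clock advances on every tick whether or not a
// permit is free.
func simulate(n, maxInFlight, holdTicks int) tickStats {
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	releaseAt := make(map[int]int)          // tick -> permits released at that tick
	var s tickStats
	for i := 0; i < n; i++ {
		// Sends that "returned" by this tick give their permits back.
		for k := releaseAt[i]; k > 0; k-- {
			<-sem
		}
		delete(releaseAt, i)

		s.Scheduled++
		select {
		case sem <- struct{}{}:
			// Admit first; only now would the generator be drawn.
			s.Admitted++
			releaseAt[i+holdTicks]++
		default:
			// Saturated: drop and count, never wait.
			s.Dropped++
		}
	}
	return s
}

func main() {
	fmt.Println(intendedOffset(250, 100))
	fmt.Printf("%+v\n", simulate(10, 2, 4))
}
```

Because the acquire never blocks, a slow simulated send raises `Dropped` instead of stretching the schedule, which is the behavior that keeps the arrival clock independent of the SUT.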
+ +Closed-loop is retained only as the **legacy regression baseline**. For any +experiment where tail latency under load matters, use open-loop. + +To use open-loop: set `arrivalModel: "open_loop"` and a finite positive rate +(`tps > 0` or `rampUp: true`); validation rejects open-loop with no finite λ. +See [03-config-reference.md](03-config-reference.md). + +### Conservation (how counts must add up) + +Every scheduled tick reaches exactly one terminal state, and the dispatcher folds +these into the run summary: + +``` +scheduled = dropped + admitted +admitted = succeeded + failed +``` + +- **dropped** — shed because in-flight was saturated at the scheduled instant + (never admitted, never sent). +- **admitted** — took a permit and drew a tx. +- **succeeded** — admitted, send returned nil (`DispatcherStats.TotalSent`). +- **failed** — admitted, send returned an error. **Counted, never lost** + (`DispatcherStats.Failed`); a send error does not tear down the run. + +In closed-loop, `Failed` and `Dropped` are always 0. + +A finite workload ends when the generator drains; the terminal probe that +discovers this advances neither clock, index, nor counters. On a clean drain +`admitted == succeeded + failed` holds exactly. On `ctx` cancel (SIGTERM / +duration limit) some admitted txs may still be buffered for a worker and exit +uncounted — a bounded undercount that never affects a cleanly completed run. + +## Measurement philosophy + +**The generator emits measurements; it does not pronounce verdicts.** SLO +judgments, A/B comparisons, and pass/fail decisions are computed **externally** +via metric queries against the telemetry the tool emits — they are not owned by +sei-load. This shapes how you consume outputs: + +- Treat sei-load output as raw signal (counters, histograms, the run summary), + not as a graded result. +- Build your verdict logic in your query/analysis layer, gating on the run-level + arrival model (see next point). 
+- **A tx cannot self-describe which model produced it.** An open-loop and a + closed-loop `LoadTx` are byte-identical; coordinated-omission safety is a + property of the run's arrival model, not of any per-tx field. Latency and + schedule-lag consumers **must gate on the run-level `arrivalModel`** before + trusting a latency or schedule-lag sample. In closed-loop, `IntendedSendTime` + is merely the back-pressured enqueue time, so derived latency is omitted / + meaningless. + +> **`schedule_lag` is a concept, not a metric on main today.** It is the +> coordinated-omission/validity quantity `AttemptedSendTime − IntendedSendTime`, +> computed and judged **externally** — there is no `schedule_lag` series on +> `/metrics` (the emitter was punted as PLT-463). Do not write a query against it; +> see [06-measurement-metrics.md](06-measurement-metrics.md#34-detecting-a-generator-bound-invalid-run--schedule_lag) +> for the external validity heuristics that stand in for it. + +## Glossary + +| Term | Meaning | +|---|---| +| **λ (lambda)** | Target arrival rate (tx/s). In open-loop, sampled from the shared limiter each step as a clock source; the inter-arrival gap is `1/λ`. | +| **t₀** | Run start instant; the anchor for the open-loop schedule. | +| **intended send time** | `IntendedSendTime` = `t₀ + i/λ`, the true scheduled instant (open-loop). In closed-loop it is the enqueue time instead — not a real schedule. | +| **attempted send time** | `AttemptedSendTime`, the wall clock when a worker actually called the RPC. | +| **inclusion time** | `InclusionTime`, the header-arrival wall clock of the block that included the tx (set only when `--track-receipts`). | +| **schedule_lag** | `AttemptedSendTime − IntendedSendTime`. The primary coordinated-omission gate: it shows sends falling behind the arrival schedule even before any tx is shed. Open-loop only. **A concept, not a metric on main** — computed/judged externally; not a queryable series (emitter punted as PLT-463). 
| +| **SequenceIndex** | The arrival-tick index `i`. Monotonic; under drops it is non-contiguous across admitted txs (dropped ticks advance `i` and the clock but consume no draw). | +| **admitted** | A tick that took an in-flight permit and drew a tx. | +| **dropped** | A tick shed because in-flight was saturated (drop-and-count). | +| **failed** | An admitted tx whose send returned an error (counted, not lost). | +| **in-flight** | Concurrent unacked sends, bounded by `maxInFlight` via the semaphore; a permit is held enqueue → RPC return. | +| **drop-and-count** | The open-loop overload policy: shed and tally overdue ticks rather than throttle the arrival clock. | + +## See also + +- [02-running.md](02-running.md) — invoking a run. +- [03-config-reference.md](03-config-reference.md) — every config/CLI setting. +- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts. +- [05-reproducibility.md](05-reproducibility.md) — seeds, sub-streams, A/B. +- [06-measurement-metrics.md](06-measurement-metrics.md) — emitted metrics and the run summary. +- [07-experiment-playbook.md](07-experiment-playbook.md) — recipes for common experiments. diff --git a/docs/02-running.md b/docs/02-running.md new file mode 100644 index 0000000..a524963 --- /dev/null +++ b/docs/02-running.md @@ -0,0 +1,186 @@ +# Running sei-load + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers / when an agent needs it: how to build and invoke the `seiload` +> binary, every CLI flag and its effect, settings precedence, the metrics +> endpoint, a ladder of copy-pasteable invocations, and the run lifecycle +> (prewarm → run → run-summary → flush). Read this when you are about to start, +> stop, or reproduce a run. For the meaning of config-file fields, see +> [03-config-reference.md](03-config-reference.md). 
+ +## Build and run + +```bash +make build # produces ./build/seiload +./build/seiload --config <path> # --config is REQUIRED; run fails fast without it +``` + +The binary reads one JSON config file (`--config`/`-c`), resolves settings (CLI > +config > defaults), validates them, then runs until its duration elapses or it +receives `SIGTERM`/`SIGINT`. + +A run with no endpoints or no scenarios in the config is rejected at load time +(`no endpoints specified` / `no scenarios specified`). + +### Metrics endpoint + +Prometheus metrics are served at `http://<metricsListenAddr>/metrics` (default +`0.0.0.0:9090`). OpenMetrics is enabled so exemplars survive scraping. Point your +scraper here; the run holds the process open for a scrape window after it +finishes (see [Lifecycle](#lifecycle)). To export traces/OTLP, set +`OTEL_EXPORTER_OTLP_ENDPOINT` in the environment. + +## CLI flags + +Every flag below maps 1:1 to a `settings` field (same default) except `--config`, +`--nodes`, and `--metricsListenAddr`, which are CLI-only. Flag defaults come from +`DefaultSettings()`; config-file values override these defaults, and CLI flags +override the config file. + +| Flag | Short | Default | Meaning / effect | +|------|-------|---------|------------------| +| `--config` | `-c` | (required) | Path to the JSON config file. No default; run aborts if unset. | +| `--workers` | `-w` | `1` | Tasks (workers) **per endpoint**. Total senders = workers × endpoints. | +| `--tps` | `-t` | `0` | Target transactions/sec, shared across all workers (single rate limiter). `0` = no limit. Required (>0) for open-loop unless `--ramp-up`. | +| `--arrival-model` | | `closed_loop` | `open_loop` schedules tx *i* at t₀+i/λ and drops overdue txs; `closed_loop` is the legacy generate-then-send lockstep. See [03](03-config-reference.md#arrivalmodel). 
| +| `--max-in-flight` | | `10000` | **Open-loop only.** Max concurrent in-flight sends; txs that would exceed this at their scheduled instant are dropped and counted (the clock is never throttled). Ignored in closed-loop. | +| `--stats-interval` | `-s` | `10s` | Interval for logging throughput/latency stats and the user-latency tracker tick. | +| `--buffer-size` | `-b` | `1000` | Channel buffer size per worker. Larger = more in-memory queueing; reduce under memory pressure. | +| `--dry-run` | | `false` | Simulate generation/sending without hitting the chain. Forces `mockDeploy`. Disables the inclusion tracker (simulated sends never land, would all reap as expired). | +| `--debug` | | `false` | Log each transaction. High-volume; for small/diagnostic runs only. | +| `--track-receipts` | | `false` | Enable the block-indexed tx→inclusion tracker (stamps inclusion time; reports included/expired/dropped-at-cap/inflight-at-shutdown). No-op under `--dry-run` or with zero endpoints. | +| `--inclusion-reap-after` | | `30s` | How long an un-included tx stays in the inclusion registry before being reaped as **expired**. Tune to expected inclusion time on congested chains. Only meaningful with `--track-receipts`. | +| `--track-blocks` | | `false` | Collect block statistics (block time, gas) from `endpoints[0]`. | +| `--track-user-latency` | | `false` | Track per-user latency from `endpoints[0]`, sampled at `--stats-interval`. | +| `--prewarm` | | `false` | Prewarm accounts with self-transactions before the main run (warms nonces/state; excluded from main stats). | +| `--ramp-up` | | `false` | Drive load with a built-in ramp curve instead of a fixed rate. Provides a finite λ for open-loop without a fixed `--tps`. Curve is fixed in code: start 100 TPS, +100 per step, 120s load interval, 30s recovery interval. | +| `--report-path` | | `""` | Write a **formatted text** report to this path (`/dev/stdout` is valid). Empty = no report file. 
Note: a text dump today, **not** JSON — schema-versioned run-summary JSON is future work (PLT-467). See [06 §4.2](06-measurement-metrics.md#42---report-path-file--stdout-final-stats). | +| `--txs-dir` | | `""` | Write generated transactions to this dir instead of sending them (offline tx-writer mode). Forces closed-loop; open-loop is ignored with a logged downgrade. | +| `--target-gas` | | `10000000` | Target gas per block (tx-writer mode). | +| `--num-blocks-to-write` | | `100` | Number of blocks to write (tx-writer mode). | +| `--duration` | | `0` | Run duration. `0` = run until `SIGTERM`/`SIGINT`. | +| `--post-summary-flush-delay` | | `25s` | In-process sleep AFTER the run-summary metrics are recorded, so Prometheus can scrape final values before exit. Set `0` to exit immediately (you lose the final scrape). | +| `--nodes` | `-n` | `0` | Limit to the first N endpoints from the config. `0` = use all. | +| `--metricsListenAddr` | | `0.0.0.0:9090` | `ip:port` for the Prometheus `/metrics` endpoint. | + +> Trackers that read chain state (`--track-blocks`, `--track-user-latency`, +> `--track-receipts`, and the ramper's block collector) all read from +> `endpoints[0]` only. Put a representative/stable RPC first. + +> **No `--seed` flag.** The seed is **config-file-only** (top-level `seed`, +> `LoadConfig.Seed *uint64`). To pin or replay a workload, set `seed` in the config +> file — there is no CLI override. See +> [05-reproducibility.md](05-reproducibility.md#setting-the-seed). + +> **`seiChainID` casing is cosmetic only.** The struct tag is `seiChainID` (capital +> `ID`), and several shipped profiles write `seiChainId` (lowercase `d`). Go's +> `encoding/json` matches tags **case-insensitively**, so `seiChainId` binds to the +> same field — `chain_id` is populated and `chain_id`-keyed PromQL works either way. +> Prefer `seiChainID` for style consistency, but it does **not** affect binding or +> queries. See [03 gotchas](03-config-reference.md#gotchas). 
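The case-insensitive binding described above is standard `encoding/json` behavior and can be checked in isolation. The sketch below is a minimal illustration: `chainTag` and `parseTag` are hypothetical stand-ins, not the real `config.LoadConfig` or its loader, and the only thing assumed from the repository is the `seiChainID` struct tag quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chainTag is a hypothetical one-field stand-in for the real config struct.
// Only the tag spelling is taken from the docs.
type chainTag struct {
	SeiChainID string `json:"seiChainID"`
}

// parseTag decodes raw JSON and returns the bound chain tag.
func parseTag(raw string) (string, error) {
	var c chainTag
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return "", err
	}
	return c.SeiChainID, nil
}

func main() {
	// Exact-case key and the lowercase-d variant used by some profiles.
	for _, raw := range []string{
		`{"seiChainID":"arctic-1"}`,
		`{"seiChainId":"arctic-1"}`,
	} {
		v, err := parseTag(raw)
		fmt.Printf("%s -> %q (err=%v)\n", raw, v, err)
	}
}
```

`json.Unmarshal` prefers an exact key match but also accepts a case-insensitive one, so both spellings populate the same field.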
+ +## Settings precedence + +``` +CLI flag > config-file "settings" > built-in default +``` + +Resolution is via viper: defaults are seeded from `DefaultSettings()`, the config +file's `settings` block is merged, then bound CLI flags override. A field absent +everywhere falls back to its default. After resolution the settings are validated +(`Settings.Validate`) and the run aborts on an invalid combination — notably +`arrival-model open_loop` with no finite rate (`--tps<=0` and not `--ramp-up`). + +## Common invocations + +Minimal → realistic. + +**1. Validate a config without touching the chain (dry-run):** +```bash +./build/seiload --config profiles/local.json --dry-run --debug +``` +Generates and logs transactions; deploys are mocked; no sends. Use this to +confirm scenarios, accounts, and weights resolve before a real run. + +**2. Closed-loop, fixed TPS (legacy baseline):** +```bash +./build/seiload --config profiles/local.json --workers 50 --tps 100 +``` +Workers generate then send in lockstep; the shared limiter caps aggregate rate at +100 TPS. Susceptible to coordinated omission — prefer open-loop for latency +claims. + +**3. Open-loop, fixed λ (coordinated-omission-correct):** +```bash +./build/seiload --config profiles/local.json \ + --arrival-model open_loop --tps 100 --max-in-flight 5000 +``` +Arrivals are scheduled at t₀+i/λ independent of sender availability; if in-flight +hits `--max-in-flight` the overdue tx is dropped and counted (reported as +`Open-loop dropped N txs` at exit) rather than slowing the clock. + +**4. Ramped run (open-loop, no fixed TPS):** +```bash +./build/seiload --config profiles/local.json --arrival-model open_loop --ramp-up +``` +The ramper supplies a finite, increasing λ to the shared limiter — this satisfies +open-loop's "finite positive rate" requirement without `--tps`. Final ramp stats +are logged at exit. + +**5. 
Run with inclusion + block tracking:** +```bash +./build/seiload --config profiles/arctic-1.json \ + --track-receipts --track-blocks --inclusion-reap-after 45s +``` +Stamps each sent tx and matches it against on-chain blocks from `endpoints[0]`; +at exit reports `included / expired / dropped_at_cap / inflight_at_shutdown`. On +a congested chain raise `--inclusion-reap-after` so slow-but-real inclusions are +not miscounted as expired. + +**6. Limit endpoints with `--nodes`:** +```bash +./build/seiload --config profiles/local_docker.json --nodes 2 +``` +Uses only the first 2 of the config's endpoints. Useful to A/B fan-out without +editing the config. + +**7. Bounded duration vs. signal-driven:** +```bash +./build/seiload --config profiles/local.json --tps 100 --duration 5m # stops after 5m +./build/seiload --config profiles/local.json --tps 100 # runs until Ctrl-C / SIGTERM +``` + +## Lifecycle + +A run proceeds in this order: + +1. **Load + resolve + validate** config and settings; abort fast on bad combos. +2. **Setup**: start the metrics server, observability, block/user-latency/inclusion + trackers (per flags), and connect the sharded sender. +3. **Fund** the account pool (only if `funding` is set and not `--dry-run`). +4. **Prewarm** (if `--prewarm`): self-transactions warm accounts; excluded from + main stats (the stats logger starts *after* prewarm). +5. **Run**: dispatcher drives the workload (open- or closed-loop) under the shared + rate limiter; stats logged every `--stats-interval`. +6. **End**: the run stops when `--duration` elapses (context timeout) or a + `SIGTERM`/`SIGINT` arrives. Workers and trackers drain and join. +7. **Run summary**: final stats are logged, the inclusion conservation identity is + read after join (so `inflight_at_shutdown` is final), and a run-summary metric + is emitted (`arrival_model`, `dropped`, `failed`, inclusion counts). +8. 
**Flush window**: the process sleeps `--post-summary-flush-delay` (default + `25s`) so Prometheus can scrape the final summary, then exits cleanly. A + `context.Canceled`/`DeadlineExceeded` from a clean duration/signal stop is + treated as success (exit 0). + +> If you scrape final summary metrics, the scrape interval must be shorter than +> `--post-summary-flush-delay`, or set the delay higher. Setting it to `0` exits +> immediately and the last scrape is lost. + +## See also + +- [01-mental-model.md](01-mental-model.md) — what sei-load is and how its pieces fit. +- [03-config-reference.md](03-config-reference.md) — the full config schema these flags mirror. +- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts. +- [06-measurement-metrics.md](06-measurement-metrics.md) — what the metrics/summary mean. +- [07-experiment-playbook.md](07-experiment-playbook.md) — recipes for reproducible experiments. diff --git a/docs/03-config-reference.md b/docs/03-config-reference.md new file mode 100644 index 0000000..0051d0e --- /dev/null +++ b/docs/03-config-reference.md @@ -0,0 +1,262 @@ +# Config reference + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers / when an agent needs it: the complete JSON config schema for +> sei-load — top-level `LoadConfig`, the `settings` block (every field, type, +> default, and run effect), `scenarios`, `accounts`, and `funding` — with an +> annotated example and the field interactions that change a run's behavior. Read +> this when authoring or editing a config. For how to invoke the binary and the +> CLI-flag equivalents, see [02-running.md](02-running.md). + +The config is a single JSON object parsed into `config.LoadConfig`. Every field is +optional except `endpoints` and `scenarios` (the loader rejects a config missing +either). Unknown fields are ignored, and **a config that uses no new fields runs +the legacy closed-loop path unchanged** — the schema is additive by construction. 
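As a rough illustration of the two rules above (required `endpoints` and `scenarios`, unknown fields ignored), here is a minimal sketch. `minimalConfig` and `loadMinimal` are hypothetical and cover only a subset of the schema; the real loader may be structured differently. The error strings are the ones quoted in [02-running.md](02-running.md), and the `futureKnob` key is made up to stand in for an unknown field.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// minimalConfig is a hypothetical subset of the real LoadConfig: just the two
// required fields. Scenario bodies are kept raw because their schema is not
// needed for this check.
type minimalConfig struct {
	Endpoints []string          `json:"endpoints"`
	Scenarios []json.RawMessage `json:"scenarios"`
}

// loadMinimal decodes raw JSON and enforces the two required fields.
// json.Unmarshal silently ignores keys that match no struct field.
func loadMinimal(raw string) (*minimalConfig, error) {
	var c minimalConfig
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return nil, err
	}
	if len(c.Endpoints) == 0 {
		return nil, errors.New("no endpoints specified")
	}
	if len(c.Scenarios) == 0 {
		return nil, errors.New("no scenarios specified")
	}
	return &c, nil
}

func main() {
	ok := `{"endpoints":["http://rpc-a:8545"],"scenarios":[{"name":"EVMTransfer","weight":7}],"futureKnob":true}`
	c, err := loadMinimal(ok)
	fmt.Println(len(c.Endpoints), len(c.Scenarios), err)

	_, err = loadMinimal(`{"scenarios":[{"name":"ERC20","weight":3}]}`)
	fmt.Println(err)
}
```

A config that passes this check and sets nothing else would, per the paragraph above, run the legacy closed-loop path with default settings.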
+ +## Top-level `LoadConfig` + +| Field | JSON key | Type | Default | Meaning | +|-------|----------|------|---------|---------| +| ChainID | `chainId` | int64 | `0` | EVM chain ID used to sign transactions. Must match the target chain. | +| SeiChainID | `seiChainID` | string | `""` | Textual chain ID used to tag metrics and block/inclusion collectors. Key casing is cosmetic — `seiChainId` (lowercase `d`) also binds (see [gotcha](#gotchas)). | +| Endpoints | `endpoints` | []string | (required) | RPC endpoints. Workers shard across all of them; trackers read only `endpoints[0]`. | +| Accounts | `accounts` | object | none | Shared account pool (see [Accounts](#accounts)). | +| Scenarios | `scenarios` | []object | (required) | Weighted workload mix (see [Scenarios](#scenarios)). | +| Settings | `settings` | object | `DefaultSettings()` | Run knobs; CLI flags override (see [Settings](#settings)). | +| Funding | `funding` | object | none | Root-key funding of the account pool (see [Funding](#funding)). | +| Seed | `seed` | uint64 | random | Roots the deterministic PRNG. Same seed + config = same workload draw multiset (see [Seed](#seed-reproducibility)). | +| MockDeploy | `mockDeploy` | bool | `false` | Mock contract deploys. Auto-forced on under `--dry-run`; rarely set by hand. | +| ReportPath | `reportPath` | string | `""` | Alias also accepted at top level; `settings.reportPath` is the normal place. 
| + +### Annotated example + +```jsonc +{ + "chainId": 713715, // EVM chain id; must match the chain + "seiChainID": "arctic-1", // metric/collector tag (casing cosmetic; seiChainId also binds) + "endpoints": [ // workers shard across these; trackers use [0] + "http://rpc-a:8545", + "http://rpc-b:8545" + ], + "seed": 42, // optional; omit for a random (recorded) seed + "accounts": { // shared pool unless a scenario overrides + "count": 500, + "newAccountRate": 0.0 + }, + "funding": { // optional; fund the pool from a root key + "rootKeyFile": "/etc/seiload-key/root-key.hex", + "fundAmountWei": "1000000000000000000", // 1 SEI; decimal STRING (precision) + "batchSize": 200 + }, + "scenarios": [ + { "name": "EVMTransfer", "weight": 7 }, + { "name": "ERC20", "weight": 3 } + ], + "settings": { + "workers": 50, + "tps": 100, + "arrivalModel": "open_loop", + "maxInFlight": 5000, + "statsInterval": "10s", + "bufferSize": 1000, + "trackReceipts": true, + "inclusionReapAfter": "45s", + "trackBlocks": true, + "prewarm": true, + "postSummaryFlushDelay": "25s", + "reportPath": "/dev/stdout" + } +} +``` + +## Settings + +Every field below has a CLI-flag twin with the same default (see +[02-running.md](02-running.md#cli-flags)); CLI overrides config overrides default. +Duration fields are JSON strings parsed by Go's `time.ParseDuration` (e.g. +`"10s"`, `"45s"`, `"5m"`). + +| Field (JSON key) | Type | Default | Effect on the run | +|------------------|------|---------|-------------------| +| `workers` | int | `1` | Tasks per endpoint. Total senders = workers × endpoints. (Struct field is `TasksPerEndpoint`.) | +| `tps` | float64 | `0` | Aggregate target rate via one shared limiter. `0` = unbounded. Required (>0) for open-loop unless `rampUp`. | +| `arrivalModel` | string | `"closed_loop"` | `"open_loop"` vs `"closed_loop"` — see [arrivalModel](#arrivalmodel). 
| +| `maxInFlight` | int | `10000` | **Open-loop only.** Cap on concurrent in-flight sends; overdue txs past the cap are dropped+counted, the arrival clock is never throttled. Ignored in closed-loop. | +| `statsInterval` | duration | `"10s"` | Stats-logging cadence; also the user-latency tracker tick. | +| `inclusionReapAfter` | duration | `"30s"` | Time an un-included tx waits before being reaped as **expired**. Only used when `trackReceipts` is on. Too short → real-but-slow inclusions counted expired; too long → inflated in-flight map. Also sizes the inclusion registry cap (≈ tps × reapAfter × 1.5, floored at maxInFlight × 4). | +| `bufferSize` | int | `1000` | Per-worker channel buffer. Larger = more in-memory queueing; lower under memory pressure. | +| `dryRun` | bool | `false` | Simulate without sending; forces `mockDeploy`; disables the inclusion tracker. | +| `debug` | bool | `false` | Log every transaction. Diagnostic/small runs only. | +| `trackReceipts` | bool | `false` | Enable the block-indexed inclusion tracker (included/expired/dropped-at-cap/inflight-at-shutdown). No-op under `dryRun` or with zero endpoints. Reads `endpoints[0]`. | +| `trackBlocks` | bool | `false` | Collect block time/gas stats from `endpoints[0]`. | +| `trackUserLatency` | bool | `false` | Per-user latency sampled at `statsInterval` from `endpoints[0]`. | +| `prewarm` | bool | `false` | Self-transaction prewarm before the main run; excluded from main stats. | +| `rampUp` | bool | `false` | Drive load with the built-in ramp curve. Supplies a finite λ to satisfy open-loop without a fixed `tps`. | +| `reportPath` | string | `""` | Write a **formatted text** report to this path (`/dev/stdout` valid); empty = none. Text dump today, not JSON — schema-versioned JSON is future work (PLT-467). | +| `txsDir` | string | `""` | Offline tx-writer mode: write generated txs to this dir instead of sending. Forces closed-loop (open-loop logged as ignored). 
| +| `targetGas` | uint64 | `10000000` | Target gas/block in tx-writer mode. | +| `numBlocksToWrite` | int | `100` | Blocks to write in tx-writer mode. | +| `postSummaryFlushDelay` | duration | `"25s"` | Post-summary sleep so Prometheus scrapes final metrics before exit. `0` = exit immediately (last scrape lost). | + +> CLI-only (not in `settings`): `--config`, `--nodes`, `--metricsListenAddr`. + +### `arrivalModel` + +The single field that most changes a run's semantics. + +- **`closed_loop`** (default) — legacy generate-then-send lockstep. Each worker + generates a tx, sends it, then generates the next; throughput is bounded by + sender latency. Susceptible to **coordinated omission** (slow sends suppress + arrivals, hiding tail latency). `maxInFlight` is ignored. Keep as the + regression baseline. +- **`open_loop`** — schedules tx *i* at t₀ + i/λ **independent of sender + availability** (the coordinated-omission fix). λ comes from `tps>0` or the ramp + curve (`rampUp`). When concurrent in-flight sends would exceed `maxInFlight`, + the overdue tx is **dropped and counted** rather than throttling the clock — + reported at exit as `Open-loop dropped N txs`. Use this for any latency claim. + +Validation (`Settings.Validate`) rejects: +- an `arrivalModel` other than `open_loop`/`closed_loop`; +- `open_loop` with no finite positive rate (`tps<=0` **and** not `rampUp`) — λ + would be infinite, the inter-arrival gap collapses to 0, and the scheduler spins + and drops everything. + +### Seed (reproducibility) + +`seed` roots the deterministic PRNG sub-streams (keys, sizes, gas, accounts). Same +seed + same config reproduces the **draw multiset**, so the workload distribution +is statistically reproducible for fair A/B comparison. Caveats from the code: + +- Per-tx emission ordering is reproducible only at a single worker; above one + worker the multiset matches but ordering does not, and on-chain arrival order is + concurrent regardless. 
+- Omitting `seed` means "unseeded": the generator draws a random seed, writes it + back, and logs it for after-the-fact replay. + +## Scenarios + +`scenarios` is a weighted mix. Each entry creates one scenario instance; the +dispatcher selects among them by `weight` (relative, integer). The same `name` may +appear multiple times (instances are suffixed `_0`, `_1`, …). + +| Field | JSON key | Type | Meaning | +|-------|----------|------|---------| +| Name | `name` | string | Scenario kind (case-insensitive match). See list below. | +| Weight | `weight` | int | Relative selection weight within the mix. | +| Accounts | `accounts` | object | Optional per-scenario account pool; overrides the shared pool for this scenario. | +| GasPicker | `gasPicker` | object | Optional gas-limit picker (`fixed`/`random`). | +| GasFeeCapPicker | `gasFeeCapPicker` | object | Optional `maxFeePerGas` picker. | +| GasTipCapPicker | `gasTipCapPicker` | object | Optional `maxPriorityFeePerGas` picker. | +| KeyDistribution | `keyDistribution` | object | Keyspace index distribution (`uniform`/`zipfian`). ⚠️ **Requires PLT-465 (#54, unmerged) — parses but does not affect generated transactions on main.** See [gap](#schema-vs-implementation-gaps). | +| SizeDistribution | `sizeDistribution` | object | Payload-size distribution. ⚠️ **Same status as `keyDistribution`: requires PLT-465; parses but does not affect generated txs on main.** | + +### Scenario names + +Matched case-insensitively. Registered on main: + +`EVMTransfer`, `EVMTransferFast`, `EVMTransferNoop`, `ERC20`, `ERC20Noop`, +`ERC20Conflict`, `ERC721`, `Disperse`, `StorageRW`. + +An unknown name panics at scenario creation — validate with `--dry-run` first. + +### Gas pickers + +A picker is a tagged object discriminated by `Name`: + +```jsonc +"gasPicker": { "Name": "fixed", "Gas": 21000 } +"gasPicker": { "Name": "random", "Min": 21000, "Max": 100000 } // inclusive range +``` + +`random` requires `Min < Max`. 
With no picker, the scenario uses its built-in +defaults. Pickers are consumed by the EVMTransfer family (`GenerateGas`); the +field keys (`Name`, `Gas`, `Min`, `Max`) are capitalized on the wire. + +### Distributions + +A distribution is discriminated by `Name`: + +```jsonc +"keyDistribution": { "Name": "uniform" } +"keyDistribution": { "Name": "zipfian", "theta": 0.9 } // theta in [0, 1) +``` + +`zipfian.theta` must be in `[0, 1)`; `0` is uniform, larger hotspots low indices. +⚠️ These distributions (and the related `recordCount`, `sizeBuckets`, and +`operations` op-mix axes) **require PLT-465 (#54, unmerged as of writing) — on +main they parse but do not affect generated transactions.** See the +[implementation gap](#schema-vs-implementation-gaps) before relying on these for +workload skew. + +## Accounts + +```jsonc +"accounts": { + "count": 500, // pool size + "newAccountRate": 0.0 // fraction of txs that mint a fresh recipient account +} +``` + +| Field | JSON key | Type | Default | Effect | +|-------|----------|------|---------|--------| +| Accounts | `count` | int | `0` | Number of pre-generated accounts in the pool. | +| NewAccountRate | `newAccountRate` | float64 | `0.0` | Fraction of transactions that target a newly-minted account instead of a pool member. `0` = fixed pool. | + +A top-level `accounts` block is the **shared pool** for all scenarios; a +per-scenario `accounts` block creates a separate pool for that scenario. If +neither exists, scenario creation errors (`no accounts config defined`). + +**Funding interaction:** funding requires `newAccountRate == 0` everywhere (both +top-level and per-scenario). On-demand accounts are never funded, so their first +tx would fail for gas — `ValidateFunding` rejects the combo at load. + +## Funding + +Optional. When set (and not `--dry-run`), the account pool is funded from a root +key at startup so the run works against a real chain. 
+ +| Field | JSON key | Type | Default | Meaning | +|-------|----------|------|---------|---------| +| RootKeyFile | `rootKeyFile` | string | `""` | Path to a file holding the root account's hex private key. **Preferred** — not exposed in the process environment. | +| RootKeyEnv | `rootKeyEnv` | string | `""` | Env var name holding the hex key. Fallback when `rootKeyFile` is unset. | +| FundAmountWei | `fundAmountWei` | string | `"1000000000000000000"` (1 SEI) | Per-account funding in wei. **Decimal STRING** (JSON numbers lose precision above 2^53). | +| BatchSize | `batchSize` | int | `200` | Recipients per `disperseEther` call. | + +`ValidateFunding` requires exactly one key source (`rootKeyFile` or `rootKeyEnv`) +and `newAccountRate == 0` across all account configs. + +## Gotchas + +- **`seiChainID` casing is cosmetic.** The struct tag is `seiChainID` (capital `ID`), + and several shipped profiles write `seiChainId` (lowercase `d`). Go's `encoding/json` + matches tags **case-insensitively**, so `seiChainId` binds to the same field — the + value is populated and the `chain_id` metric label and `chain_id`-keyed PromQL work + regardless of casing. Prefer `seiChainID` for style consistency only; it has **no + effect on binding or queries**. +- **Durations are strings.** `"10s"`, not `10`. A bare number fails to parse. +- **`fundAmountWei` is a string.** Quoting matters; an unquoted big number loses + precision or fails. +- **Trackers read `endpoints[0]` only.** Order endpoints so the first is stable. + +## Schema vs. 
implementation gaps + +Verified against main at doc time: + +- ⚠️ **`keyDistribution` / `sizeDistribution` / `sizeBuckets` / `recordCount` / + `operations` require PLT-465 (#54, unmerged as of writing) — on main these + fields parse but do not affect generated transactions.** They parse, validate, + and bind to deterministic RNG sub-streams in the generator, but **no scenario on + main calls `SampleIndex` on them** — the only `SampleIndex` call site is inside + `config/distribution.go` itself. Setting these fields today has no behavioral + effect on emitted transactions. PLT-465 (#54) is the pending PR that wires + scenario sampling; once it lands, revisit this note and the StorageRW axes in + [04-workload-model.md](04-workload-model.md). + +## See also + +- [01-mental-model.md](01-mental-model.md) — the pieces and how they connect. +- [02-running.md](02-running.md) — invoking the binary; CLI-flag equivalents. +- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts in depth. +- [06-measurement-metrics.md](06-measurement-metrics.md) — interpreting metrics and the run summary. +- [07-experiment-playbook.md](07-experiment-playbook.md) — reproducible experiment recipes. diff --git a/docs/04-workload-model.md b/docs/04-workload-model.md new file mode 100644 index 0000000..2933027 --- /dev/null +++ b/docs/04-workload-model.md @@ -0,0 +1,121 @@ +# Workload Model + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers: the scenario set sei-load can generate, what each one stresses, and the StorageRW contention/size/op-mix knobs that let an agent dial conflict and tx size. When an agent needs it: designing an experiment — choosing a scenario and configuring the axes that produce the load shape under test. 
+ +> ⚠️ **Requires PLT-465 (#54, unmerged as of writing).** The StorageRW per-tx axes described below — `keyDistribution`, `sizeDistribution`, `sizeBuckets`, `recordCount`, and `operations` op-mix sampling — **parse but do not affect generated transactions on main**. They only shape txs once PLT-465 lands. On main today StorageRW emits a fixed scaffold (single slot 0, empty pad, all-`rmw`). Treat the axis sections as the PLT-465 interface, not current behavior. + +Each scenario is a `TxGenerator` resolved by lowercase `name` through the factory (`generator/scenarios/factory.go`). The `name` you put in a scenario config is one of the registered keys below. Unknown names panic at startup. + +## Scenario set + +Registered names (from `scenarioFactories`, `generator/scenarios/factory.go:13`): + +| `name` | Contract | Per-tx action | What it stresses | +|--------|----------|---------------|------------------| +| `evmtransfer` | none | Native value transfer, `value = now().Unix()`, gas 21000 | Baseline native-path throughput; signature recovery + balance update. Value varies per tx (non-zero). | +| `evmtransferfast` | none | Native transfer, fixed `value = 1e12`, **zero tip** | Same as above with constant value and no priority fee — cheapest native baseline. (Registered name is also `evmtransfer` via `Name()`; distinct factory key `evmtransferfast`.) | +| `evmtransfernoop` | none | Self-transfer, `value = 0`, gas 21000 | Native path with no balance delta — isolates execution overhead from state change. | +| `erc20` | ERC20 | `transfer(to, 1)`, gas 72156 | Real ERC20 SSTORE path: two balance-slot writes per transfer. Distinct sender/receiver slots → low cross-tx conflict. | +| `erc20noop` | ERC20Noop | `transfer(to, 1)`, gas 22460 | ERC20 ABI surface with a no-op body — measures dispatch/calldata cost without the storage writes. 
| +| `erc20conflict` | ERC20Conflict | `transfer(to, 1)`, gas 22460 | ERC20 variant engineered so transfers contend on shared state → drives the parallel executor's conflict path via an ERC20 shape. | +| `erc721` | ERC721 | `mint(to, id)`, monotonic `id` (atomic counter), gas 22460 | NFT mint path: fresh-slot SSTORE per tx plus a contended counter. | +| `disperse` | Disperse | `disperseEtherFixed(targets)` to 100 fresh accounts/tx | Fan-out: one tx touching 100 distinct recipient accounts. Account-creation heavy. | +| `storagerw` | StorageRWv1 | `read`/`write`/`rmw` against a caller-chosen slot, with calldata pad | **The tunable axis scenario.** SLOAD+SSTORE storage path with configurable key contention, tx size, and op mix. See below. | + +Gas limits above are the per-tx `GasLimit` each scenario pins (`CreateContractTransaction` / `CreateTransaction`). On a gas-limit-admission chain the limit — not gas used — reserves block space, so these are sized tight. A `gasPicker` in config overrides the native-transfer gas; contract scenarios pin their own limit. + +All contracts compile to the **`paris`** EVM target (solc 0.8.19, `Makefile:39`). `paris ⊂ Sei`'s active fork, so bytecode is unconditionally safe to deploy; runtime gas is set by the chain's live fork regardless of compile target (`Makefile:31-38`). + +## StorageRW: the two axes + +StorageRW (`generator/scenarios/StorageRW.go`) is the scenario built for parametric conflict/size experiments. Per tx it makes three **independent** draws — slot (key contention), pad length (tx size), and operation (op mix) — each on its own seeded RNG sub-stream, then builds a `read`/`write`/`rmw` call against `StorageRWv1`. + +> ⚠️ **Requires PLT-465 (#54, unmerged as of writing) — on main these fields parse but do not affect generated transactions.** The per-tx slot, op, and pad axes are delivered by PLT-465 (not yet on main). 
On main, StorageRW is a scaffold: every tx is a fixed-slot-0, empty-pad `rmw` (`generator/scenarios/doc.go` "StorageRW scaffold"). What follows is the PLT-465 interface. + +The contract `StorageRWv1` (`generator/contracts/StorageRWv1.sol`) is mapping-backed (`mapping(uint256 => uint256) store`) with no fixed keyspace — the slot index is caller-chosen, so the keyspace resizes with config and never needs a redeploy. `read` folds the load into `readAccumulator` so the SLOAD is non-elidable; `rmw` does `store[slot] += 1`; `write` sets `store[slot] = 1`. All use `unchecked` arithmetic so no tx ever reverts on overflow. + +**Defaults (nil-guarded, the 100%-conflict baseline):** with no `keyDistribution`/`recordCount`, every tx hits fixed slot 0 (`pickSlot` — PLT-465 branch, not on main). With no `sizeDistribution`/`sizeBuckets`, the pad is empty (`pickPad` — PLT-465 branch, not on main). With no `operations`, every tx is `rmw` (`pickOp` — PLT-465 branch, not on main). So bare `{"name":"storagerw"}` = single-slot, empty-pad, all-rmw = maximum contention. (On main this is the *only* behavior — see the banner; the scaffold is unconditionally fixed-slot-0, empty-pad, `rmw`.) + +### Axis 1 — KEY CONTENTION + +The slot each tx touches is `keyDistribution.SampleIndex(recordCount)` — a draw in `[0, recordCount)` (PLT-465 branch, not on main). Contention is the probability that two txs in the same block draw the same slot. + +- **Keyspace size** = `recordCount`. Larger → lower collision probability at fixed distribution. +- **Distribution** = `keyDistribution`: `uniform` (flat) or `zipfian` with `theta` in `[0, 1)`. + - `theta → 0`: approaches uniform. Over a large keyspace, collision ≈ 0% (`config/doc.go:28-36`). + - `theta → 1`: draws concentrate on low indices → a hotspot. `theta` is validated to `[0, 1)`; `alpha = 1/(1-theta)` diverges at 1 (`distribution.go:163`). + - `recordCount = 0` (or no `keyDistribution`): single slot 0 = **100% conflict**. 
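For the `uniform` case, the keyspace-size effect above can be estimated in closed form before spending a run on it. The sketch below is a planning aid only and is not part of sei-load: the helper name `expected_distinct_fraction` is hypothetical, and it assumes independent, uniform draws over `[0, recordCount)`, treating `recordCount <= 1` as the single-slot default. It does not cover `zipfian`, which is easier to simulate than to solve.

```python
import math


def expected_distinct_fraction(record_count: int, draws: int) -> float:
    """Expected fraction of `draws` uniform draws that hit distinct slots.

    Hypothetical planning helper: assumes independent, uniform draws over
    [0, record_count). It does not call or mirror sei-load code.
    """
    if draws <= 0:
        raise ValueError("draws must be positive")
    if record_count <= 1:
        # Single slot (the documented default): every tx shares slot 0,
        # so only one distinct slot is ever touched.
        return 1.0 / draws
    # E[distinct] = N * (1 - (1 - 1/N)^n), evaluated via log1p/expm1 so the
    # result stays accurate when 1/N is tiny.
    shrink = draws * math.log1p(-1.0 / record_count)
    expected_distinct = -record_count * math.expm1(shrink)
    return expected_distinct / draws


if __name__ == "__main__":
    for n_keys in (0, 1_000, 1_000_000):
        frac = expected_distinct_fraction(n_keys, 2000)
        print(f"recordCount={n_keys:>9}: ~{frac:.4f} of 2000 draws distinct")
```

Here `draws` stands in for the number of StorageRW txs you expect to land in the window of interest (for example one block); a fraction near 1 means almost no same-slot collisions, and a fraction near 0 means the draws pile onto a few slots.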
+ +To set X contention, configure: + +```jsonc +// ~0% conflict: uniform over a large keyspace +{ "name": "storagerw", + "keyDistribution": {"Name": "uniform"}, + "recordCount": 1000000 } + +// moderate hotspot: zipfian, low indices favored +{ "name": "storagerw", + "keyDistribution": {"Name": "zipfian", "theta": 0.9}, + "recordCount": 1000000 } + +// 100% conflict: single slot (omit key config) +{ "name": "storagerw" } +``` + +Verified on the PLT-465 branch: `TestStorageRWContentionSweep` (not on main) pins both ends — uniform over 1e6 with 2000 draws is >99% distinct slots; default config is always slot 0. + +Note `recordCount` is the keyspace the distribution **indexes into**, not a count of distinct slots that will be touched in a run. Actual collision in a single block is a function of `recordCount`, distribution shape, and how many StorageRW txs land in that block (i.e. your rate ÷ block production). + +### Axis 2 — TX SIZE + +Each tx carries a zero-filled calldata pad whose length is `sizeBuckets[sizeDistribution.SampleIndex(len(sizeBuckets))]` (`pickPad` — PLT-465 branch, not on main). The pad is an ignored `bytes _pad` argument on every method — it varies tx size without touching the storage logic. + +- `sizeBuckets`: the histogram of candidate pad lengths in bytes, e.g. `[0, 64, 256, 1024]`. Each entry capped at 1 MiB (`config.go`). +- `sizeDistribution`: `uniform` or `zipfian`, selects which bucket index per tx. +- **Gas:** the pad's intrinsic cost is `4 gas per zero byte` (the base calldata gas schedule for zero bytes — this rate predates and is unchanged by EIP-2028, which only lowered the *non-zero* byte cost from 68→16) added on top of the 50k base: `GasLimit = 50000 + len(pad)*4` (PLT-465 branch, not on main). A larger pad → larger tx → more calldata gas, scaling block-space consumption per tx. 
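The gas formula in the last bullet can be turned into a quick block-space estimate. This is a minimal sketch, assuming the `GasLimit = 50000 + len(pad)*4` formula quoted above (PLT-465 branch) and admission by gas limit; the function names and the `block_gas_limit` parameter are hypothetical, and the value you pass must come from the chain under test.

```python
BASE_GAS = 50_000       # base limit quoted above for StorageRW (PLT-465 branch)
GAS_PER_ZERO_BYTE = 4   # intrinsic calldata cost of one zero byte


def storagerw_gas_limit(pad_len: int) -> int:
    """Per-tx gas limit for a zero-filled pad of `pad_len` bytes."""
    if pad_len < 0:
        raise ValueError("pad_len must be non-negative")
    return BASE_GAS + pad_len * GAS_PER_ZERO_BYTE


def max_txs_per_block(block_gas_limit: int, pad_len: int) -> int:
    """Upper bound on same-size StorageRW txs one block can admit.

    Assumes the gas limit (not gas used) reserves block space and that the
    block holds nothing but these txs. `block_gas_limit` is an input you
    supply, not a value read from sei-load.
    """
    return block_gas_limit // storagerw_gas_limit(pad_len)


if __name__ == "__main__":
    for pad in (0, 64, 256, 1024):
        print(pad, storagerw_gas_limit(pad))
```

For the example buckets `[0, 64, 256, 1024]` the limits come out to 50000, 50256, 51024, and 54096 gas. With an illustrative 10,000,000-gas block (an assumption, not a Sei parameter), the 1024-byte pad admits at most 184 such txs versus 200 with no pad.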
+ +```jsonc +{ "name": "storagerw", + "keyDistribution": {"Name": "uniform"}, "recordCount": 1000000, + "sizeDistribution": {"Name": "uniform"}, + "sizeBuckets": [0, 64, 256, 1024] } +``` + +**Independence (load-bearing):** the size draw rides sub-stream `dist:%d:size`, distinct from the key sub-stream `dist:%d:key` (`utils/rng/streams.go` — both stream IDs are frozen and present on main). Changing the size config never perturbs the key sequence — verified on the PLT-465 branch by `TestStorageRWKeySizeIndependence` (not on main): same seed + same key config yields an identical slot sequence with and without a size distribution. This lets an agent sweep one axis while holding the other's draw multiset fixed. + +### Axis 3 — OP MIX + +`operations` weights the read/write/rmw selection (`config/operation.go` — PLT-465 branch, not on main). Weights are relative; a per-tx draw picks in proportion to weight over total. Nil or all-zero → all `rmw` (the default, since `OpRmw` is the zero value). + +```jsonc +{ "name": "storagerw", + "operations": {"read": 1, "write": 1, "rmw": 2} } +``` + +What each op does to conflict: `read` is an SLOAD (folded into `readAccumulator`); `write` and `rmw` are SSTOREs. Two reads of the same slot do **not** conflict under OCC (no write); a read+write or write+write on the same slot **does**. So op mix and key contention compose: a high-`theta` keyspace with all-`read` exhibits far less executor conflict than the same keyspace with all-`rmw`. The op draw rides its own sub-stream `dist:%d:op` — **a PLT-465-future stream ID, NOT one of the streams frozen on main** (main's frozen set is the 8 IDs in [05-reproducibility §Stream IDs that exist](05-reproducibility.md#stream-ids-that-exist); `dist:%d:op` is added only by PLT-465). Verified independent of the key sequence on the PLT-465 branch by `TestStorageRWOpIndependence` (not on main). 
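To see how an `operations` mix composes with key contention, the relative weights can be converted to per-op probabilities and to the chance that two txs on the same slot conflict. This is a simplified pairwise model, not the executor's actual scheduling: the helper names are hypothetical, the two op draws are assumed independent, and "conflict" follows the rule stated above (only read+read is conflict-free).

```python
from typing import Dict, Mapping, Optional

OPS = ("read", "write", "rmw")


def op_probabilities(operations: Optional[Mapping[str, int]]) -> Dict[str, float]:
    """Turn relative weights into per-op selection probabilities.

    Nil or all-zero weights fall back to all-`rmw`, matching the default
    described above. Hypothetical helper; it does not call sei-load code.
    """
    weights = {op: (operations or {}).get(op, 0) for op in OPS}
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        return {"read": 0.0, "write": 0.0, "rmw": 1.0}
    return {op: w / total for op, w in weights.items()}


def same_slot_pair_conflict(operations: Optional[Mapping[str, int]]) -> float:
    """Probability that two txs hitting the same slot conflict.

    Two reads do not conflict; any pair with at least one write or rmw does,
    so the answer is 1 - P(read)^2 under independent op draws.
    """
    p_read = op_probabilities(operations)["read"]
    return 1.0 - p_read * p_read


if __name__ == "__main__":
    mix = {"read": 1, "write": 1, "rmw": 2}
    print(op_probabilities(mix), same_slot_pair_conflict(mix))
```

Multiplying this figure by your estimate of same-slot collisions gives a first-order view of how much a read-heavy mix softens a hot keyspace; treat it as a planning heuristic and confirm against the SUT's own conflict metrics.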
+ +## What these axes actually probe on Sei + +> This section is domain reasoning about Sei's execution model layered on top of what the code generates. Where a claim is about sei-load code it is cited; where it is about Sei node behavior it is flagged as REASONED — confidence noted. Validate node-side claims against the SUT's own metrics. + +**Sei is a parallel-EVM chain with optimistic concurrency control (Block-STM-style).** Transactions in a block are executed speculatively in parallel; a read-set/write-set validation pass detects when one tx read a slot another tx wrote, and re-executes the loser serially. (REASONED — this is the documented Sei/Block-STM design; confidence: high on the mechanism class, medium on exact scheduler details which vary by sei-chain version.) + +**Key contention exercises the conflict/abort-and-re-execute path.** When many txs in one block draw the same `store[slot]` and at least one writes it, the optimistic schedule's validation fails for the conflicting txs and they re-execute. As contention rises (smaller `recordCount`, higher `theta`, or single-slot default), the hot slot's throughput degrades toward **serial** as the conflicting write-set fraction → 1 (for that hot slot) — the parallel executor cannot retire conflicting writers concurrently. Throughput for the hot slot is bounded by serialized re-execution, not by parallel width. (REASONED; confidence: high — this is the defining behavior of OCC under write contention.) + +**Contrast with a DynamoDB-style hot shard — different mechanism, same observable.** A DynamoDB hot partition degrades because the partition has a fixed WCU/RCU budget and excess requests are **throttled** (a storage-capacity/rate limit). Sei has **no per-key throughput cap**. The limit on a hot slot is *execution-conflict serialization*: the slot can be written as fast as the executor can run the conflicting txs back-to-back, but those txs cannot run *in parallel*. 
Same surface symptom (hot key → throughput plateaus), fundamentally different cause (OCC re-execution vs. provisioned-capacity throttling). An agent must not interpret a StorageRW hot-slot plateau as a storage-rate limit — there is no quota to raise; the cure is reducing conflict (spread the keyspace) or accepting serial throughput for that slot. (REASONED; confidence: high.) + +**Node-side signal to watch:** Block-STM conflict / abort / re-execution rate. On Sei this surfaces (when exposed by the SUT) as `sei_occ_*` metrics. (REASONED — the metric family name is the expected Sei convention; confidence: medium. Confirm the exact series exposed by the node version under test before relying on them; the SUT may not export them at all.) The generator-side signal is unambiguous: you control conflict probability via `recordCount` + `theta` + op mix, and those draws are deterministic for a given seed. + +**Gas-model interplay.** The calldata pad (Axis 2) adds `4 gas per zero byte` (the base calldata gas schedule; PLT-465 branch, not on main), so larger txs consume proportionally more block gas and admit fewer txs/block on a gas-limit-admission chain. Size and contention are orthogonal stressors: size limits *how many* txs fit a block; contention limits *how many of those can execute in parallel*. Sweeping both maps the throughput surface. (Code-grounded for the gas formula; the admission behavior is REASONED, confidence: high — consistent with the package doc's "gas-limit-admission" rationale, `generator/scenarios/doc.go`.) + +**EVM version.** Contracts target `paris` (solc 0.8.19), a strict subset of Sei's active fork — safe on Sei, and compile target does not distort runtime gas (`Makefile:31-39`). VERIFIED. + +## See also + +- [03-config-reference](03-config-reference.md) — full Scenario/Distribution JSON schema. +- [06-measurement-metrics](06-measurement-metrics.md) — the counters to read when interpreting a run. 
+- [07-experiment-playbook](07-experiment-playbook.md) — putting axes together into a sweep. +- [08-limits-boundaries](08-limits-boundaries.md) — measurement boundaries that bound how to read results. diff --git a/docs/05-reproducibility.md b/docs/05-reproducibility.md new file mode 100644 index 0000000..658b2d1 --- /dev/null +++ b/docs/05-reproducibility.md @@ -0,0 +1,158 @@ +# Reproducibility + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers / when an agent needs it. How to get reproducible workloads for +> fair A/B exploration: the seed → sub-stream derivation, the exact (and honest) +> determinism guarantee, how to set up an A/B run, and the open-loop property that +> keeps the admitted workload stable under SUT-driven drops. Read this before you +> compare two runs and attribute a difference to the change you made. + +## The determinism guarantee — read precisely + +sei-load gives you **per-stream draw multiset reproducibility**: + +> Same seed + same config ⇒ identical per-stream draw multiset. + +That is, the *distribution* of keys, sizes, gas values, and accounts is +statistically reproducible — which is exactly what fair A/B comparison requires. + +**What is NOT guaranteed:** + +- **Ordered, byte-identical replay above 1 worker.** With more than one worker, + workers interleave their draws into the shared streams non-deterministically, so + the ordered per-tx sequence differs run to run at the same seed (the multiset + still matches). +- **On-chain arrival order** is concurrent regardless of worker count, so it is + never reproducible. + +**Ordered replay holds only at a single worker** (`workers: 1`, +`TasksPerEndpoint: 1`). If you need byte-for-byte deterministic emission ordering, +run with one worker. Otherwise, design your analysis around the multiset, not the +sequence. (Contract source: `utils/rng/rng.go` package doc.) + +## Seed → sub-stream derivation + +A run is rooted at one `seed`. 
Each logical consumer draws from its own +independent sub-stream, derived by the **FROZEN** formula: + +``` +substream(seed, streamID) = NewPCG(seed, splitmix64(fnv1a64(streamID))) +``` + +- `fnv1a64(streamID)` hashes the consumer name to a uint64. +- `splitmix64` diffuses it so near-identical names (e.g. `gas:0:base` / + `gas:1:base`) seed well-separated PCG states. +- The result seeds a `math/rand/v2.PCG`. + +**Worker-count independence.** Sub-streams are keyed by a *logical* stream id (a +string naming the consumer/purpose), never by a live-goroutine counter. So the +per-stream draw multiset a seed yields is invariant to `--workers`: adding workers +does not shift any stream's sequence. + +### The FROZEN one-way-door contract + +Changing the derivation breaks replay of every previously saved run. **Four +inputs are frozen** (`utils/rng/rng.go`), each a one-way door requiring a +`config_sha256` version bump: + +1. The derivation formula (hash, diffusion, PCG argument order). +2. The set of stream-id strings (`utils/rng/streams.go`). The streamID feeds + `fnv1a64`, so renaming any id reseeds that stream. Additions are append-only + and do not perturb existing streams. +3. The per-stream draw order (e.g. drawing base before tip before feecap). +4. The per-tx account draw cadence: `sender` then `receiver` `NextAccount()` per + tx (`generator/scenario.go`), each consuming the account stream. + +Replay archives are keyed by `config_sha256`. If you (or a tool) change any frozen +input, do not expect old saved runs to replay — they will silently produce a +different draw sequence for the same `(seed, config)`. + +### Stream IDs that exist + +Defined in `utils/rng/streams.go`. 
`%d` is the scenario's config index `i`: + +| Stream ID | Consumer | +|---|---| +| `accounts:shared` | shared (top-level) account pool (`StreamAccountsShared`) | +| `accounts:scenario:%d` | scenario `i`'s own account pool (`AccountsScenarioStream`) | +| `weighted:shuffle` | the weighted scenario selector's shuffle (`StreamWeightedShuffle`) | +| `gas:%d:base` | scenario `i`'s base-gas picker (`GasBaseStream`) | +| `gas:%d:tip` | scenario `i`'s tip-cap picker (`GasTipStream`) | +| `gas:%d:feecap` | scenario `i`'s fee-cap picker (`GasFeeCapStream`) | +| `dist:%d:key` | scenario `i`'s key-distribution index sampler (`KeyDistributionStream`) | +| `dist:%d:size` | scenario `i`'s size-distribution index sampler (`SizeDistributionStream`) | + +## Setting the seed + +The seed lives in the **config file**, not on the CLI. Set the top-level `seed` +field (`config.LoadConfig.Seed`, a `*uint64`): + +```json +{ + "chainId": 1329, + "endpoints": ["http://localhost:8545"], + "seed": 42, + "scenarios": [ /* ... */ ], + "settings": { /* ... */ } +} +``` + +**Unset seed is randomized and recorded.** With no `seed`, the generator resolves +a cryptographically-random one, writes it back into the config, and logs it: + +``` +🎲 No seed configured; generated random seed 12345678901234567890 (set "seed" to replay) +``` + +To replay that run after the fact, copy the logged seed into the `seed` field and +re-run with the same config. (Source: `generator.resolveSeed`, +`rng.NewRandomSource`.) Note: the resolved seed is surfaced via the log line and +written back into the in-memory config — it is **not** a field on the emitted +`stats.RunSummary`, so capture it from the log if you need it. + +## Running a reproducible A/B + +1. **Pin the seed.** Set `seed` to a fixed value in both arms. +2. **Hold config constant** across the two arms — same scenarios, weights, + distributions, account config, endpoints set. +3. **Vary exactly one axis** (the thing under test): e.g. 
`tps`, `maxInFlight`, + `arrivalModel`, or a SUT-side change. +4. Compare the externally-computed metrics (this tool emits signal, not verdicts — + see [01-mental-model.md](01-mental-model.md#measurement-philosophy)). + +Because the workload is a fixed multiset at a fixed seed, a difference between +arms is attributable to the one axis you varied (plus concurrency noise above 1 +worker — keep that in mind for tight comparisons; drop to `workers: 1` if you need +ordered determinism). + +Changing scenarios, weights, distribution parameters (e.g. `theta`), account +config, or any frozen input changes the workload itself — that is no longer a fair +A/B of one axis. + +## Open-loop determinism under drops + +A critical property for stress experiments: in open-loop, **admitted txs are a +deterministic prefix of the seeded sequence**, because a dropped tick draws no tx +(the permit is acquired *before* `Generate()` — see +[01-mental-model.md](01-mental-model.md#open-loop-the-fix)). + +Consequence: **the same seed yields the same admitted multiset regardless of how +many ticks SUT slowness forced to drop.** A faster SUT (fewer drops) and a slower +SUT (more drops) admit different *counts*, but the slower run's admitted set is a +prefix of the faster run's — the per-stream reproducibility contract holds under +saturation, where a draw-on-drop scheme would have broken it. `SequenceIndex` is +the arrival-tick index `i`: monotonic but non-contiguous across admitted txs under +drops (dropped ticks advance `i` and the clock while consuming no draw). + +In closed-loop there is no such admission gate; the SUT speed governs how many +txs are generated, so the comparison anchor is weaker. + +## See also + +- [01-mental-model.md](01-mental-model.md) — pipeline, arrival models, glossary. +- [02-running.md](02-running.md) — invoking a run. +- [03-config-reference.md](03-config-reference.md) — every config/CLI setting. 
+- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts. +- [06-measurement-metrics.md](06-measurement-metrics.md) — emitted metrics and the run summary. +- [07-experiment-playbook.md](07-experiment-playbook.md) — recipes for common experiments. diff --git a/docs/06-measurement-metrics.md b/docs/06-measurement-metrics.md new file mode 100644 index 0000000..23480bf --- /dev/null +++ b/docs/06-measurement-metrics.md @@ -0,0 +1,286 @@ +# 06 — Measurement & Metrics + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers / when an agent needs it: the exact signals sei-load emits, what +> they mean, and the conservation model that ties them together. Read this before +> writing any query against a run, before computing a rate/percentile/verdict, and +> before trusting a number. **The tool emits raw signals only; every rate, +> percentile, and pass/fail verdict is computed by you, the agent, via queries.** + +--- + +## 1. The conservation model + +sei-load supports an **open-loop** arrival model, but it is **opt-in**: +**closed-loop is the default** (see [01-mental-model](01-mental-model.md)), and open-loop +is selected with `--arrival-model open_loop` (see [04-workload-model](04-workload-model.md) +for arrival mechanics). The inclusion identities and inclusion-latency series below are +valid **only for open-loop runs**; in closed-loop `IntendedSendTime` is enqueue time, so +the latency sample is omitted (counts still tracked). Every transaction the scheduler +creates flows through two accounting stages whose terms must balance. These identities are +the foundation of run validity: if the terms don't add up, the run is suspect. + +### Stage 1 — Send accounting (always tracked) + +``` +scheduled == dropped + admitted +admitted == succeeded + failed +``` + +| Term | Meaning | +|------|---------| +| `scheduled` | Every arrival tick the open-loop scheduler reaches at instant `t₀ + i/λ`. 
Not directly emitted; it is the sum of the right-hand side. | +| `dropped` | Ticks shed because true in-flight was saturated (`maxInFlight` reached) at the scheduled instant. **Genuine load shed**, not buffer geometry. A dropped tick draws no generator and signs no tx. → `seiload_run_txs_dropped_total`. | +| `admitted` | Ticks that acquired an in-flight permit and were generated + signed + enqueued. | +| `succeeded` | Admitted txs whose synchronous RPC send returned nil error (accepted by the endpoint). → `seiload_txs_accepted_total`. | +| `failed` | Admitted txs whose send returned an error. Counted, never lost. → `seiload_run_txs_failed_total` (and per-error `seiload_txs_rejected_total`). | + +**Shutdown boundary:** `admitted == succeeded + failed` holds exactly only on a clean +drain (generator exhaustion). On `ctx` cancel (SIGTERM / `--duration` expiry), some +admitted txs may still be buffered for a worker and exit uncounted — bounded by channel +backlog. For latency/goodput claims prefer runs that drain cleanly or are long enough +that the boundary undercount is negligible. + +### Stage 2 — Inclusion accounting (only with `--track-receipts`) + +``` +registered == included + expired + inflight_at_shutdown +registered ⊆ succeeded (only successful sends are registered) +``` + +> Note: `dropped_at_cap` is **not** a term in this identity — it is excluded. +> `registered = succeeded − dropped_at_cap` (sends rejected at the registry cap were +> never registered, so they appear in neither side of the conservation balance). + +| Term | Meaning | +|------|---------| +| `registered` | Successful sends handed to the inclusion tracker. **Not its own series** — by design the denominator for inclusion rate is `succeeded` (`seiload_txs_accepted_total`), never a minted `registered` series. | +| `included` | Txs observed on-chain (matched in an arriving block, `InclusionTime` stamped). 
→ the `_count` of `seiload_inclusion_latency_seconds` **in open-loop only**; otherwise read from the run-summary log line / `seiload_inclusion_outcome_total` is *not* it. See §3.1. | +| `expired` | Registered txs reaped un-included after `reapAfter` (default 30s, `--inclusion-reap-after`). → `seiload_inclusion_outcome_total{outcome="expired"}`. | +| `dropped_at_cap` | Successful sends rejected at the inclusion-registry cap (registry full). **Excluded from the inclusion denominator** — they were never registered. → `seiload_inclusion_outcome_total{outcome="dropped_at_cap"}`. | +| `inflight_at_shutdown` | Registry size at run end, read after workers + tracker join. → `seiload_run_inflight_at_shutdown`. | + +**Conservative degradation (undercounts only, never miscounts):** WS head gaps +(`seiload_block_gaps_total`), block-body fetch failures (`seiload_block_fetch_errors_total`), +and late registrations all cause affected txs to reap as `expired` rather than be +miscounted as included. A nonzero `seiload_block_gaps_total` or +`seiload_block_fetch_errors_total` means your `included` is an **under**count — factor that +into inclusion-rate claims. + +--- + +## 2. The emitted-metric catalog + +All instruments are OTel, exported on the Prometheus `/metrics` endpoint +(`--metricsListenAddr`, default `0.0.0.0:9090`; OpenMetrics enabled so exemplars +survive). + +> **Wire names differ from the instrument base names.** The Prometheus exporter is +> configured with `WithNamespace("seiload")` (configurable; `observability/setup.go`), +> so every exported series is prefixed **`seiload_`**. OTel also appends **unit +> suffixes** on scrape: a `s`-unit histogram becomes `…_seconds`, etc. Combined with +> Prometheus's own suffixing — histograms expose `_bucket`/`_sum`/`_count`, counters end +> `_total` — the wire name can differ substantially from the base name an instrument is +> declared with. The catalog and every PromQL below use the **real wire names**. 
(The +> `{gas}`, `{height}`, `{transactions}`, `{count}` "annotation" units are dropped, not +> suffixed; only real units like `s` / `/s` produce a suffix.) + +### 2.1 Block & gas signals (require `--track-blocks`) + +Emitted by the block collector from new-head subscriptions (`stats/block_collector.go`). +`seiload_block_time_seconds` is **header-arrival-to-arrival wall clock**, not `header.Time`. + +| Metric | Type | Unit | Attributes | Meaning | +|--------|------|------|-----------|---------| +| `seiload_gas_used` | histogram | `{gas}` (dropped) | `chain_id` | Gas used per block (`_bucket`/`_sum`/`_count`). Buckets: 1, 1k, 10k, 50k, 100k, 200k, 300k, 400k, 500k, 600k, 700k, 800k, 1M. | +| `seiload_block_time_seconds` | histogram | `s` | `chain_id` | Wall-clock interval between observed block headers (`_bucket`/`_sum`/`_count`). Buckets: 0.1…1.0 (0.1 step), 2, 5, 10, 20. | +| `seiload_block_number` | gauge | `{height}` (dropped) | `chain_id` | Highest block height observed (monotonic). | + +### 2.2 Send-path signals (always on) + +Emitted from the worker send loop (`sender/worker.go`, `sender/metrics.go`). + +| Metric | Type | Unit | Attributes | Meaning | +|--------|------|------|-----------|---------| +| `seiload_send_latency_seconds` | histogram | `s` | `scenario`, `endpoint`, `chain_id`, `status` (`success`/`failure`) | RPC send round-trip latency (`_bucket`/`_sum`/`_count`). **NOT inclusion latency** — this is enqueue→RPC-return, the SUT-admission cost, not time-to-chain. Buckets: 0.1, 0.2, 0.3, 0.5, 1, 2, 3, 5, 10, 20. Carries trace exemplars. | +| `seiload_txs_accepted_total` | counter | `{transactions}` (dropped) | `endpoint`, `scenario` | Sends accepted by the endpoint (`succeeded`). The inclusion-rate denominator. | +| `seiload_txs_rejected_total` | counter | `{transactions}` (dropped) | `endpoint`, `scenario`, `reason` (currently only `rpc`) | Sends the target/client rejected (`failed`). 
| +| `seiload_worker_queue_length` | observable gauge | `{count}` (dropped) | `endpoint`, `worker_id`, `chain_id` | Current depth of a worker's send channel. Saturation/backpressure signal. | +| `seiload_tps_achieved_per_second` | observable gauge | `{transactions}/s` | `endpoint`, `chain_id`, `scenario` | Most recent sender-sampled TPS per endpoint/scenario. | + +### 2.3 Inclusion signals (require `--track-receipts`, not under `--dry-run`) + +Emitted by `stats.InclusionTracker` (`stats/inclusion_tracker.go`). + +| Metric | Type | Unit | Attributes | Meaning | +|--------|------|------|-----------|---------| +| `seiload_inclusion_latency_seconds` | histogram | `s` | `chain_id` | `InclusionTime − IntendedSendTime` (`_bucket`/`_sum`/`_count`). **Open-loop only** (in closed-loop `IntendedSendTime` is enqueue time, so the sample is omitted; counts still tracked). Its `_count` is the open-loop `included` total. Buckets: 0.5, 1, 2, 5, 10, 30, 60, 120. | +| `seiload_inclusion_outcome_total` | counter | `{transactions}` (dropped) | `chain_id`, `outcome` (`expired` \| `dropped_at_cap`) | In-flight txs that left the registry un-included. | +| `seiload_block_gaps_total` | counter | `{blocks}` (dropped) | `chain_id` | Missed head heights (no backfill). Nonzero ⇒ `included` is an undercount. | +| `seiload_block_fetch_errors_total` | counter | `{blocks}` (dropped) | `chain_id` | Block-body fetches that failed (no retry); those txs reap as `expired`. Nonzero ⇒ `included` undercount. | +| `seiload_inclusion_inflight` | observable gauge | `{transactions}` (dropped) | `chain_id` | Live size of the in-flight inclusion registry. | + +### 2.4 Run-summary gauges (emitted once at run end) + +Recorded by `Collector.EmitRunSummary` (`stats/run_summary.go`) at shutdown, then held +for `--post-summary-flush-delay` (default 25s) so the final scrape catches them. One +series per run via the OTel Resource (run-scope) join. 
+ +| Metric | Type | Unit | Attributes | Meaning | +|--------|------|------|-----------|---------| +| `seiload_run_tps_final_per_second` | gauge | `{transactions}/s` | — | Peak observed overall TPS (10s sliding-window max) for the run. | +| `seiload_run_duration_seconds` | gauge | `s` | — | Wall-clock run duration. | +| `seiload_run_txs_accepted_total` | gauge | `{transactions}` (dropped) | — | Total txs accepted by endpoints over the run (collector's `totalTxs`). Gauge already named `…_total`; no extra suffix. | +| `seiload_run_txs_dropped_total` | gauge | `{transactions}` (dropped) | `arrival_model` | Open-loop txs `dropped` on in-flight saturation. | +| `seiload_run_txs_failed_total` | gauge | `{transactions}` (dropped) | `arrival_model` | Admitted txs whose send `failed`. | +| `seiload_run_inflight_at_shutdown` | gauge | `{transactions}` (dropped) | — | Inclusion registry size at end (only emitted when `--track-receipts`). | + +> Note: `seiload_run_tps_final_per_second` is a **peak** (sliding-window max), not a +> mean. For a mean, compute `seiload_run_txs_accepted_total / seiload_run_duration_seconds`. + +--- + +## 3. Verdicts are external — compute them yourself + +The tool deliberately emits **counts and histograms**, not rates/percentiles/verdicts. +You derive those. Concrete recipes follow. + +Run-scope identity rides on the OTel **Resource**, not on per-sample labels, so it reaches +PromQL via a join target (e.g. `target_info` / `seiload_target_info`) rather than as a label +on each series. The run-scope join keys that *can* exist are +`seiload_run_id`, `seiload_chain_id`, `seiload_commit_id`, `seiload_workload`, +`service_instance_id`, and `service_version` (`observability/setup.go`). Each is +**conditional on its `SEILOAD_*` env var being set** (`service_instance_id` falls back to +hostname; the rest are omitted when empty) — adjust selectors to your environment. 
See +[../observability/README.md](../observability/README.md) for the cardinality rationale and +how the Resource is exported. + +### 3.1 Inclusion rate + +`included / succeeded`. In **open-loop**, `included` is the `seiload_inclusion_latency_seconds` +histogram count: + +```promql +# open-loop inclusion rate over the run +sum(seiload_inclusion_latency_seconds_count) / sum(seiload_txs_accepted_total) +``` + +In **closed-loop**, `seiload_inclusion_latency_seconds` is not recorded — read `included` from the +run-summary log line (`📦 Inclusion: included=…`) or compute the complement from outcomes: +`included = registered − expired − inflight_at_shutdown`, where +`registered = succeeded − dropped_at_cap` (so `dropped_at_cap` is subtracted exactly once, via `registered`). For rate claims that need a histogram count, +**use open-loop** (§ [05-reproducibility](05-reproducibility.md)). + +Subtract the un-included tail explicitly when you need the loss breakdown: + +```promql +sum(seiload_inclusion_outcome_total{outcome="expired"}) # timed out un-included +sum(seiload_inclusion_outcome_total{outcome="dropped_at_cap"}) # registry full (denominator excludes these) +``` + +### 3.2 Latency percentiles (tail) + +Use `histogram_quantile` over the open-loop inclusion histogram for **time-to-chain**: + +```promql +# p99 inclusion latency, open-loop only +histogram_quantile(0.99, sum by (le) (rate(seiload_inclusion_latency_seconds_bucket[1m]))) +``` + +For **admission latency** (send round-trip, any model): + +```promql +histogram_quantile(0.99, sum by (le) (rate(seiload_send_latency_seconds_bucket[1m]))) +``` + +Do **not** quote `seiload_inclusion_latency_seconds` percentiles from a closed-loop run — the histogram +is empty there, and even where it exists closed-loop suffers coordinated omission +(see [04-workload-model](04-workload-model.md)).
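To make the resolution of these percentiles concrete, here is a minimal Python sketch of how a quantile is estimated from cumulative `le` buckets. It assumes the simplified interpolation rule Prometheus applies to classic histograms (linear interpolation inside the bucket that holds the target rank, with the highest finite bound returned when the rank falls in `+Inf`). The bucket bounds are the `seiload_inclusion_latency_seconds` bounds listed in §2.3; the function name and the sample counts are hypothetical and are not taken from a real run.

```python
import math

# Upper bounds of seiload_inclusion_latency_seconds (section 2.3), plus +Inf.
BOUNDS = [0.5, 1, 2, 5, 10, 30, 60, 120, math.inf]


def quantile_from_buckets(q, cumulative_counts, bounds=BOUNDS):
    """Estimate the q-quantile from cumulative (le) bucket counts.

    Hypothetical helper: a simplified model of histogram_quantile,
    not code from sei-load or Prometheus.
    """
    total = cumulative_counts[-1]
    if total == 0:
        return math.nan  # empty histogram, e.g. a closed-loop run
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    index = next(i for i, c in enumerate(cumulative_counts) if c >= rank)
    if math.isinf(bounds[index]):
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return bounds[index - 1]
    lower = bounds[index - 1] if index > 0 else 0.0
    below = cumulative_counts[index - 1] if index > 0 else 0
    in_bucket = cumulative_counts[index] - below
    if in_bucket == 0:
        return bounds[index]
    # Linear interpolation inside the bucket.
    return lower + (bounds[index] - lower) * (rank - below) / in_bucket


# Illustrative cumulative counts for 1000 included txs (made-up data).
sample = [100, 400, 700, 900, 960, 985, 998, 1000, 1000]
p50 = quantile_from_buckets(0.50, sample)
p99 = quantile_from_buckets(0.99, sample)
```

With these made-up counts the p99 rank lands in the 30 to 60 second bucket, so the estimate can be off by up to the width of that bucket. A quoted tail percentile is only as fine as the bucket boundaries around it.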
+ +### 3.3 Goodput (committed / offered) + +Goodput = on-chain commitments per second relative to what was offered. Per the Stage 1 identities (§1), offered load is `scheduled = dropped + succeeded + failed`, so the denominator sums all three run-summary gauges: + +```promql +# committed throughput (TPS) +sum(seiload_inclusion_latency_seconds_count) / scalar(seiload_run_duration_seconds) + +# goodput ratio: committed / offered (offered = accepted + failed + dropped) +sum(seiload_inclusion_latency_seconds_count) / (sum(seiload_run_txs_accepted_total) + sum(seiload_run_txs_failed_total) + sum(seiload_run_txs_dropped_total)) +``` + +Drop and failure fractions of offered load: + +```promql +sum(seiload_run_txs_dropped_total) / (sum(seiload_run_txs_accepted_total) + sum(seiload_run_txs_failed_total) + sum(seiload_run_txs_dropped_total)) +sum(seiload_run_txs_failed_total) / (sum(seiload_run_txs_accepted_total) + sum(seiload_run_txs_failed_total) + sum(seiload_run_txs_dropped_total)) +``` + +### 3.4 Detecting a generator-bound (invalid) run — `schedule_lag` + +A run is only a valid load measurement if the generator **kept up with its own +schedule**. The canonical gate is `schedule_lag = AttemptedSendTime − IntendedSendTime` +(sends falling behind the arrival schedule even before any tx is shed). + +> **`schedule_lag` is a concept, NOT an emitted metric on main today** (the emitter +> was punted as PLT-463). You cannot query it. Compute run validity externally from +> the signals that *are* emitted: +> +> - **High `seiload_run_txs_dropped_total` with low SUT utilization** ⇒ suspect the +> generator (or `maxInFlight`) shed load before the SUT was saturated. Drops should +> track SUT saturation, not generator stalls. +> - **`seiload_run_tps_final_per_second` ≪ configured `--tps`** ⇒ the generator never +> reached target rate; the run under-loaded the SUT and latency/throughput numbers are +> not at the intended λ. +> - **Rising `seiload_worker_queue_length`** ⇒ workers are backing up; admission is the bottleneck. +> +> If you need a hard generator-validity gate, file an `/issue` requesting a +> `schedule_lag` histogram (the query you want: `histogram_quantile(0.99, +> schedule_lag_bucket)` to assert p99 lag < one inter-arrival gap). Until then, treat the +> heuristics above as the validity check and state the assumption in your report.
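The three heuristics above can be sketched as a small external check. This is an illustration under stated assumptions, not part of sei-load. The function name, the 5% tolerance, and the `sut_saturated` input are hypothetical. SUT saturation has to come from a node-side signal you trust, and the queue samples are values you scraped from `seiload_worker_queue_length` over the run.

```python
def generator_validity_flags(tps_final, target_tps, dropped, sut_saturated,
                             queue_samples, tps_tolerance=0.05):
    """Return human-readable warnings; an empty list means no heuristic fired.

    Hypothetical helper. tps_tolerance is an assumed threshold, not a
    sei-load default.
    """
    flags = []
    # Heuristic: achieved rate far below the configured --tps.
    if tps_final < target_tps * (1 - tps_tolerance):
        flags.append("under-loaded: generator never reached the target rate")
    # Heuristic: drops while the SUT itself was not saturated.
    if dropped > 0 and not sut_saturated:
        flags.append("drops without SUT saturation: suspect generator or maxInFlight")
    # Heuristic: worker queue depth never falls and ends higher than it started.
    rising = (
        len(queue_samples) >= 2
        and all(b >= a for a, b in zip(queue_samples, queue_samples[1:]))
        and queue_samples[-1] > queue_samples[0]
    )
    if rising:
        flags.append("worker queue rising: admission is the bottleneck")
    return flags


# Made-up inputs for illustration only.
healthy = generator_validity_flags(1000, 1000, 0, True, [0, 1, 0])
suspect = generator_validity_flags(600, 1000, 250, False, [2, 8, 8, 31])
```

Because `seiload_run_tps_final_per_second` is a sliding-window peak rather than a mean, the first check is lenient. Dividing `seiload_run_txs_accepted_total` by `seiload_run_duration_seconds` and comparing that mean against `--tps` gives a stricter variant.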
+ +--- + +## 4. Reading the run-summary / final stats output + +Two surfaces report end-of-run state. **Both** are worth capturing. + +### 4.1 Run-summary gauges + log lines (authoritative for conservation) + +At shutdown sei-load logs the conservation tallies and records the §2.4 gauges: + +``` +⚠️ Open-loop dropped N txs (in-flight saturated; not throttled) +⚠️ Open-loop N txs failed to send (admitted but errored; not lost) +📦 Inclusion: included=… expired=… dropped_at_cap=… inflight_at_shutdown=… +``` + +This log line is the ground truth for the Stage-2 identity; cross-check it against your +`inclusion_*` queries. The gauges persist on `/metrics` for `--post-summary-flush-delay` +so a final scrape captures them — ensure your scrape interval is shorter than that delay. + +### 4.2 `--report-path` file / stdout final stats + +`Logger.LogFinalStats` (`stats/logger.go`) prints — and, with `--report-path`, writes — +a **formatted text report** (not JSON, despite the JSON-tagged `FinalStats` struct). +Schema-versioned run-summary JSON is future work (PLT-467); do not write a parser +expecting JSON from `--report-path` today. +It contains: runtime, total txs, avg/max TPS, per-endpoint P50/P99 (in-process +percentiles over a 10k-sample ring buffer — coarse, not the histogram), per-scenario +distribution, and block-time/gas P50/P99/max. + +> Caveat: the report's per-endpoint P50/P99 are computed in-process over a bounded +> latency ring (`maxLatencyHistory = 10000`) and are **send latency**, not inclusion +> latency. For trustworthy tail-latency claims use the `seiload_inclusion_latency_seconds` / +> `seiload_send_latency_seconds` histograms via `histogram_quantile` (§3.2), not the report file. + +--- + +## See also + +- [01-mental-model](01-mental-model.md) — what sei-load is and isn't. +- [04-workload-model](04-workload-model.md) — open-loop arrival, λ, drops, coordinated omission. 
+- [05-reproducibility](05-reproducibility.md) — fixed seed, open vs closed loop, fair A/B. +- [07-experiment-playbook](07-experiment-playbook.md) — objective → knobs → interpretation. +- [08-limits-boundaries](08-limits-boundaries.md) — what to rule out before trusting a result. diff --git a/docs/07-experiment-playbook.md b/docs/07-experiment-playbook.md new file mode 100644 index 0000000..c6d76ef --- /dev/null +++ b/docs/07-experiment-playbook.md @@ -0,0 +1,177 @@ +# 07 — Experiment Playbook + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers / when an agent needs it: the reasoning layer on top of the metrics. +> Given an objective, which knobs to turn, which signals to read, and what they mean for +> your next move. Read this when you are about to *design* a run, not just interpret one. +> Metric names and PromQL referenced here are defined in +> [06-measurement-metrics](06-measurement-metrics.md) — do not re-derive them. + +--- + +## 0. The autonomous run loop + +Every experiment is one turn of this loop. Run it deliberately; don't fire-and-forget. + +``` +1. OBJECTIVE → what question am I answering? (capacity? tail latency? contention?) +2. KNOBS → set exactly the variables under test; FREEZE everything else (seed!). +3. VALIDITY → is this run a fair measurement? (§5 — check BEFORE trusting numbers) +4. READ → pull the specific signals the objective needs. +5. INTERPRET → what do they mean? does conservation balance? +6. NEXT MOVE → adjust one knob, or conclude. Record the seed + config for A/B. +``` + +**Cardinal rule for comparability:** change **one** independent variable per run, hold a +**fixed seed** (set the top-level `seed` in the config file — there is **no `--seed` CLI +flag**; see [05-reproducibility](05-reproducibility.md)), and use **open-loop** +(`--arrival-model open_loop`; note closed-loop is the *default*) for any latency or +capacity claim. 
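As a pre-flight guard for the cardinal rule, the following Python sketch compares the two arms' configs and reports whether the seed is pinned and exactly one setting differs. The helper names and the placement of `tps` and `workers` under `settings` are assumptions made for illustration; check the real schema in [03-config-reference](03-config-reference.md) before relying on key paths.

```python
import copy


def flatten(cfg, prefix=""):
    """Flatten nested dicts into dotted paths; lists are compared as whole values."""
    out = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def check_ab(arm_a, arm_b):
    """Return (changed_paths, problems) for two config dicts (hypothetical helper)."""
    a, b = flatten(arm_a), flatten(arm_b)
    problems = []
    if a.get("seed") is None or b.get("seed") is None:
        problems.append("seed is not pinned in both arms")
    elif a["seed"] != b["seed"]:
        problems.append("arms use different seeds")
    changed = sorted(
        k for k in a.keys() | b.keys() if k != "seed" and a.get(k) != b.get(k)
    )
    if len(changed) != 1:
        problems.append(f"expected exactly one changed setting, found {changed}")
    return changed, problems


# Illustrative arms; the settings layout is assumed, not the documented schema.
arm_a = {
    "chainId": 1329,
    "endpoints": ["http://localhost:8545"],
    "seed": 42,
    "settings": {"tps": 500, "workers": 1},
}
arm_b = copy.deepcopy(arm_a)
arm_b["settings"]["tps"] = 1000

changed, problems = check_ab(arm_a, arm_b)
```

This sketch only covers config-side axes. When the variable under test is a SUT-side change, the two configs should be identical, so the "exactly one changed setting" check would need to be relaxed to zero.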
+ +> ⚠️ **StorageRW axes require PLT-465 (#54, unmerged as of writing).** Recipes 2 and 3 +> below sweep `keyDistribution`/`zipfian-θ`/`recordCount`/`sizeDistribution`/`sizeBuckets`/ +> `operations`. **On main these fields parse but do not affect generated transactions** — +> StorageRW emits a fixed scaffold (slot 0, empty pad, all-`rmw`). Treat the contention and +> size sweeps as runnable only once PLT-465 lands; see +> [04-workload-model](04-workload-model.md). + +--- + +## 1. Decision framework — objective → knobs + +| Objective | Primary knob(s) | Hold fixed | Read | +|-----------|-----------------|-----------|------| +| Key/state contention | scenario `StorageRW`, zipfian-θ over `recordCount` **(PLT-465 — no effect on main)** | seed, λ, tx mix, endpoints | `seiload_tps_achieved_per_second`, `seiload_inclusion_latency_seconds` p99, SUT block-stm abort rate (external) | +| Tx-size scaling | size-distribution / `sizeBuckets` **(PLT-465 — no effect on main)** | seed, λ | `seiload_gas_used` per block, `seiload_send_latency_seconds`, `seiload_inclusion_latency_seconds` | +| Trustworthy tail latency | fixed λ at/above suspected capacity, open-loop | seed, mix | `seiload_inclusion_latency_seconds` p99 via `histogram_quantile`; validity (§5) | +| Throughput knee | λ sweep or `--ramp-up` | seed, mix | `seiload_run_txs_dropped_total`, inclusion rate, `seiload_inclusion_latency_seconds`, `seiload_block_time_seconds` | + +--- + +## 2. Recipe: probe key/state contention + +**Goal:** find how concurrent reads/writes to a hot key-set degrade throughput — i.e. +expose Sei's parallel-execution (block-stm) conflict/abort behavior (see +[04-workload-model](04-workload-model.md) for the Sei mechanism). + +**Design:** `StorageRW` scenario, sweep zipfian skew **θ** while sweeping `recordCount` +(smaller `recordCount` + higher θ = hotter contention). Hold seed, λ, endpoints, and tx +mix constant across the sweep. 
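To build intuition for why a smaller `recordCount` and a higher θ mean hotter contention, this Python sketch computes the exact share of accesses that land on the hottest keys under a textbook zipfian law, where the key of rank `k` has weight `1 / k^θ`. It is a model for reasoning only. sei-load's own sampler may use a different zipfian parameterization, and per the warning above these fields have no effect on main until PLT-465 lands. The function name is hypothetical.

```python
def zipf_top_share(record_count, theta, top_k=1):
    """Fraction of accesses hitting the top_k hottest keys.

    Assumes the textbook zipfian weights 1 / rank**theta; this is an
    illustration, not sei-load's sampler.
    """
    weights = [1.0 / (rank ** theta) for rank in range(1, record_count + 1)]
    return sum(weights[:top_k]) / sum(weights)


# Sweep theta at a fixed key space, then shrink the key space at a fixed theta.
thetas = [0.0, 0.5, 0.99]
share_by_theta = [zipf_top_share(1000, t) for t in thetas]
share_small_space = zipf_top_share(100, 0.99)
share_large_space = zipf_top_share(1000, 0.99)
```

At θ = 0 the law is uniform, so the hottest key receives `1 / recordCount` of the accesses. The share grows as θ rises or as `recordCount` shrinks, which is the regime where conflicting transactions within a block become likely.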
+ +**Read & interpret:** +- Throughput vs θ: as θ rises, committed throughput (`seiload_inclusion_latency_seconds_count / + seiload_run_duration_seconds`) should fall if the SUT serializes on conflicts. A flat curve means you + haven't reached the contention regime — raise θ / shrink `recordCount`. +- Pair with the **SUT's** block-stm conflict/abort rate (a Sei **node-side** signal, + not emitted by sei-load — on Sei this typically surfaces as `sei_occ_*`, but confirm + the exact series name exists on your SUT before relying on it; the node version under + test may not export it at all. File `/issue` if that signal isn't exposed where you + can query it). +- `seiload_block_time_seconds` widening while `seiload_gas_used` holds steady ⇒ execution + is the bottleneck, not block fullness — a contention signature. + +**Next move:** binary-search θ for the knee where throughput drops sharply; that θ is the +contention threshold for this `recordCount`. + +--- + +## 3. Recipe: probe tx-size scaling + +**Goal:** how does per-tx size/gas affect block packing and latency. + +**Design:** sweep the size distribution / `sizeBuckets`. Hold seed and λ fixed. + +**Read & interpret:** +- `seiload_gas_used` histogram (per block) — does the SUT hit the gas ceiling + (`--target-gas`, default 10M)? `histogram_quantile(0.99, …seiload_gas_used_bucket…)` near + the ceiling ⇒ blocks are gas-bound. +- `seiload_send_latency_seconds` and `seiload_inclusion_latency_seconds` — larger txs raise + both if execution/propagation cost scales with size. +- `seiload_block_time_seconds` — rising with size ⇒ production cost is size-sensitive. + +**Next move:** if blocks are gas-bound before λ saturates, you are measuring block-packing, +not throughput — lower per-tx gas or raise `--target-gas` to isolate the variable. + +--- + +## 4. Recipe: measure trustworthy tail latency + +**Goal:** a defensible p99 time-to-chain. + +**Design (load-bearing):** +- `--arrival-model open_loop` — **mandatory**. 
Closed-loop suffers coordinated omission + and `seiload_inclusion_latency_seconds` is not even recorded there (see [06](06-measurement-metrics.md) §3.2). +- Fixed λ (`--tps`) **at or above** suspected capacity — you want the schedule to expose + the slowdown, not avoid it. +- `--track-receipts` enabled (inclusion histogram requires it). +- Fixed seed. +- Size `--max-in-flight` and `--inclusion-reap-after` so healthy txs aren't reaped: + registry cap auto-sizes from `TPS × reapAfter × 1.5`, but verify `dropped_at_cap == 0`. + +**Read:** +```promql +histogram_quantile(0.99, sum by (le) (rate(seiload_inclusion_latency_seconds_bucket[1m]))) +``` + +**Interpret / validity gate:** the p99 is only trustworthy if the run wasn't +generator-bound (§5). Confirm `seiload_run_tps_final_per_second ≈ --tps`, +`dropped_at_cap == 0`, and `seiload_block_gaps_total == 0 && seiload_block_fetch_errors_total == 0` +(else `included` is an undercount biasing the tail). If those hold, quote the p99; +otherwise rerun. + +--- + +## 5. Ensuring a run is VALID / comparable + +Run this checklist **before** trusting any number. A failing item invalidates the run. 
+ +| Check | Query / signal | Pass condition | If it fails | +|-------|----------------|----------------|-------------| +| Fixed seed | config | identical seed across A/B | reseed; reruns aren't comparable | +| Open-loop for latency | `--arrival-model` | `open_loop` | closed-loop → coordinated omission; rerun | +| Generator kept up | `seiload_run_tps_final_per_second` vs `--tps` | within tolerance | under-loaded; raise workers/λ headroom | +| Drops are real shedding | `seiload_run_txs_dropped_total` | tracks SUT saturation, not generator stalls | suspect generator/`maxInFlight`; see [06](06-measurement-metrics.md) §3.4 | +| No registry starvation | `seiload_inclusion_outcome_total{outcome="dropped_at_cap"}` | `== 0` | raise `--inclusion-reap-after` / cap; inclusion undercounted | +| No observer loss | `seiload_block_gaps_total`, `seiload_block_fetch_errors_total` | `== 0` | `included` undercounts; treat inclusion rate as a lower bound | +| Sends not erroring en masse | `seiload_run_txs_failed_total`, `seiload_txs_rejected_total` | low / explained | investigate SUT/client rejection before reading throughput | +| Conservation balances | run-summary log + queries | `registered == included + expired + inflight_at_shutdown` | accounting broken; do not trust derived rates | +| Clean shutdown | drain vs SIGTERM/`--duration` | clean drain preferred for exact accounting | note the shutdown-boundary undercount in your report | + +`schedule_lag` is the ideal generator-validity gate but is a **concept, not an emitted +metric on main** (emitter punted as PLT-463) — you cannot query it. Compute validity from +the heuristics above and state the assumption. See +[06-measurement-metrics](06-measurement-metrics.md#34-detecting-a-generator-bound-invalid-run--schedule_lag) §3.4. + +For fair A/B methodology see [05-reproducibility](05-reproducibility.md); for failure +modes to rule out (what a bad number *isn't*) see [08-limits-boundaries](08-limits-boundaries.md). + +--- + +## 6. 
Compact run → check → mean → move loop + +A drop-in autonomous sequence for a single run: + +| Run output | Metric to check | What it means | Next move | +|------------|-----------------|---------------|-----------| +| Run started | `seiload_tps_achieved_per_second`, `seiload_worker_queue_length` | Is the generator hitting λ? | Queue rising + TPS < λ ⇒ add `--workers` | +| Mid-run | `seiload_block_time_seconds`, `seiload_gas_used` p99 | Is the SUT block-bound or gas-bound? | Gas-bound ⇒ adjust tx size / `--target-gas` | +| Mid-run | `seiload_run_txs_dropped_total` climbing | In-flight saturating | Near/above capacity — good for tail latency; bad if you wanted under-capacity | +| End | run-summary log line | Conservation balances? | If not, discard run | +| End | inclusion rate (§3.1 of [06](06-measurement-metrics.md)) | Fraction reaching chain | < target ⇒ SUT shedding; investigate expired vs dropped_at_cap | +| End | `seiload_inclusion_latency_seconds` p99 | Tail time-to-chain | Validity-gate it (§5), then record with seed + config | +| End | `seiload_block_gaps_total`/`seiload_block_fetch_errors_total` | Observer integrity | Nonzero ⇒ inclusion is a lower bound; note it | + +**When a needed signal doesn't exist** (e.g. `schedule_lag`, SUT block-stm aborts where +you can query them), do not paper over it: file an `/issue` naming the exact query you +were trying to write and why, so the gap gets closed rather than guessed around. + +--- + +## See also + +- [01-mental-model](01-mental-model.md) — what sei-load is and isn't. +- [04-workload-model](04-workload-model.md) — arrival model, scenarios, the Sei contention mechanism. +- [05-reproducibility](05-reproducibility.md) — fixed seed, open vs closed loop, fair A/B. +- [06-measurement-metrics](06-measurement-metrics.md) — the metric catalog and PromQL. +- [08-limits-boundaries](08-limits-boundaries.md) — what to rule out before trusting a result. 
diff --git a/docs/08-limits-boundaries.md b/docs/08-limits-boundaries.md new file mode 100644 index 0000000..99101ee --- /dev/null +++ b/docs/08-limits-boundaries.md @@ -0,0 +1,87 @@ +# Limits & Accepted Boundaries + +> [← AGENTS.md index](../AGENTS.md) + +> What this covers: the known, accepted measurement boundaries in sei-load's send and inclusion paths — what each is, when it bites, why it's accepted, and the counter to check before trusting a run. When an agent needs it: interpreting results, especially deciding whether a non-zero counter invalidates a conclusion or is benign. + +Every boundary below is **accepted by design** and bounded. The contract is conservative: where the tooling can be wrong, it is wrong in a known direction (almost always *undercounting* inclusions, never inventing them). An agent reading a run should treat a non-zero boundary counter as a *confidence discount in a known direction*, not as silent corruption. Grounded in `sender/doc.go`. + +> **Metric names here are conceptual.** Names like `block_gaps`, `dropped_at_cap`, `dropped`, `failed` are the *concepts* to check; the exact queryable series carry the `seiload_` prefix + Prometheus suffixes (e.g. `seiload_block_gaps_total`, `seiload_inclusion_outcome_total{outcome="dropped_at_cap"}`). See [Measurement & Metrics §2](06-measurement-metrics.md) for the authoritative catalog before writing a query. + +## Send path + +### Open-loop shutdown boundary + +- **What:** On a clean drain (generator exhaustion), `admitted == succeeded + failed` holds exactly. On `ctx` cancel (SIGTERM or `--duration` expiry), txs already admitted and buffered for a worker can exit **uncounted** (`sender/doc.go:72-75`). +- **When it bites:** Only on cancellation-terminated runs — duration-bounded or interrupted. Never on a run that ends because the workload drained. 
+- **Why accepted:** The undercount is bounded by the worker channel backlog (a small fixed buffer), and the conservation identity is exact on clean completion. +- **How to interpret:** If the run ended by duration/SIGTERM and `admitted ≠ succeeded + failed`, the gap is shutdown buffer, not lost load — bounded by backlog. For exact conservation, end runs by generator drain (finite workload) rather than by duration. Check the `dropped` and `failed` gauges in the run summary. + +Related send-path lenses (not boundaries): +- `schedule_lag` (`AttemptedSendTime − IntendedSendTime`) — ⚠️ **a concept, NOT an emitted metric on main**: there is no `schedule_lag` series to query (emitter punted as PLT-463); judge it externally via the [06 §3.4](06-measurement-metrics.md#34-detecting-a-generator-bound-invalid-run--schedule_lag) heuristics. Conceptually it is the primary coordinated-omission gate: non-zero/growing lag means sends are falling behind the open-loop arrival schedule *before* any tx is shed, and latency conclusions are suspect once it's large (`sender/doc.go:119-124`). +- `dropped` — genuine load shed once `maxInFlight` saturates (drop-and-count). This is real backpressure, not buffer geometry (`sender/doc.go:36-48`). +- `failed` — sends that returned a non-nil error; counted, never lost (`sender/doc.go:62-70`). + +Conservation to assert per run: `scheduled == dropped + admitted` and `admitted == succeeded + failed` (the latter exact only on clean drain). + +## Inclusion tracking (`--track-receipts`) + +Inclusion is observed block-by-block by the `InclusionTracker`, not by per-tx receipt polling: it subscribes to new heads, fetches each arriving block body **once**, and stamps `InclusionTime` on matched in-flight txs (`sender/doc.go:88-97`). 
Conservation: `registered == included + expired + inflight_at_shutdown`, and `registered ⊆ succeeded` — only successful sends are registered, and the inclusion denominator is `succeeded` (`txs_accepted`), never a minted series (`sender/doc.go:99-103`). + +The six accepted boundaries (`sender/doc.go:105-117`): + +### 1. WebSocket head gaps + +- **What:** A missed new-head subscription event is counted (`block_gaps`) but **never backfilled**. Txs in the missed block are not matched and eventually reap as `expired`. +- **When it bites:** Flaky WS connection, or head-arrival faster than the subscriber drains. +- **Why accepted:** Degrades conservatively — an *undercount of inclusions*, never a miscount. +- **Interpret:** Non-zero `block_gaps` ⇒ reported inclusion rate is a **lower bound**; true inclusion is ≥ reported. Don't read an inclusion shortfall as chain-side drops without first checking `block_gaps`. + +### 2. Reorg first-observation-wins + +- **What:** On a reorg the tracker uses first-observation-wins (stamp `InclusionTime` + delete from in-flight); there is no canonical-chain reconciliation. +- **When it bites:** Chain reorgs during the run. +- **Why accepted:** Inclusion-time error is bounded by `reorg_depth × block_time`. +- **Interpret:** If the SUT reorged, inclusion-latency samples carry up to `reorg_depth × block_time` of error. On a stable chain this is zero. Treat inclusion *latency* (not the count) as the affected metric. + +### 3. Single fetch endpoint + +- **What:** Block bodies are fetched from one endpoint only — `Endpoints[0]`, shared with the block collector. +- **When it bites:** Always present; it adds a small read load to that one node and ties inclusion observation to that node's view. +- **Why accepted:** Small added load; single consistent view. +- **Interpret:** `Endpoints[0]` is the inclusion oracle. If you multi-target sends across endpoints, inclusion is still judged from `Endpoints[0]`'s chain view. 
Note that contract scenarios also deploy/bind against `Endpoints[0]` (`generator/scenarios/*.go` `Attach`). + +### 4. Header-arrival clock + +- **What:** `InclusionTime` is the **header-arrival wall-clock** at the tracker — not fetch-completion time, and not `header.Time` (the block's own timestamp). +- **When it bites:** Always; it's the definition of the inclusion timestamp. +- **Why accepted:** It's the measurable instant closest to "the tracker learned this block exists." +- **Interpret:** `inclusion_latency = InclusionTime − IntendedSendTime` includes network propagation to the tracker. It is **open-loop-only**: in closed-loop, `IntendedSendTime` is enqueue time, so the latency sample is omitted (counts still tracked) (`sender/doc.go:94-97`). Do not compare inclusion-latency across arrival models, and do not equate it with on-chain block timestamp deltas. + +### 5. Failed block fetch + +- **What:** A failed block-body fetch is counted (`block_fetch_errors`) and **not retried**; that block's txs reap as `expired`. +- **When it bites:** Transient RPC errors fetching a body from `Endpoints[0]`. +- **Why accepted:** Same conservative undercount as a WS gap (boundary 1). +- **Interpret:** Non-zero `block_fetch_errors` ⇒ inclusion is again a lower bound. Sum it with `block_gaps` when judging how much of an inclusion shortfall is observational vs. real. + +### 6. Late register / dropped-at-cap + +- **What (late register):** A tx registered *after* its including block was already scanned is missed and reaps as `expired` — bounded by the microsecond register window vs. block time (a rare conservative undercount, same direction as a WS gap) (`sender/doc.go:115-117`). +- **What (dropped-at-cap):** When the inclusion registry hits its cap, registrations are dropped and counted (`dropped_at_cap`); these txs are **excluded from the inclusion denominator** (`sender/doc.go:101-103`). +- **When it bites:** Late-register is rare (register window ≪ block time). 
`dropped_at_cap` bites under sustained inclusion backlog (registry can't keep up). +- **Why accepted:** Late-register undercount is microsecond-window-bounded; cap-drops are excluded from the denominator so they can't inflate or deflate the inclusion rate. +- **Interpret:** Non-zero `dropped_at_cap` ⇒ the inclusion rate is computed over fewer txs than `succeeded`; it's still correct *for the registered subset* but doesn't cover the whole run. If `dropped_at_cap` is large, raise the registry cap or lower the rate before trusting inclusion as run-wide. + +### Inclusion summary + +- `inclusion_latency` is **open-loop-only** (omitted, not zero, in closed-loop). +- `inflight_at_shutdown` is read only after both workers and tracker have joined (`sender/doc.go:103`), so it is a true terminal residual, not a race artifact. +- Master identity to assert: `registered == included + expired + inflight_at_shutdown`, with `registered ⊆ succeeded`. +- **Direction rule:** boundaries 1, 5, 6(late) all push inclusion *down*. If your run shows fewer inclusions than expected, check `block_gaps + block_fetch_errors + dropped_at_cap` **first** — that sum caps how much of the shortfall is observational before you attribute any of it to the SUT. + +## See also + +- [03-config-reference](03-config-reference.md) — `--track-receipts`, endpoints, registry cap settings. +- [06-measurement-metrics](06-measurement-metrics.md) — the counter series named above. +- [07-experiment-playbook](07-experiment-playbook.md) — how to design runs that keep these boundaries at zero.
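
The conservation identities in this document can be asserted mechanically after each run. The following is a minimal sketch, not part of sei-load: it assumes the run summary has already been parsed into a flat mapping whose keys mirror the counter names used above. The function name `check_conservation`, the `clean_drain` flag, and the mapping layout are hypothetical; the real summary format and field names may differ.

```python
from typing import List, Mapping


def check_conservation(s: Mapping[str, int], clean_drain: bool) -> List[str]:
    """Return the violated identities; an empty list means all checks hold.

    `s` is a hypothetical flat view of the run summary (assumed key names).
    """
    problems: List[str] = []

    # Send path: every scheduled tx is either shed at the cap or admitted.
    if s["scheduled"] != s["dropped"] + s["admitted"]:
        problems.append("scheduled != dropped + admitted")

    # Exact only on clean drain. After a duration/SIGTERM exit, a gap here is
    # shutdown buffer, so it is not flagged as a violation.
    gap = s["admitted"] - (s["succeeded"] + s["failed"])
    if clean_drain and gap != 0:
        problems.append(f"admitted != succeeded + failed (gap={gap})")

    # Inclusion path: every registered tx ends in exactly one terminal bucket.
    terminal = s["included"] + s["expired"] + s["inflight_at_shutdown"]
    if s["registered"] != terminal:
        problems.append("registered != included + expired + inflight_at_shutdown")

    # Only successful sends are registered, so the count cannot exceed succeeded.
    if s["registered"] > s["succeeded"]:
        problems.append("registered > succeeded")

    return problems


# Illustrative numbers only, not output from a real run.
example = {
    "scheduled": 1000, "dropped": 50, "admitted": 950,
    "succeeded": 940, "failed": 10,
    "registered": 940, "included": 930, "expired": 8,
    "inflight_at_shutdown": 2,
}
violations = check_conservation(example, clean_drain=True)
```

Passing `clean_drain=False` for runs ended by duration or SIGTERM keeps the shutdown-buffer gap from being reported as a violation, in line with the interpretation given for that boundary.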