diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..23a668a
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,43 @@
+# sei-load — agent guide
+
+sei-load drives synthetic transaction load at a Sei EVM endpoint and **emits measurements** (counters, histograms, a run summary) about how the system under test (SUT) responds. It is a load generator and a measurement instrument — **not** a judge: it computes no pass/fail verdicts, percentiles, or SLO compliance; you derive those externally from the signals it emits. These docs are the operating manual for an agent that designs, runs, and interprets sei-load experiments and acts on the results.
+
+## Start here (reading order for a new agent)
+
+Read linearly the first time:
+
+1. [docs/01-mental-model.md](docs/01-mental-model.md) — the pipeline, open- vs closed-loop, coordinated omission, and the measure-not-judge philosophy. **Read this first.**
+2. [docs/02-running.md](docs/02-running.md) — build the binary, every CLI flag, the run lifecycle.
+3. [docs/03-config-reference.md](docs/03-config-reference.md) — the JSON config schema behind those flags.
+4. [docs/04-workload-model.md](docs/04-workload-model.md) — scenarios and the StorageRW contention/size/op axes.
+5. [docs/06-measurement-metrics.md](docs/06-measurement-metrics.md) — the authoritative metric catalog and the conservation model; the PromQL you query.
+6. [docs/05-reproducibility.md](docs/05-reproducibility.md) — seed → sub-stream determinism and fair A/B setup. **Read before 07:** the playbook's cardinal rule depends on the seed/fair-A/B mechanics defined here.
+7. [docs/07-experiment-playbook.md](docs/07-experiment-playbook.md) — objective → knobs → read → interpret recipes.
+
+Keep [docs/08-limits-boundaries.md](docs/08-limits-boundaries.md) as a reference — pull it in when a non-zero boundary counter forces you to discount a result.
+
+## Table of contents
+
+| Doc | Covers / when you need it |
+|-----|---------------------------|
+| [01-mental-model.md](docs/01-mental-model.md) | The send pipeline, open-loop vs closed-loop arrival, coordinated omission, conservation identities, and why the tool emits signal not verdicts. The conceptual floor — read before anything else. |
+| [02-running.md](docs/02-running.md) | Building/invoking `seiload`, every CLI flag, settings precedence, the metrics endpoint, copy-pasteable invocations, and the run lifecycle. Need it when starting/stopping/reproducing a run. |
+| [03-config-reference.md](docs/03-config-reference.md) | The complete JSON config schema — `LoadConfig`, `settings`, `scenarios`, `accounts`, `funding`, gotchas. Need it when authoring or editing a config. |
+| [04-workload-model.md](docs/04-workload-model.md) | The scenario set, what each stresses, and the StorageRW key-contention / tx-size / op-mix axes plus what they probe on Sei's parallel executor. Need it when choosing a scenario and shaping load. |
+| [05-reproducibility.md](docs/05-reproducibility.md) | Seed → sub-stream derivation, the exact determinism guarantee, fair A/B setup, open-loop determinism under drops. Need it before comparing two runs. |
+| [06-measurement-metrics.md](docs/06-measurement-metrics.md) | The authoritative 19-instrument catalog, the conservation model, and the PromQL recipes for rates/percentiles/goodput/validity. Need it before writing any query or trusting a number. |
+| [07-experiment-playbook.md](docs/07-experiment-playbook.md) | The reasoning layer: objective → knobs → validity → read → interpret → next move, with recipes for contention, size, and tail-latency experiments. Need it when designing a run. |
+| [08-limits-boundaries.md](docs/08-limits-boundaries.md) | The accepted measurement boundaries (WS gaps, reorgs, single fetch endpoint, header-arrival clock, cap drops) and the counter to check for each. Need it when deciding whether a non-zero counter invalidates a conclusion. |
+
+## Fastest path to a first experiment
+
+1. Build and validate offline: `make build`, then a `--dry-run` invocation — see [docs/02-running.md](docs/02-running.md#common-invocations).
+2. Run an open-loop, fixed-λ measurement with receipt tracking and follow the trustworthy-tail-latency recipe — see [docs/07-experiment-playbook.md](docs/07-experiment-playbook.md) §4, then validity-gate it with §5 before quoting any number.
+
+## Standing caveats (true on `main` today)
+
+- **StorageRW distribution/size/op axes require PLT-465 (#54, unmerged).** `keyDistribution`, `sizeDistribution`, `sizeBuckets`, `recordCount`, and `operations` parse but **do not affect generated transactions** on main — StorageRW emits a fixed scaffold (slot 0, empty pad, all-`rmw`). See [docs/04-workload-model.md](docs/04-workload-model.md).
+- **`schedule_lag` is a concept, not a queryable metric** (emitter punted as PLT-463). Judge generator validity externally via the [06 §3.4](docs/06-measurement-metrics.md) heuristics.
+- **`--report-path` writes a formatted text dump, not JSON** (schema-versioned JSON is PLT-467). The seed is **config-file-only** (no `--seed` flag).
+- **Exported series carry a `seiload_` prefix and unit suffixes.** The Prometheus exporter sets `WithNamespace("seiload")` (configurable), so every series is prefixed `seiload_`, and OTel appends unit suffixes (`s`-unit → `_seconds`, etc.); histograms expose `_bucket`/`_sum`/`_count` and counters end `_total`. The wire names — not the instrument base names — are what you query; [docs/06](docs/06-measurement-metrics.md) §2 lists them.
+- **The tool emits signal, not verdicts.** Every rate, percentile, and pass/fail is computed by you.
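Note on the AGENTS.md caveat about exported series names: the naming rules it states (namespace prefix, `_seconds` for `s`-unit instruments, `_total` on counters, three series per histogram) can be sketched as a small helper. This is a minimal Python illustration of those stated rules only. `wire_names` and the base names `send_latency` and `txs_dropped` are hypothetical, chosen for the example; the real wire names are the ones listed in docs/06 §2.

```python
def wire_names(base, kind, unit="", namespace="seiload"):
    """Expand an instrument base name into the series names a scrape would expose.

    Applies only the rules stated in the caveat: namespace prefix, a unit
    suffix for seconds, `_total` for counters, and three histogram series.
    """
    unit_suffix = {"s": "_seconds"}.get(unit, "")  # other unit mappings omitted
    stem = f"{namespace}_{base}{unit_suffix}"
    if kind == "counter":
        return [f"{stem}_total"]
    if kind == "histogram":
        return [f"{stem}_bucket", f"{stem}_sum", f"{stem}_count"]
    return [stem]  # anything else: no type suffix in this sketch


# Hypothetical base names, for illustration only.
print(wire_names("send_latency", "histogram", unit="s"))
print(wire_names("txs_dropped", "counter"))
```

The point of the sketch is the direction of the mapping: queries are written against the expanded names, never against the instrument base name.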
diff --git a/README.md b/README.md
index 78c97fd..e169c14 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,5 @@
+> 🤖 For agent-driven experiments, start at [AGENTS.md](AGENTS.md) and docs/ — the authoritative, current operating docs. This README is a human quick-start and may lag.
+
 # sei-load
 [![Tests](https://github.com/sei-protocol/sei-load/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/sei-protocol/sei-load/actions/workflows/build-and-test.yml)
 
diff --git a/docs/01-mental-model.md b/docs/01-mental-model.md
new file mode 100644
index 0000000..701f373
--- /dev/null
+++ b/docs/01-mental-model.md
@@ -0,0 +1,178 @@
+# Mental Model
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers / when an agent needs it: the conceptual foundation you must
+> hold before designing, running, or interpreting a sei-load experiment: the send
+> pipeline, the open-loop arrival model and the coordinated-omission problem it
+> solves, where verdicts come from (not the tool), and the load-bearing
+> vocabulary. Read this first; the config and metric specifics are in the sibling
+> docs linked at the end.
+
+## What sei-load is
+
+sei-load drives synthetic transaction load at a Sei EVM endpoint and **emits
+measurements** about how the system under test (SUT) responds. It is a load
+generator and a measurement instrument. It is **not** a judge: it does not
+compute pass/fail verdicts or SLO compliance (see [Measurement philosophy](#measurement-philosophy)).
+
+## The send pipeline
+
+A transaction flows through a fixed pipeline:
+
+```
+generator → dispatcher → sharded sender → per-endpoint workers → Sei RPC
+```
+
+- **Generator** (`generator.Generator`) produces `*types.LoadTx` values. Each
+  `Generate()` call draws from the seeded PRNG sub-streams (accounts, gas,
+  key/size distributions); this is the only place workload randomness is consumed.
+- **Dispatcher** (`sender.Dispatcher`) owns the arrival timing. It runs in one of
+  two arrival models (below) and hands each tx to the sender.
+- **Sharded sender** (`sender.ShardedSender`, satisfies `sender.TxSender`) routes
+  each tx to one of N per-endpoint workers by shard. `Send` enqueues into the
+  worker's channel and returns immediately — it is asynchronous.
+- **Workers** (`sender.Worker`) each own one RPC client to one endpoint and run
+  `Tasks` send goroutines over a shared channel. The send goroutine stamps
+  `AttemptedSendTime`, then calls go-ethereum `eth_sendRawTransaction`
+  **synchronously**.
+- **Sei RPC** is the SUT. The send returns nil (accepted) or an error (rejected).
+
+A single shared `golang.org/x/time/rate.Limiter` is the one rate authority for
+the whole pipeline. In closed-loop the worker gates on it; in open-loop the
+scheduler reads it as a clock source (see below). When ramping is enabled, a
+`Ramper` drives the limiter's limit up or down via `SetLimit`.
+
+Optionally, when `--track-receipts` is set, successful sends are handed to a
+block-indexed `stats.InclusionTracker` that observes on-chain inclusion by
+scanning arriving blocks (O(blocks), not per-tx receipt polling). See
+[06-measurement-metrics.md](06-measurement-metrics.md).
+
+## The arrival model: why open-loop exists
+
+The dispatcher supports two arrival models, selected by `arrivalModel`
+(`sender.ArrivalModel`, values `"closed_loop"` / `"open_loop"`).
+
+### Coordinated omission (the problem)
+
+In the legacy **closed-loop** model the dispatcher generates the next tx only
+once a sender is free (`runClosedLoop`: generate-then-send in lockstep). The
+dequeue clock is therefore the SUT's clock: **when the SUT slows, the generator
+slows with it and simply stops issuing the requests that would have observed the
+slowdown.** The latency histogram under-reports, because the worst-affected
+requests were never sent. This is **coordinated omission** — the closed-loop
+model lies about latency precisely when the answer matters most (under stress).
+
+### Open-loop (the fix)
+
+The **open-loop** model decouples the arrival clock from sender availability
+(`sender.openLoopScheduler`). Transaction `i` is scheduled at a fixed instant
+**`t₀ + i/λ`**, where `t₀` is the run start and `λ` is the target rate, regardless
+of whether any sender is free.
+
+Properties that make it honest:
+
+- **Absolute-instant scheduling.** The scheduler sleeps until each absolute
+  instant (`SleepUntil(nextSend)`), not for a relative gap, so per-tx scheduling
+  slop cannot accumulate into clock drift over a long run.
+- **λ as a clock, not a gate.** λ is sampled from the shared limiter on each step
+  (`limiter.Limit()`), so a ramping rate is honored; at fixed λ the running sum
+  telescopes to exactly `t₀ + i/λ`. The limiter is read here as a clock source —
+  the schedule advances whether or not the SUT keeps up.
+- **Bounded in-flight + drop-and-count.** The arrival clock is **never throttled
+  by backpressure** (throttling would reintroduce coordinated omission). Instead
+  a counting semaphore bounds true in-flight sends to `maxInFlight`. At each
+  scheduled instant the scheduler does a non-blocking `TryAcquire`: if senders are
+  saturated the tick is **dropped and counted** and the clock moves on. The permit
+  is held across the full unacked-in-flight window (enqueue + RPC round-trip) and
+  released only after the synchronous send returns (via `tx.OnComplete`), so
+  `maxInFlight` bounds real in-flight work and the drop count measures genuine
+  load shed, not buffer geometry.
+- **Admit before generate.** The permit is acquired **before** the generator is
+  drawn. A dropped tick draws no tx (no seeded-stream consumption, no signer CPU),
+  which makes admitted txs a deterministic prefix of the seeded sequence — see
+  [05-reproducibility.md](05-reproducibility.md).
+
+Closed-loop is retained only as the **legacy regression baseline**. For any
+experiment where tail latency under load matters, use open-loop.
+
+To use open-loop: set `arrivalModel: "open_loop"` and a finite positive rate
+(`tps > 0` or `rampUp: true`); validation rejects open-loop with no finite λ.
+See [03-config-reference.md](03-config-reference.md).
+
+### Conservation (how counts must add up)
+
+Every scheduled tick reaches exactly one terminal state, and the dispatcher folds
+these into the run summary:
+
+```
+scheduled = dropped + admitted
+admitted  = succeeded + failed
+```
+
+- **dropped** — shed because in-flight was saturated at the scheduled instant
+  (never admitted, never sent).
+- **admitted** — took a permit and drew a tx.
+- **succeeded** — admitted, send returned nil (`DispatcherStats.TotalSent`).
+- **failed** — admitted, send returned an error. **Counted, never lost**
+  (`DispatcherStats.Failed`); a send error does not tear down the run.
+
+In closed-loop, `Failed` and `Dropped` are always 0.
+
+A finite workload ends when the generator drains; the terminal probe that
+discovers this advances none of the clock, the index, or the counters. On a clean
+drain `admitted == succeeded + failed` holds exactly. On `ctx` cancel (SIGTERM /
+duration limit) some admitted txs may still be buffered for a worker and exit
+uncounted: a bounded undercount that never affects a cleanly completed run.
+
+## Measurement philosophy
+
+**The generator emits measurements; it does not pronounce verdicts.** SLO
+judgments, A/B comparisons, and pass/fail decisions are computed **externally**
+via metric queries against the telemetry the tool emits — they are not owned by
+sei-load. This shapes how you consume outputs:
+
+- Treat sei-load output as raw signal (counters, histograms, the run summary),
+  not as a graded result.
+- Build your verdict logic in your query/analysis layer, gating on the run-level
+  arrival model (see next point).
+- **A tx cannot self-describe which model produced it.** An open-loop and a
+  closed-loop `LoadTx` are byte-identical; coordinated-omission safety is a
+  property of the run's arrival model, not of any per-tx field. Latency and
+  schedule-lag consumers **must gate on the run-level `arrivalModel`** before
+  trusting a latency or schedule-lag sample. In closed-loop, `IntendedSendTime`
+  is merely the back-pressured enqueue time, so derived latency is either
+  omitted or meaningless.
+
+> **`schedule_lag` is a concept, not a metric on main today.** It is the
+> coordinated-omission/validity quantity `AttemptedSendTime − IntendedSendTime`,
+> computed and judged **externally** — there is no `schedule_lag` series on
+> `/metrics` (the emitter was punted as PLT-463). Do not write a query against it;
+> see [06-measurement-metrics.md](06-measurement-metrics.md#34-detecting-a-generator-bound-invalid-run--schedule_lag)
+> for the external validity heuristics that stand in for it.
+
+## Glossary
+
+| Term | Meaning |
+|---|---|
+| **λ (lambda)** | Target arrival rate (tx/s). In open-loop, sampled from the shared limiter each step as a clock source; the inter-arrival gap is `1/λ`. |
+| **t₀** | Run start instant; the anchor for the open-loop schedule. |
+| **intended send time** | `IntendedSendTime` = `t₀ + i/λ`, the true scheduled instant (open-loop). In closed-loop it is the enqueue time instead — not a real schedule. |
+| **attempted send time** | `AttemptedSendTime`, the wall clock when a worker actually called the RPC. |
+| **inclusion time** | `InclusionTime`, the header-arrival wall clock of the block that included the tx (set only when `--track-receipts`). |
+| **schedule_lag** | `AttemptedSendTime − IntendedSendTime`. The primary coordinated-omission gate: it shows sends falling behind the arrival schedule even before any tx is shed. Open-loop only. **A concept, not a metric on main** — computed/judged externally; not a queryable series (emitter punted as PLT-463). |
+| **SequenceIndex** | The arrival-tick index `i`. Monotonic; under drops it is non-contiguous across admitted txs (dropped ticks advance `i` and the clock but consume no draw). |
+| **admitted** | A tick that took an in-flight permit and drew a tx. |
+| **dropped** | A tick shed because in-flight was saturated (drop-and-count). |
+| **failed** | An admitted tx whose send returned an error (counted, not lost). |
+| **in-flight** | Concurrent unacked sends, bounded by `maxInFlight` via the semaphore; a permit is held enqueue → RPC return. |
+| **drop-and-count** | The open-loop overload policy: shed and tally overdue ticks rather than throttle the arrival clock. |
+
+## See also
+
+- [02-running.md](02-running.md) — invoking a run.
+- [03-config-reference.md](03-config-reference.md) — every config/CLI setting.
+- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts.
+- [05-reproducibility.md](05-reproducibility.md) — seeds, sub-streams, A/B.
+- [06-measurement-metrics.md](06-measurement-metrics.md) — emitted metrics and the run summary.
+- [07-experiment-playbook.md](07-experiment-playbook.md) — recipes for common experiments.
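Note on the open-loop section of docs/01-mental-model.md above: the drop-and-count rule and the conservation identity `scheduled = dropped + admitted` can be sketched as a small discrete-event simulation. This is an illustrative Python model under simplifying assumptions (t₀ = 0 and every send takes the same `service_time`). `simulate_open_loop` and its parameters are hypothetical names for this sketch and do not reproduce sei-load's scheduler.

```python
import heapq


def simulate_open_loop(rate, ticks, max_in_flight, service_time):
    """Return (admitted sequence indices, dropped count) for a fixed-rate schedule."""
    completions = []  # min-heap of instants at which held permits are released
    admitted = []     # sequence indices that took a permit
    dropped = 0
    for i in range(ticks):
        now = i / rate  # intended instant t0 + i/lambda, with t0 = 0
        # Release permits for sends that have already returned.
        while completions and completions[0] <= now:
            heapq.heappop(completions)
        if len(completions) >= max_in_flight:
            dropped += 1  # saturated: shed the tick, the clock still moves on
            continue
        admitted.append(i)  # permit first; only now would a tx be drawn
        heapq.heappush(completions, now + service_time)
    return admitted, dropped


admitted, dropped = simulate_open_loop(
    rate=10, ticks=20, max_in_flight=2, service_time=0.35
)
print("admitted indices:", admitted)
print("dropped:", dropped, "scheduled:", len(admitted) + dropped)
```

With a service time longer than two inter-arrival gaps and only two permits, some ticks are shed, so the admitted indices are non-contiguous while every scheduled tick is still accounted for as either admitted or dropped.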
diff --git a/docs/02-running.md b/docs/02-running.md
new file mode 100644
index 0000000..a524963
--- /dev/null
+++ b/docs/02-running.md
@@ -0,0 +1,186 @@
+# Running sei-load
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers / when an agent needs it: how to build and invoke the `seiload`
+> binary, every CLI flag and its effect, settings precedence, the metrics
+> endpoint, a ladder of copy-pasteable invocations, and the run lifecycle
+> (prewarm → run → run-summary → flush). Read this when you are about to start,
+> stop, or reproduce a run. For the meaning of config-file fields, see
+> [03-config-reference.md](03-config-reference.md).
+
+## Build and run
+
+```bash
+make build                       # produces ./build/seiload
+./build/seiload --config <path>  # --config is REQUIRED; run fails fast without it
+```
+
+The binary reads one JSON config file (`--config`/`-c`), resolves settings (CLI >
+config > defaults), validates them, then runs until its duration elapses or it
+receives `SIGTERM`/`SIGINT`.
+
+A run with no endpoints or no scenarios in the config is rejected at load time
+(`no endpoints specified` / `no scenarios specified`).
+
+### Metrics endpoint
+
+Prometheus metrics are served at `http://<metricsListenAddr>/metrics` (default
+`0.0.0.0:9090`). OpenMetrics is enabled so exemplars survive scraping. Point your
+scraper here; the run holds the process open for a scrape window after it
+finishes (see [Lifecycle](#lifecycle)). To export traces/OTLP, set
+`OTEL_EXPORTER_OTLP_ENDPOINT` in the environment.
+
+## CLI flags
+
+Every flag below maps 1:1 to a `settings` field (same default) except `--config`,
+`--nodes`, and `--metricsListenAddr`, which are CLI-only. Flag defaults come from
+`DefaultSettings()`; config-file values override these defaults, and CLI flags
+override the config file.
+
+| Flag | Short | Default | Meaning / effect |
+|------|-------|---------|------------------|
+| `--config` | `-c` | (required) | Path to the JSON config file. No default; run aborts if unset. |
+| `--workers` | `-w` | `1` | Tasks (workers) **per endpoint**. Total senders = workers × endpoints. |
+| `--tps` | `-t` | `0` | Target transactions/sec, shared across all workers (single rate limiter). `0` = no limit. Required (>0) for open-loop unless `--ramp-up`. |
+| `--arrival-model` | | `closed_loop` | `open_loop` schedules tx *i* at t₀+i/λ and drops overdue txs; `closed_loop` is the legacy generate-then-send lockstep. See [03](03-config-reference.md#arrivalmodel). |
+| `--max-in-flight` | | `10000` | **Open-loop only.** Max concurrent in-flight sends; txs that would exceed this at their scheduled instant are dropped and counted (the clock is never throttled). Ignored in closed-loop. |
+| `--stats-interval` | `-s` | `10s` | Interval for logging throughput/latency stats and the user-latency tracker tick. |
+| `--buffer-size` | `-b` | `1000` | Channel buffer size per worker. Larger = more in-memory queueing; reduce under memory pressure. |
+| `--dry-run` | | `false` | Simulate generation/sending without hitting the chain. Forces `mockDeploy`. Disables the inclusion tracker (simulated sends never land, so every tx would be reaped as expired). |
+| `--debug` | | `false` | Log each transaction. High-volume; for small/diagnostic runs only. |
+| `--track-receipts` | | `false` | Enable the block-indexed tx→inclusion tracker (stamps inclusion time; reports included/expired/dropped-at-cap/inflight-at-shutdown). No-op under `--dry-run` or with zero endpoints. |
+| `--inclusion-reap-after` | | `30s` | How long an un-included tx stays in the inclusion registry before being reaped as **expired**. Tune to expected inclusion time on congested chains. Only meaningful with `--track-receipts`. |
+| `--track-blocks` | | `false` | Collect block statistics (block time, gas) from `endpoints[0]`. |
+| `--track-user-latency` | | `false` | Track per-user latency from `endpoints[0]`, sampled at `--stats-interval`. |
+| `--prewarm` | | `false` | Prewarm accounts with self-transactions before the main run (warms nonces/state; excluded from main stats). |
+| `--ramp-up` | | `false` | Drive load with a built-in ramp curve instead of a fixed rate. Provides a finite λ for open-loop without a fixed `--tps`. Curve is fixed in code: start 100 TPS, +100 per step, 120s load interval, 30s recovery interval. |
+| `--report-path` | | `""` | Write a **formatted text** report to this path (`/dev/stdout` is valid). Empty = no report file. Note: a text dump today, **not** JSON — schema-versioned run-summary JSON is future work (PLT-467). See [06 §4.2](06-measurement-metrics.md#42---report-path-file--stdout-final-stats). |
+| `--txs-dir` | | `""` | Write generated transactions to this dir instead of sending them (offline tx-writer mode). Forces closed-loop; a requested open-loop model is downgraded to closed-loop and the downgrade is logged. |
+| `--target-gas` | | `10000000` | Target gas per block (tx-writer mode). |
+| `--num-blocks-to-write` | | `100` | Number of blocks to write (tx-writer mode). |
+| `--duration` | | `0` | Run duration. `0` = run until `SIGTERM`/`SIGINT`. |
+| `--post-summary-flush-delay` | | `25s` | In-process sleep AFTER the run-summary metrics are recorded, so Prometheus can scrape final values before exit. Set `0` to exit immediately (you lose the final scrape). |
+| `--nodes` | `-n` | `0` | Limit to the first N endpoints from the config. `0` = use all. |
+| `--metricsListenAddr` | | `0.0.0.0:9090` | `ip:port` for the Prometheus `/metrics` endpoint. |
+
+> Trackers that read chain state (`--track-blocks`, `--track-user-latency`,
+> `--track-receipts`, and the ramper's block collector) all read from
+> `endpoints[0]` only. Put a representative/stable RPC first.
+
+> **No `--seed` flag.** The seed is **config-file-only** (top-level `seed`,
+> `LoadConfig.Seed *uint64`). To pin or replay a workload, set `seed` in the config
+> file — there is no CLI override. See
+> [05-reproducibility.md](05-reproducibility.md#setting-the-seed).
+
+> **`seiChainID` casing is cosmetic only.** The struct tag is `seiChainID` (capital
+> `ID`), and several shipped profiles write `seiChainId` (lowercase `d`). Go's
+> `encoding/json` matches tags **case-insensitively**, so `seiChainId` binds to the
+> same field — `chain_id` is populated and `chain_id`-keyed PromQL works either way.
+> Prefer `seiChainID` for style consistency, but it does **not** affect binding or
+> queries. See [03 gotchas](03-config-reference.md#gotchas).
+
+## Settings precedence
+
+```
+CLI flag  >  config-file "settings"  >  built-in default
+```
+
+Resolution is via viper: defaults are seeded from `DefaultSettings()`, the config
+file's `settings` block is merged, then bound CLI flags override. A field absent
+everywhere falls back to its default. After resolution the settings are validated
+(`Settings.Validate`) and the run aborts on an invalid combination — notably
+`arrival-model open_loop` with no finite rate (`--tps<=0` and not `--ramp-up`).
+
+## Common invocations
+
+Minimal → realistic.
+
+**1. Validate a config without touching the chain (dry-run):**
+```bash
+./build/seiload --config profiles/local.json --dry-run --debug
+```
+Generates and logs transactions; deploys are mocked; no sends. Use this to
+confirm scenarios, accounts, and weights resolve before a real run.
+
+**2. Closed-loop, fixed TPS (legacy baseline):**
+```bash
+./build/seiload --config profiles/local.json --workers 50 --tps 100
+```
+The dispatcher generates each tx only once a sender is free, so generation and
+sending run in lockstep; the shared limiter caps aggregate rate at 100 TPS. This
+is susceptible to coordinated omission, so prefer open-loop for latency claims.
+
+**3. Open-loop, fixed λ (coordinated-omission-correct):**
+```bash
+./build/seiload --config profiles/local.json \
+  --arrival-model open_loop --tps 100 --max-in-flight 5000
+```
+Arrivals are scheduled at t₀+i/λ independent of sender availability; if in-flight
+hits `--max-in-flight` the overdue tx is dropped and counted (reported as
+`Open-loop dropped N txs` at exit) rather than slowing the clock.
+
+**4. Ramped run (open-loop, no fixed TPS):**
+```bash
+./build/seiload --config profiles/local.json --arrival-model open_loop --ramp-up
+```
+The ramper supplies a finite, increasing λ to the shared limiter — this satisfies
+open-loop's "finite positive rate" requirement without `--tps`. Final ramp stats
+are logged at exit.
+
+**5. Run with inclusion + block tracking:**
+```bash
+./build/seiload --config profiles/arctic-1.json \
+  --track-receipts --track-blocks --inclusion-reap-after 45s
+```
+Stamps each sent tx and matches it against on-chain blocks from `endpoints[0]`;
+at exit reports `included / expired / dropped_at_cap / inflight_at_shutdown`. On
+a congested chain raise `--inclusion-reap-after` so slow-but-real inclusions are
+not miscounted as expired.
+
+**6. Limit endpoints with `--nodes`:**
+```bash
+./build/seiload --config profiles/local_docker.json --nodes 2
+```
+Uses only the first 2 of the config's endpoints. Useful for A/B-testing endpoint
+fan-out without editing the config.
+
+**7. Bounded duration vs. signal-driven:**
+```bash
+./build/seiload --config profiles/local.json --tps 100 --duration 5m   # stops after 5m
+./build/seiload --config profiles/local.json --tps 100                  # runs until Ctrl-C / SIGTERM
+```
+
+## Lifecycle
+
+A run proceeds in this order:
+
+1. **Load + resolve + validate** config and settings; abort fast on bad combos.
+2. **Setup**: start the metrics server, observability, block/user-latency/inclusion
+   trackers (per flags), and connect the sharded sender.
+3. **Fund** the account pool (only if `funding` is set and not `--dry-run`).
+4. **Prewarm** (if `--prewarm`): self-transactions warm accounts; excluded from
+   main stats (the stats logger starts *after* prewarm).
+5. **Run**: dispatcher drives the workload (open- or closed-loop) under the shared
+   rate limiter; stats logged every `--stats-interval`.
+6. **End**: the run stops when `--duration` elapses (context timeout) or a
+   `SIGTERM`/`SIGINT` arrives. Workers and trackers drain and join.
+7. **Run summary**: final stats are logged, the inclusion conservation identity is
+   read after join (so `inflight_at_shutdown` is final), and a run-summary metric
+   is emitted (`arrival_model`, `dropped`, `failed`, inclusion counts).
+8. **Flush window**: the process sleeps `--post-summary-flush-delay` (default
+   `25s`) so Prometheus can scrape the final summary, then exits cleanly. A
+   `context.Canceled`/`DeadlineExceeded` from a clean duration/signal stop is
+   treated as success (exit 0).
+
+> To capture the final summary metrics, keep the scrape interval shorter than
+> `--post-summary-flush-delay`, or raise the delay. Setting it to `0` exits
+> immediately and the last scrape is lost.
+
+## See also
+
+- [01-mental-model.md](01-mental-model.md) — what sei-load is and how its pieces fit.
+- [03-config-reference.md](03-config-reference.md) — the full config schema these flags mirror.
+- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts.
+- [06-measurement-metrics.md](06-measurement-metrics.md) — what the metrics/summary mean.
+- [07-experiment-playbook.md](07-experiment-playbook.md) — recipes for reproducible experiments.
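Note on the settings-precedence section of docs/02-running.md above: the resolution order (CLI flag > config-file `settings` > built-in default) and the open-loop validation rule can be sketched as a layered dictionary merge. This is a minimal Python illustration, not the viper-based resolution the tool uses. `resolve` is a hypothetical helper, `cli_flags` is assumed to contain only flags that were explicitly set, and the defaults are copied from the CLI flag table.

```python
DEFAULTS = {  # subset of the defaults listed in the CLI flag table
    "workers": 1,
    "tps": 0.0,
    "arrivalModel": "closed_loop",
    "maxInFlight": 10000,
    "rampUp": False,
}


def resolve(config_settings, cli_flags):
    """Merge defaults, config-file settings, and explicitly set CLI flags."""
    # Later layers win: defaults < config file < CLI.
    settings = {**DEFAULTS, **config_settings, **cli_flags}
    # Open-loop needs a finite positive rate: tps > 0 or the ramp curve.
    if (
        settings["arrivalModel"] == "open_loop"
        and settings["tps"] <= 0
        and not settings["rampUp"]
    ):
        raise ValueError("open_loop requires tps > 0 or rampUp")
    return settings


resolved = resolve(
    config_settings={"workers": 50, "tps": 100},
    cli_flags={"arrivalModel": "open_loop", "tps": 250},
)
print(resolved)
```

In this example the CLI value for `tps` replaces the config-file value, `workers` comes from the config file, and `maxInFlight` falls back to its default.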
diff --git a/docs/03-config-reference.md b/docs/03-config-reference.md
new file mode 100644
index 0000000..0051d0e
--- /dev/null
+++ b/docs/03-config-reference.md
@@ -0,0 +1,262 @@
+# Config reference
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers / when an agent needs it: the complete JSON config schema for
+> sei-load — top-level `LoadConfig`, the `settings` block (every field, type,
+> default, and run effect), `scenarios`, `accounts`, and `funding` — with an
+> annotated example and the field interactions that change a run's behavior. Read
+> this when authoring or editing a config. For how to invoke the binary and the
+> CLI-flag equivalents, see [02-running.md](02-running.md).
+
+The config is a single JSON object parsed into `config.LoadConfig`. Every field is
+optional except `endpoints` and `scenarios` (the loader rejects a config missing
+either). Unknown fields are ignored, and **a config that uses no new fields runs
+the legacy closed-loop path unchanged** — the schema is additive by construction.
+
+## Top-level `LoadConfig`
+
+| Field | JSON key | Type | Default | Meaning |
+|-------|----------|------|---------|---------|
+| ChainID | `chainId` | int64 | `0` | EVM chain ID used to sign transactions. Must match the target chain. |
+| SeiChainID | `seiChainID` | string | `""` | Textual chain ID used to tag metrics and block/inclusion collectors. Key casing is cosmetic — `seiChainId` (lowercase `d`) also binds (see [gotcha](#gotchas)). |
+| Endpoints | `endpoints` | []string | (required) | RPC endpoints. Workers shard across all of them; trackers read only `endpoints[0]`. |
+| Accounts | `accounts` | object | none | Shared account pool (see [Accounts](#accounts)). |
+| Scenarios | `scenarios` | []object | (required) | Weighted workload mix (see [Scenarios](#scenarios)). |
+| Settings | `settings` | object | `DefaultSettings()` | Run knobs; CLI flags override (see [Settings](#settings)). |
+| Funding | `funding` | object | none | Root-key funding of the account pool (see [Funding](#funding)). |
+| Seed | `seed` | uint64 | random | Roots the deterministic PRNG. Same seed + config = same workload draw multiset (see [Seed](#seed-reproducibility)). |
+| MockDeploy | `mockDeploy` | bool | `false` | Mock contract deploys. Auto-forced on under `--dry-run`; rarely set by hand. |
+| ReportPath | `reportPath` | string | `""` | Alias also accepted at top level; `settings.reportPath` is the normal place. |
+
+### Annotated example
+
+```jsonc
+{
+  "chainId": 713715,                 // EVM chain id; must match the chain
+  "seiChainID": "arctic-1",          // metric/collector tag (casing cosmetic; seiChainId also binds)
+  "endpoints": [                     // workers shard across these; trackers use [0]
+    "http://rpc-a:8545",
+    "http://rpc-b:8545"
+  ],
+  "seed": 42,                        // optional; omit for a random (recorded) seed
+  "accounts": {                      // shared pool unless a scenario overrides
+    "count": 500,
+    "newAccountRate": 0.0
+  },
+  "funding": {                       // optional; fund the pool from a root key
+    "rootKeyFile": "/etc/seiload-key/root-key.hex",
+    "fundAmountWei": "1000000000000000000",  // 1 SEI; decimal STRING (precision)
+    "batchSize": 200
+  },
+  "scenarios": [
+    { "name": "EVMTransfer", "weight": 7 },
+    { "name": "ERC20",       "weight": 3 }
+  ],
+  "settings": {
+    "workers": 50,
+    "tps": 100,
+    "arrivalModel": "open_loop",
+    "maxInFlight": 5000,
+    "statsInterval": "10s",
+    "bufferSize": 1000,
+    "trackReceipts": true,
+    "inclusionReapAfter": "45s",
+    "trackBlocks": true,
+    "prewarm": true,
+    "postSummaryFlushDelay": "25s",
+    "reportPath": "/dev/stdout"
+  }
+}
+```
+
+## Settings
+
+Every field below has a CLI-flag twin with the same default (see
+[02-running.md](02-running.md#cli-flags)); CLI overrides config overrides default.
+Duration fields are JSON strings parsed by Go's `time.ParseDuration` (e.g.
+`"10s"`, `"45s"`, `"5m"`).
+
+| Field (JSON key) | Type | Default | Effect on the run |
+|------------------|------|---------|-------------------|
+| `workers` | int | `1` | Tasks per endpoint. Total senders = workers × endpoints. (Struct field is `TasksPerEndpoint`.) |
+| `tps` | float64 | `0` | Aggregate target rate via one shared limiter. `0` = unbounded. Required (>0) for open-loop unless `rampUp`. |
+| `arrivalModel` | string | `"closed_loop"` | `"open_loop"` vs `"closed_loop"` — see [arrivalModel](#arrivalmodel). |
+| `maxInFlight` | int | `10000` | **Open-loop only.** Cap on concurrent in-flight sends. Overdue txs past the cap are dropped and counted; the arrival clock is never throttled. Ignored in closed-loop. |
+| `statsInterval` | duration | `"10s"` | Stats-logging cadence; also the user-latency tracker tick. |
+| `inclusionReapAfter` | duration | `"30s"` | Time an un-included tx waits before being reaped as **expired**. Only used when `trackReceipts` is on. Too short → real-but-slow inclusions counted expired; too long → inflated in-flight map. Also sizes the inclusion registry cap (≈ tps × reapAfter × 1.5, floored at maxInFlight × 4). |
+| `bufferSize` | int | `1000` | Per-worker channel buffer. Larger = more in-memory queueing; lower under memory pressure. |
+| `dryRun` | bool | `false` | Simulate without sending; forces `mockDeploy`; disables the inclusion tracker. |
+| `debug` | bool | `false` | Log every transaction. Diagnostic/small runs only. |
+| `trackReceipts` | bool | `false` | Enable the block-indexed inclusion tracker (included/expired/dropped-at-cap/inflight-at-shutdown). No-op under `dryRun` or with zero endpoints. Reads `endpoints[0]`. |
+| `trackBlocks` | bool | `false` | Collect block time/gas stats from `endpoints[0]`. |
+| `trackUserLatency` | bool | `false` | Per-user latency sampled at `statsInterval` from `endpoints[0]`. |
+| `prewarm` | bool | `false` | Self-transaction prewarm before the main run; excluded from main stats. |
+| `rampUp` | bool | `false` | Drive load with the built-in ramp curve. Supplies a finite λ to satisfy open-loop without a fixed `tps`. |
+| `reportPath` | string | `""` | Write a **formatted text** report to this path (`/dev/stdout` valid); empty = none. Text dump today, not JSON — schema-versioned JSON is future work (PLT-467). |
+| `txsDir` | string | `""` | Offline tx-writer mode: write generated txs to this dir instead of sending. Forces closed-loop (open-loop logged as ignored). |
+| `targetGas` | uint64 | `10000000` | Target gas/block in tx-writer mode. |
+| `numBlocksToWrite` | int | `100` | Blocks to write in tx-writer mode. |
+| `postSummaryFlushDelay` | duration | `"25s"` | Post-summary sleep so Prometheus scrapes final metrics before exit. `0` = exit immediately (last scrape lost). |
+
+> CLI-only (not in `settings`): `--config`, `--nodes`, `--metricsListenAddr`.
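+
+As a worked instance of the `inclusionReapAfter` sizing rule in the table above,
+the fragment below reuses the values from the annotated example. The arithmetic
+in the comments only applies the approximate formula quoted in the table
+(tps × reapAfter × 1.5, floored at maxInFlight × 4); the exact rounding the code
+uses is not shown here, so treat the result as an estimate.
+
+```jsonc
+"settings": {
+  "tps": 100,
+  "maxInFlight": 5000,
+  "trackReceipts": true,          // the reap window only matters with receipts on
+  "inclusionReapAfter": "45s"
+  // rate-based term: 100 tx/s × 45 s × 1.5 = 6750 entries
+  // floor:           5000 × 4             = 20000 entries
+  // estimated cap:   max(6750, 20000)     = 20000 entries (the floor wins here)
+}
+```
+
+At the same reap window and `maxInFlight`, the rate-based term overtakes the
+floor once `tps` exceeds roughly 296 (20000 ÷ 67.5).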
+
+### `arrivalModel`
+
+The single field that most changes a run's semantics.
+
+- **`closed_loop`** (default) — legacy generate-then-send lockstep. Each worker
+  generates a tx, sends it, then generates the next; throughput is bounded by
+  sender latency. Susceptible to **coordinated omission** (slow sends suppress
+  arrivals, hiding tail latency). `maxInFlight` is ignored. Keep as the
+  regression baseline.
+- **`open_loop`** — schedules tx *i* at t₀ + i/λ **independent of sender
+  availability** (the coordinated-omission fix). λ comes from `tps>0` or the ramp
+  curve (`rampUp`). When concurrent in-flight sends would exceed `maxInFlight`,
+  the overdue tx is **dropped and counted** rather than throttling the clock —
+  reported at exit as `Open-loop dropped N txs`. Use this for any latency claim.
+
+Validation (`Settings.Validate`) rejects:
+- an `arrivalModel` other than `open_loop`/`closed_loop`;
+- `open_loop` with no finite positive rate (`tps<=0` **and** not `rampUp`) — λ
+  would be infinite, the inter-arrival gap collapses to 0, and the scheduler spins
+  and drops everything.
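+
+The two configurations that satisfy the open-loop rate requirement can be
+sketched as the following `settings` fragments. Both use only fields from the
+Settings table; the specific numbers are illustrative, not recommendations.
+
+```jsonc
+// Option A: fixed rate, λ = tps
+"settings": { "arrivalModel": "open_loop", "tps": 100, "maxInFlight": 5000 }
+
+// Option B: the built-in ramp curve supplies λ, so no fixed tps is needed
+"settings": { "arrivalModel": "open_loop", "rampUp": true, "maxInFlight": 5000 }
+
+// Rejected by Settings.Validate: open-loop with tps <= 0 and no rampUp
+"settings": { "arrivalModel": "open_loop", "tps": 0 }
+```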
+
+### Seed (reproducibility)
+
+`seed` roots the deterministic PRNG sub-streams (keys, sizes, gas, accounts). Same
+seed + same config reproduces the **draw multiset**, so the workload distribution
+is statistically reproducible for fair A/B comparison. Caveats from the code:
+
+- Per-tx emission ordering is reproducible only at a single worker; above one
+  worker the multiset matches but ordering does not, and on-chain arrival order is
+  concurrent regardless.
+- Omitting `seed` means "unseeded": the generator draws a random seed, writes it
+  back, and logs it for after-the-fact replay.
+
+## Scenarios
+
+`scenarios` is a weighted mix. Each entry creates one scenario instance; the
+dispatcher selects among them by `weight` (relative, integer). The same `name` may
+appear multiple times (instances are suffixed `_0`, `_1`, …).
+
+| Field | JSON key | Type | Meaning |
+|-------|----------|------|---------|
+| Name | `name` | string | Scenario kind (case-insensitive match). See list below. |
+| Weight | `weight` | int | Relative selection weight within the mix. |
+| Accounts | `accounts` | object | Optional per-scenario account pool; overrides the shared pool for this scenario. |
+| GasPicker | `gasPicker` | object | Optional gas-limit picker (`fixed`/`random`). |
+| GasFeeCapPicker | `gasFeeCapPicker` | object | Optional `maxFeePerGas` picker. |
+| GasTipCapPicker | `gasTipCapPicker` | object | Optional `maxPriorityFeePerGas` picker. |
+| KeyDistribution | `keyDistribution` | object | Keyspace index distribution (`uniform`/`zipfian`). ⚠️ **Requires PLT-465 (#54, unmerged) — parses but does not affect generated transactions on main.** See [gap](#schema-vs-implementation-gaps). |
+| SizeDistribution | `sizeDistribution` | object | Payload-size distribution. ⚠️ **Same status as `keyDistribution`: requires PLT-465; parses but does not affect generated txs on main.** |
+
+### Scenario names
+
+Matched case-insensitively. Registered on main:
+
+`EVMTransfer`, `EVMTransferFast`, `EVMTransferNoop`, `ERC20`, `ERC20Noop`,
+`ERC20Conflict`, `ERC721`, `Disperse`, `StorageRW`.
+
+An unknown name panics at scenario creation — validate with `--dry-run` first.
+
+### Gas pickers
+
+A picker is a tagged object discriminated by `Name`:
+
+```jsonc
+"gasPicker": { "Name": "fixed",  "Gas": 21000 }
+"gasPicker": { "Name": "random", "Min": 21000, "Max": 100000 }   // inclusive range
+```
+
+`random` requires `Min < Max`. With no picker, the scenario uses its built-in
+defaults. Pickers are consumed by the EVMTransfer family (`GenerateGas`); the
+field keys (`Name`, `Gas`, `Min`, `Max`) are capitalized on the wire.
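+
+As a sketch of how pickers sit inside a scenario entry, the mix below lists
+`EVMTransfer` twice with different gas pickers. The weights and gas values are
+illustrative assumptions; per the Scenarios section, the two instances are
+distinguished by the `_0` and `_1` suffixes.
+
+```jsonc
+"scenarios": [
+  // instance _0: every tx uses the same gas limit
+  { "name": "EVMTransfer", "weight": 3,
+    "gasPicker": { "Name": "fixed", "Gas": 21000 } },
+  // instance _1: gas limit drawn from the inclusive range [21000, 100000]
+  { "name": "EVMTransfer", "weight": 1,
+    "gasPicker": { "Name": "random", "Min": 21000, "Max": 100000 } }
+]
+```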
+
+### Distributions
+
+A distribution is discriminated by `Name`:
+
+```jsonc
+"keyDistribution": { "Name": "uniform" }
+"keyDistribution": { "Name": "zipfian", "theta": 0.9 }   // theta in [0, 1)
+```
+
+`zipfian.theta` must be in `[0, 1)`; `0` is uniform, and larger values concentrate draws on low indices (a hotspot).
+⚠️ These distributions (and the related `recordCount`, `sizeBuckets`, and
+`operations` op-mix axes) **require PLT-465 (#54, unmerged as of writing) — on
+main they parse but do not affect generated transactions.** See the
+[implementation gap](#schema-vs-implementation-gaps) before relying on these for
+workload skew.
+
+## Accounts
+
+```jsonc
+"accounts": {
+  "count": 500,           // pool size
+  "newAccountRate": 0.0   // fraction of txs that mint a fresh recipient account
+}
+```
+
+| Field | JSON key | Type | Default | Effect |
+|-------|----------|------|---------|--------|
+| Accounts | `count` | int | `0` | Number of pre-generated accounts in the pool. |
+| NewAccountRate | `newAccountRate` | float64 | `0.0` | Fraction of transactions that target a newly-minted account instead of a pool member. `0` = fixed pool. |
+
+A top-level `accounts` block is the **shared pool** for all scenarios; a
+per-scenario `accounts` block creates a separate pool for that scenario. If
+neither exists, scenario creation errors (`no accounts config defined`).
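+
+A minimal sketch of the two pool levels, assuming illustrative counts: the
+top-level block serves `EVMTransfer`, while `ERC20` gets its own pool.
+
+```jsonc
+{
+  "accounts": { "count": 500, "newAccountRate": 0.0 },    // shared pool
+  "scenarios": [
+    { "name": "EVMTransfer", "weight": 7 },                // uses the shared pool
+    { "name": "ERC20", "weight": 3,
+      "accounts": { "count": 50, "newAccountRate": 0.0 } } // separate pool for this scenario
+  ]
+}
+```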
+
+**Funding interaction:** funding requires `newAccountRate == 0` everywhere (both
+top-level and per-scenario). On-demand accounts are never funded, so their first
+tx would fail for lack of funds to pay gas; `ValidateFunding` rejects the combo at load.
+
+## Funding
+
+Optional. When set (and not `--dry-run`), the account pool is funded from a root
+key at startup so the run works against a real chain.
+
+| Field | JSON key | Type | Default | Meaning |
+|-------|----------|------|---------|---------|
+| RootKeyFile | `rootKeyFile` | string | `""` | Path to a file holding the root account's hex private key. **Preferred** — not exposed in the process environment. |
+| RootKeyEnv | `rootKeyEnv` | string | `""` | Env var name holding the hex key. Fallback when `rootKeyFile` is unset. |
+| FundAmountWei | `fundAmountWei` | string | `"1000000000000000000"` (1 SEI) | Per-account funding in wei. **Decimal STRING** (JSON numbers lose precision above 2^53). |
+| BatchSize | `batchSize` | int | `200` | Recipients per `disperseEther` call. |
+
+`ValidateFunding` requires exactly one key source (`rootKeyFile` or `rootKeyEnv`)
+and `newAccountRate == 0` across all account configs.
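+
+A sketch of a funding block that passes both checks, using the env-var key
+source instead of the file. `SEILOAD_ROOT_KEY` is a hypothetical variable name
+chosen for illustration; use whatever name your deployment exports.
+
+```jsonc
+{
+  "accounts": { "count": 500, "newAccountRate": 0.0 },  // must be 0 when funding is set
+  "funding": {
+    "rootKeyEnv": "SEILOAD_ROOT_KEY",                   // hypothetical env var name; rootKeyFile omitted
+    "fundAmountWei": "1000000000000000000",             // 1 SEI, as a decimal string
+    "batchSize": 200
+  }
+}
+```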
+
+## Gotchas
+
+- **`seiChainID` casing is cosmetic.** The struct tag is `seiChainID` (capital `ID`),
+  and several shipped profiles write `seiChainId` (lowercase `d`). Go's `encoding/json`
+  matches tags **case-insensitively**, so `seiChainId` binds to the same field — the
+  value is populated and the `chain_id` metric label and `chain_id`-keyed PromQL work
+  regardless of casing. Prefer `seiChainID` for style consistency only; it has **no
+  effect on binding or queries**.
+- **Durations are strings.** `"10s"`, not `10`. A bare number fails to parse.
+- **`fundAmountWei` is a string.** Quoting matters; an unquoted big number loses
+  precision or fails.
+- **Trackers read `endpoints[0]` only.** Order endpoints so the first is stable.
+
+## Schema vs. implementation gaps
+
+Verified against main at doc time:
+
+- ⚠️ **`keyDistribution` / `sizeDistribution` / `sizeBuckets` / `recordCount` /
+  `operations` require PLT-465 (#54, unmerged as of writing).** On main these
+  fields parse, validate, and bind to deterministic RNG sub-streams in the
+  generator, but **no scenario on main calls `SampleIndex` on them**: the only
+  `SampleIndex` call site is inside `config/distribution.go` itself. Setting them
+  today therefore has no effect on emitted transactions. PLT-465 (#54) is the
+  pending PR that wires scenario sampling; once it lands, revisit this note and
+  the StorageRW axes in [04-workload-model.md](04-workload-model.md).
+
+## See also
+
+- [01-mental-model.md](01-mental-model.md) — the pieces and how they connect.
+- [02-running.md](02-running.md) — invoking the binary; CLI-flag equivalents.
+- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts in depth.
+- [06-measurement-metrics.md](06-measurement-metrics.md) — interpreting metrics and the run summary.
+- [07-experiment-playbook.md](07-experiment-playbook.md) — reproducible experiment recipes.
diff --git a/docs/04-workload-model.md b/docs/04-workload-model.md
new file mode 100644
index 0000000..2933027
--- /dev/null
+++ b/docs/04-workload-model.md
@@ -0,0 +1,121 @@
+# Workload Model
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers: the scenario set sei-load can generate, what each one stresses, and the StorageRW contention/size/op-mix knobs that let an agent dial conflict and tx size. When an agent needs it: designing an experiment — choosing a scenario and configuring the axes that produce the load shape under test.
+
+> ⚠️ **Requires PLT-465 (#54, unmerged as of writing).** The StorageRW per-tx axes described below — `keyDistribution`, `sizeDistribution`, `sizeBuckets`, `recordCount`, and `operations` op-mix sampling — **parse but do not affect generated transactions on main**. They only shape txs once PLT-465 lands. On main today StorageRW emits a fixed scaffold (single slot 0, empty pad, all-`rmw`). Treat the axis sections as the PLT-465 interface, not current behavior.
+
+Each scenario is a `TxGenerator` resolved by lowercase `name` through the factory (`generator/scenarios/factory.go`). The `name` you put in a scenario config is one of the registered keys below. Unknown names panic at startup.
+
+## Scenario set
+
+Registered names (from `scenarioFactories`, `generator/scenarios/factory.go:13`):
+
+| `name` | Contract | Per-tx action | What it stresses |
+|--------|----------|---------------|------------------|
+| `evmtransfer` | none | Native value transfer, `value = now().Unix()`, gas 21000 | Baseline native-path throughput; signature recovery + balance update. Value is the current Unix timestamp (non-zero), so it changes with wall-clock time rather than strictly per tx. |
+| `evmtransferfast` | none | Native transfer, fixed `value = 1e12`, **zero tip** | Same as above with constant value and no priority fee — cheapest native baseline. (Registered name is also `evmtransfer` via `Name()`; distinct factory key `evmtransferfast`.) |
+| `evmtransfernoop` | none | Self-transfer, `value = 0`, gas 21000 | Native path with no balance delta — isolates execution overhead from state change. |
+| `erc20` | ERC20 | `transfer(to, 1)`, gas 72156 | Real ERC20 SSTORE path: two balance-slot writes per transfer. Distinct sender/receiver slots → low cross-tx conflict. |
+| `erc20noop` | ERC20Noop | `transfer(to, 1)`, gas 22460 | ERC20 ABI surface with a no-op body — measures dispatch/calldata cost without the storage writes. |
+| `erc20conflict` | ERC20Conflict | `transfer(to, 1)`, gas 22460 | ERC20 variant engineered so transfers contend on shared state → drives the parallel executor's conflict path via an ERC20 shape. |
+| `erc721` | ERC721 | `mint(to, id)`, monotonic `id` (atomic counter), gas 22460 | NFT mint path: fresh-slot SSTORE per tx plus a contended counter. |
+| `disperse` | Disperse | `disperseEtherFixed(targets)` to 100 fresh accounts/tx | Fan-out: one tx touching 100 distinct recipient accounts. Account-creation heavy. |
+| `storagerw` | StorageRWv1 | `read`/`write`/`rmw` against a caller-chosen slot, with calldata pad | **The tunable axis scenario.** SLOAD+SSTORE storage path with configurable key contention, tx size, and op mix. See below. |
+
+Gas limits above are the per-tx `GasLimit` each scenario pins (`CreateContractTransaction` / `CreateTransaction`). On a gas-limit-admission chain the limit — not gas used — reserves block space, so these are sized tight. A `gasPicker` in config overrides the native-transfer gas; contract scenarios pin their own limit.
+
+All contracts compile to the **`paris`** EVM target (solc 0.8.19, `Makefile:39`). `paris` is a subset of Sei's active fork, so bytecode is unconditionally safe to deploy; runtime gas is set by the chain's live fork regardless of compile target (`Makefile:31-38`).
+
+## StorageRW: the three axes
+
+StorageRW (`generator/scenarios/StorageRW.go`) is the scenario built for parametric conflict/size experiments. Per tx it makes three **independent** draws — slot (key contention), pad length (tx size), and operation (op mix) — each on its own seeded RNG sub-stream, then builds a `read`/`write`/`rmw` call against `StorageRWv1`.
+
+> ⚠️ **Requires PLT-465 (#54, unmerged as of writing) — on main these fields parse but do not affect generated transactions.** The per-tx slot, op, and pad axes are delivered by PLT-465 (not yet on main). On main, StorageRW is a scaffold: every tx is a fixed-slot-0, empty-pad `rmw` (`generator/scenarios/doc.go` "StorageRW scaffold"). What follows is the PLT-465 interface.
+
+The contract `StorageRWv1` (`generator/contracts/StorageRWv1.sol`) is mapping-backed (`mapping(uint256 => uint256) store`) with no fixed keyspace — the slot index is caller-chosen, so the keyspace resizes with config and never needs a redeploy. `read` folds the load into `readAccumulator` so the SLOAD is non-elidable; `rmw` does `store[slot] += 1`; `write` sets `store[slot] = 1`. All use `unchecked` arithmetic so no tx ever reverts on overflow.
+
+**Defaults (nil-guarded, the 100%-conflict baseline):** with no `keyDistribution`/`recordCount`, every tx hits fixed slot 0 (`pickSlot` — PLT-465 branch, not on main). With no `sizeDistribution`/`sizeBuckets`, the pad is empty (`pickPad` — PLT-465 branch, not on main). With no `operations`, every tx is `rmw` (`pickOp` — PLT-465 branch, not on main). So bare `{"name":"storagerw"}` = single-slot, empty-pad, all-rmw = maximum contention. (On main this is the *only* behavior — see the banner; the scaffold is unconditionally fixed-slot-0, empty-pad, `rmw`.)
+
+### Axis 1 — KEY CONTENTION
+
+The slot each tx touches is `keyDistribution.SampleIndex(recordCount)` — a draw in `[0, recordCount)` (PLT-465 branch, not on main). Contention is the probability that two txs in the same block draw the same slot.
+
+- **Keyspace size** = `recordCount`. Larger → lower collision probability at fixed distribution.
+- **Distribution** = `keyDistribution`: `uniform` (flat) or `zipfian` with `theta` in `[0, 1)`.
+  - `theta → 0`: approaches uniform. Over a large keyspace, collision ≈ 0% (`config/doc.go:28-36`).
+  - `theta → 1`: draws concentrate on low indices → a hotspot. `theta` is validated to `[0, 1)`; `alpha = 1/(1-theta)` diverges at 1 (`distribution.go:163`).
+  - `recordCount = 0` (or no `keyDistribution`): single slot 0 = **100% conflict**.
+
+To target a given contention level, configure:
+
+```jsonc
+// ~0% conflict: uniform over a large keyspace
+{ "name": "storagerw",
+  "keyDistribution": {"Name": "uniform"},
+  "recordCount": 1000000 }
+
+// moderate hotspot: zipfian, low indices favored
+{ "name": "storagerw",
+  "keyDistribution": {"Name": "zipfian", "theta": 0.9},
+  "recordCount": 1000000 }
+
+// 100% conflict: single slot (omit key config)
+{ "name": "storagerw" }
+```
+
+Verified on the PLT-465 branch: `TestStorageRWContentionSweep` (not on main) pins both ends — uniform over 1e6 with 2000 draws is >99% distinct slots; default config is always slot 0.
+
+Note `recordCount` is the keyspace the distribution **indexes into**, not a count of distinct slots that will be touched in a run. Actual collision in a single block is a function of `recordCount`, distribution shape, and how many StorageRW txs land in that block (roughly your StorageRW send rate × block time).
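+
+To make the dependence on `recordCount` and per-block tx count concrete, here is
+a back-of-the-envelope estimate for the `uniform` case only. It assumes draws are
+independent and uniform over $N$ = `recordCount` slots; $n$ (draws in a run) and
+$k$ (StorageRW txs in one block) are symbols introduced here for illustration,
+and `zipfian` draws will collide more often than this.
+
+$$
+\mathbb{E}[\text{distinct slots}] = N\left(1 - \left(1 - \tfrac{1}{N}\right)^{n}\right),
+\qquad
+P(\text{some collision among } k \text{ txs}) \approx 1 - e^{-k(k-1)/(2N)}
+$$
+
+For $N = 10^6$ and $n = 2000$ the first expression gives about 1998 distinct
+slots (roughly 99.9%), consistent with the >99% figure above. For $k = 100$ txs
+in a block the second gives about 0.5%.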
+
+### Axis 2 — TX SIZE
+
+Each tx carries a zero-filled calldata pad whose length is `sizeBuckets[sizeDistribution.SampleIndex(len(sizeBuckets))]` (`pickPad` — PLT-465 branch, not on main). The pad is an ignored `bytes _pad` argument on every method — it varies tx size without touching the storage logic.
+
+- `sizeBuckets`: the list of candidate pad lengths in bytes, e.g. `[0, 64, 256, 1024]`. Each entry is capped at 1 MiB (`config.go`).
+- `sizeDistribution`: `uniform` or `zipfian`; selects which bucket index each tx uses.
+- **Gas:** the pad costs `4 gas per zero byte` on top of the 50k base: `GasLimit = 50000 + len(pad)*4` (PLT-465 branch, not on main). The 4-gas rate is the base calldata schedule for zero bytes; it predates and is unchanged by EIP-2028, which only lowered the *non-zero* byte cost from 68 to 16. A larger pad means a larger tx and more calldata gas, scaling block-space consumption per tx.
+
+```jsonc
+{ "name": "storagerw",
+  "keyDistribution": {"Name": "uniform"}, "recordCount": 1000000,
+  "sizeDistribution": {"Name": "uniform"},
+  "sizeBuckets": [0, 64, 256, 1024] }
+```
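+
+Applying the gas formula from the list above to each bucket in that example
+gives the per-tx limits below. This is plain arithmetic on
+`50000 + len(pad)*4` and, like the formula, describes the PLT-465 branch rather
+than main.
+
+```jsonc
+// sizeBuckets entry (bytes) -> GasLimit
+//    0 -> 50000 +    0*4 = 50000
+//   64 -> 50000 +   64*4 = 50256
+//  256 -> 50000 +  256*4 = 51024
+// 1024 -> 50000 + 1024*4 = 54096
+"sizeBuckets": [0, 64, 256, 1024]
+```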
+
+**Independence (load-bearing):** the size draw rides sub-stream `dist:%d:size`, distinct from the key sub-stream `dist:%d:key` (`utils/rng/streams.go` — both stream IDs are frozen and present on main). Changing the size config never perturbs the key sequence — verified on the PLT-465 branch by `TestStorageRWKeySizeIndependence` (not on main): same seed + same key config yields an identical slot sequence with and without a size distribution. This lets an agent sweep one axis while holding the other's draw multiset fixed.
+
+### Axis 3 — OP MIX
+
+`operations` weights the read/write/rmw selection (`config/operation.go` — PLT-465 branch, not on main). Weights are relative; a per-tx draw picks in proportion to weight over total. Nil or all-zero → all `rmw` (the default, since `OpRmw` is the zero value).
+
+```jsonc
+{ "name": "storagerw",
+  "operations": {"read": 1, "write": 1, "rmw": 2} }
+```
+
+What each op does to conflict: `read` is an SLOAD (folded into `readAccumulator`); `write` is an SSTORE; `rmw` is an SLOAD followed by an SSTORE (`store[slot] += 1`). Two reads of the same slot do **not** conflict under OCC (no write); a read+write or write+write on the same slot **does**. So op mix and key contention compose: a high-`theta` keyspace with all-`read` exhibits far less executor conflict than the same keyspace with all-`rmw`. The op draw rides its own sub-stream `dist:%d:op`, which is **a PLT-465-future stream ID, NOT one of the streams frozen on main** (main's frozen set is the 8 IDs in [05-reproducibility §Stream IDs that exist](05-reproducibility.md#stream-ids-that-exist); `dist:%d:op` is added only by PLT-465). Verified independent of the key sequence on the PLT-465 branch by `TestStorageRWOpIndependence` (not on main).
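+
+The composition described above can be sketched as two scenario entries that
+share the same hot keyspace and differ only in op mix. The `theta` and
+`recordCount` values are illustrative, and both entries rely on the PLT-465
+interface rather than behavior on main.
+
+```jsonc
+// hot keyspace, reads only: per the reasoning above, same-slot reads are not expected to conflict
+{ "name": "storagerw",
+  "keyDistribution": {"Name": "zipfian", "theta": 0.9}, "recordCount": 1000,
+  "operations": {"read": 1, "write": 0, "rmw": 0} }
+
+// same keyspace, all rmw: same-slot txs are expected to conflict and re-execute
+{ "name": "storagerw",
+  "keyDistribution": {"Name": "zipfian", "theta": 0.9}, "recordCount": 1000,
+  "operations": {"read": 0, "write": 0, "rmw": 1} }
+```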
+
+## What these axes actually probe on Sei
+
+> This section is domain reasoning about Sei's execution model layered on top of what the code generates. Where a claim is about sei-load code it is cited; where it is about Sei node behavior it is flagged as REASONED — confidence noted. Validate node-side claims against the SUT's own metrics.
+
+**Sei is a parallel-EVM chain with optimistic concurrency control (Block-STM-style).** Transactions in a block are executed speculatively in parallel; a read-set/write-set validation pass detects when one tx read a slot another tx wrote, and re-executes the loser serially. (REASONED — this is the documented Sei/Block-STM design; confidence: high on the mechanism class, medium on exact scheduler details which vary by sei-chain version.)
+
+**Key contention exercises the conflict/abort-and-re-execute path.** When many txs in one block draw the same `store[slot]` and at least one writes it, the optimistic schedule's validation fails for the conflicting txs and they re-execute. As contention rises (smaller `recordCount`, higher `theta`, or single-slot default), the hot slot's throughput degrades toward **serial** as the conflicting write-set fraction → 1 (for that hot slot) — the parallel executor cannot retire conflicting writers concurrently. Throughput for the hot slot is bounded by serialized re-execution, not by parallel width. (REASONED; confidence: high — this is the defining behavior of OCC under write contention.)
+
+**Contrast with a DynamoDB-style hot shard — different mechanism, same observable.** A DynamoDB hot partition degrades because the partition has a fixed WCU/RCU budget and excess requests are **throttled** (a storage-capacity/rate limit). Sei has **no per-key throughput cap**. The limit on a hot slot is *execution-conflict serialization*: the slot can be written as fast as the executor can run the conflicting txs back-to-back, but those txs cannot run *in parallel*. Same surface symptom (hot key → throughput plateaus), fundamentally different cause (OCC re-execution vs. provisioned-capacity throttling). An agent must not interpret a StorageRW hot-slot plateau as a storage-rate limit — there is no quota to raise; the cure is reducing conflict (spread the keyspace) or accepting serial throughput for that slot. (REASONED; confidence: high.)
+
+**Node-side signal to watch:** Block-STM conflict / abort / re-execution rate. On Sei this surfaces (when exposed by the SUT) as `sei_occ_*` metrics. (REASONED — the metric family name is the expected Sei convention; confidence: medium. Confirm the exact series exposed by the node version under test before relying on them; the SUT may not export them at all.) The generator-side signal is unambiguous: you control conflict probability via `recordCount` + `theta` + op mix, and those draws are deterministic for a given seed.
+
+**Gas-model interplay.** The calldata pad (Axis 2) adds `4 gas per zero byte` (the base calldata gas schedule; PLT-465 branch, not on main), so larger txs consume proportionally more block gas and admit fewer txs/block on a gas-limit-admission chain. Size and contention are orthogonal stressors: size limits *how many* txs fit a block; contention limits *how many of those can execute in parallel*. Sweeping both maps the throughput surface. (Code-grounded for the gas formula; the admission behavior is REASONED, confidence: high — consistent with the package doc's "gas-limit-admission" rationale, `generator/scenarios/doc.go`.)
+
+**EVM version.** Contracts target `paris` (solc 0.8.19), a strict subset of Sei's active fork — safe on Sei, and compile target does not distort runtime gas (`Makefile:31-39`). VERIFIED.
+
+## See also
+
+- [03-config-reference](03-config-reference.md) — full Scenario/Distribution JSON schema.
+- [06-measurement-metrics](06-measurement-metrics.md) — the counters to read when interpreting a run.
+- [07-experiment-playbook](07-experiment-playbook.md) — putting axes together into a sweep.
+- [08-limits-boundaries](08-limits-boundaries.md) — measurement boundaries that bound how to read results.
diff --git a/docs/05-reproducibility.md b/docs/05-reproducibility.md
new file mode 100644
index 0000000..658b2d1
--- /dev/null
+++ b/docs/05-reproducibility.md
@@ -0,0 +1,158 @@
+# Reproducibility
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers / when an agent needs it. How to get reproducible workloads for
+> fair A/B exploration: the seed → sub-stream derivation, the exact (and honest)
+> determinism guarantee, how to set up an A/B run, and the open-loop property that
+> keeps the admitted workload stable under SUT-driven drops. Read this before you
+> compare two runs and attribute a difference to the change you made.
+
+## The determinism guarantee — read precisely
+
+sei-load gives you **per-stream draw multiset reproducibility**:
+
+> Same seed + same config ⇒ identical per-stream draw multiset.
+
+That is, the *distribution* of keys, sizes, gas values, and accounts is
+statistically reproducible — which is exactly what fair A/B comparison requires.
+
+**What is NOT guaranteed:**
+
+- **Ordered, byte-identical replay above 1 worker.** With more than one worker,
+  workers interleave their draws into the shared streams non-deterministically, so
+  the ordered per-tx sequence differs run to run at the same seed (the multiset
+  still matches).
+- **On-chain arrival order** is concurrent regardless of worker count, so it is
+  never reproducible.
+
+**Ordered replay holds only at a single worker** (`workers: 1`, i.e.
+`TasksPerEndpoint: 1`; since total senders = workers × endpoints, that also means
+a single endpoint). If you need byte-for-byte deterministic emission ordering,
+run with one worker. Otherwise, design your analysis around the multiset, not the
+sequence. (Contract source: `utils/rng/rng.go` package doc.)
+
+## Seed → sub-stream derivation
+
+A run is rooted at one `seed`. Each logical consumer draws from its own
+independent sub-stream, derived by the **FROZEN** formula:
+
+```
+substream(seed, streamID) = NewPCG(seed, splitmix64(fnv1a64(streamID)))
+```
+
+- `fnv1a64(streamID)` hashes the consumer name to a uint64.
+- `splitmix64` diffuses it so near-identical names (e.g. `gas:0:base` /
+  `gas:1:base`) seed well-separated PCG states.
+- The result seeds a `math/rand/v2.PCG`.
+
+**Worker-count independence.** Sub-streams are keyed by a *logical* stream id (a
+string naming the consumer/purpose), never by a live-goroutine counter. So the
+per-stream draw multiset a seed yields is invariant to `--workers`: adding workers
+does not shift any stream's sequence.
+
+### The FROZEN one-way-door contract
+
+Changing the derivation breaks replay of every previously saved run. **Four
+inputs are frozen** (`utils/rng/rng.go`), each a one-way door requiring a
+`config_sha256` version bump:
+
+1. The derivation formula (hash, diffusion, PCG argument order).
+2. The set of stream-id strings (`utils/rng/streams.go`). The streamID feeds
+   `fnv1a64`, so renaming any id reseeds that stream. Additions are append-only
+   and do not perturb existing streams.
+3. The per-stream draw order (e.g. drawing base before tip before feecap).
+4. The per-tx account draw cadence: `sender` then `receiver` `NextAccount()` per
+   tx (`generator/scenario.go`), each consuming the account stream.
+
+Replay archives are keyed by `config_sha256`. If you (or a tool) change any frozen
+input, do not expect old saved runs to replay — they will silently produce a
+different draw sequence for the same `(seed, config)`.
+
+### Stream IDs that exist
+
+Defined in `utils/rng/streams.go`. `%d` is the scenario's config index `i`:
+
+| Stream ID | Consumer |
+|---|---|
+| `accounts:shared` | shared (top-level) account pool (`StreamAccountsShared`) |
+| `accounts:scenario:%d` | scenario `i`'s own account pool (`AccountsScenarioStream`) |
+| `weighted:shuffle` | the weighted scenario selector's shuffle (`StreamWeightedShuffle`) |
+| `gas:%d:base` | scenario `i`'s base-gas picker (`GasBaseStream`) |
+| `gas:%d:tip` | scenario `i`'s tip-cap picker (`GasTipStream`) |
+| `gas:%d:feecap` | scenario `i`'s fee-cap picker (`GasFeeCapStream`) |
+| `dist:%d:key` | scenario `i`'s key-distribution index sampler (`KeyDistributionStream`) |
+| `dist:%d:size` | scenario `i`'s size-distribution index sampler (`SizeDistributionStream`) |
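+
+To illustrate how `%d` resolves, the fragment below annotates a two-scenario mix
+with the stream IDs each entry would draw from, assuming the index is the
+0-based position in `scenarios` (consistent with the `gas:0:base` /
+`gas:1:base` example above).
+
+```jsonc
+"scenarios": [
+  // index 0 -> gas:0:base, gas:0:tip, gas:0:feecap, dist:0:key, dist:0:size
+  { "name": "EVMTransfer", "weight": 7 },
+  // index 1 -> gas:1:base, gas:1:tip, gas:1:feecap, dist:1:key, dist:1:size
+  { "name": "ERC20", "weight": 3 }
+]
+```
+
+Under that assumption, reordering entries changes which stream each scenario
+draws from, so keep scenario order fixed across runs you intend to compare.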
+
+## Setting the seed
+
+The seed lives in the **config file**, not on the CLI. Set the top-level `seed`
+field (`config.LoadConfig.Seed`, a `*uint64`):
+
+```json
+{
+  "chainId": 1329,
+  "endpoints": ["http://localhost:8545"],
+  "seed": 42,
+  "scenarios": [ /* ... */ ],
+  "settings": { /* ... */ }
+}
+```
+
+**Unset seed is randomized and recorded.** With no `seed`, the generator resolves
+a cryptographically-random one, writes it back into the config, and logs it:
+
+```
+🎲 No seed configured; generated random seed 12345678901234567890 (set "seed" to replay)
+```
+
+To replay that run after the fact, copy the logged seed into the `seed` field and
+re-run with the same config. (Source: `generator.resolveSeed`,
+`rng.NewRandomSource`.) Note: the resolved seed is surfaced via the log line and
+written back into the in-memory config — it is **not** a field on the emitted
+`stats.RunSummary`, so capture it from the log if you need it.
+
+## Running a reproducible A/B
+
+1. **Pin the seed.** Set `seed` to a fixed value in both arms.
+2. **Hold config constant** across the two arms — same scenarios, weights,
+   distributions, account config, endpoints set.
+3. **Vary exactly one axis** (the thing under test): e.g. `tps`, `maxInFlight`,
+   `arrivalModel`, or a SUT-side change.
+4. Compare the externally-computed metrics (this tool emits signal, not verdicts —
+   see [01-mental-model.md](01-mental-model.md#measurement-philosophy)).
+
+Because the workload is a fixed multiset at a fixed seed, a difference between
+arms is attributable to the one axis you varied (plus concurrency noise above 1
+worker — keep that in mind for tight comparisons; drop to `workers: 1` if you need
+ordered determinism).
+
+Changing scenarios, weights, distribution parameters (e.g. `theta`), account
+config, or any frozen input changes the workload itself — that is no longer a fair
+A/B of one axis.
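+
+A sketch of two arms that follow the steps above, varying only `tps`. The
+values are illustrative; other required fields (`chainId`, `endpoints`,
+`accounts`) are omitted for brevity and must also be identical across arms.
+
+```jsonc
+// Arm A
+{ "seed": 42,
+  "scenarios": [ { "name": "EVMTransfer", "weight": 7 }, { "name": "ERC20", "weight": 3 } ],
+  "settings": { "arrivalModel": "open_loop", "tps": 100, "maxInFlight": 5000 } }
+
+// Arm B: same seed, same scenarios, only tps differs
+{ "seed": 42,
+  "scenarios": [ { "name": "EVMTransfer", "weight": 7 }, { "name": "ERC20", "weight": 3 } ],
+  "settings": { "arrivalModel": "open_loop", "tps": 200, "maxInFlight": 5000 } }
+```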
+
+## Open-loop determinism under drops
+
+A critical property for stress experiments: in open-loop, **admitted txs are a
+deterministic prefix of the seeded sequence**, because a dropped tick draws no tx
+(the permit is acquired *before* `Generate()` — see
+[01-mental-model.md](01-mental-model.md#open-loop-the-fix)).
+
+Consequence: **the same seed yields the same admitted multiset regardless of how
+many ticks SUT slowness forced to drop.** A faster SUT (fewer drops) and a slower
+SUT (more drops) admit different *counts*, but the slower run's admitted set is a
+prefix of the faster run's — the per-stream reproducibility contract holds under
+saturation, where a draw-on-drop scheme would have broken it. `SequenceIndex` is
+the arrival-tick index `i`: monotonic but non-contiguous across admitted txs under
+drops (dropped ticks advance `i` and the clock while consuming no draw).
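+
+A small hand-worked trace of that property, with made-up tick outcomes and the
+seeded sequence treated as a single stream for simplicity: six arrival ticks, of
+which a slower SUT forces ticks 2 and 4 to drop.
+
+```
+tick i (SequenceIndex):   0     1     2     3     4     5
+slow SUT outcome:         send  send  drop  send  drop  send
+draw consumed (slow):     d0    d1    -     d2    -     d3
+fast SUT outcome:         send  send  send  send  send  send
+draw consumed (fast):     d0    d1    d2    d3    d4    d5
+```
+
+The slow run admits draws d0..d3 at `SequenceIndex` 0, 1, 3, 5; the fast run
+admits d0..d5. The slow run's admitted draws are a prefix of the fast run's.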
+
+In closed-loop there is no such admission gate; the SUT speed governs how many
+txs are generated, so the comparison anchor is weaker.
+
+## See also
+
+- [01-mental-model.md](01-mental-model.md) — pipeline, arrival models, glossary.
+- [02-running.md](02-running.md) — invoking a run.
+- [03-config-reference.md](03-config-reference.md) — every config/CLI setting.
+- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts.
+- [06-measurement-metrics.md](06-measurement-metrics.md) — emitted metrics and the run summary.
+- [07-experiment-playbook.md](07-experiment-playbook.md) — recipes for common experiments.
diff --git a/docs/06-measurement-metrics.md b/docs/06-measurement-metrics.md
new file mode 100644
index 0000000..23480bf
--- /dev/null
+++ b/docs/06-measurement-metrics.md
@@ -0,0 +1,286 @@
+# 06 — Measurement & Metrics
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers / when an agent needs it: the exact signals sei-load emits, what
+> they mean, and the conservation model that ties them together. Read this before
+> writing any query against a run, before computing a rate/percentile/verdict, and
+> before trusting a number. **The tool emits raw signals only; every rate,
+> percentile, and pass/fail verdict is computed by you, the agent, via queries.**
+
+---
+
+## 1. The conservation model
+
+sei-load supports an **open-loop** arrival model, but it is **opt-in**:
+**closed-loop is the default** (see [01-mental-model](01-mental-model.md)), and open-loop
+is selected with `--arrival-model open_loop` (see [04-workload-model](04-workload-model.md)
+for arrival mechanics). The inclusion identities and inclusion-latency series below are
+valid **only for open-loop runs**; in closed-loop `IntendedSendTime` is enqueue time, so
+the latency sample is omitted (counts still tracked). Every transaction the scheduler
+creates flows through two accounting stages whose terms must balance. These identities are
+the foundation of run validity: if the terms don't add up, the run is suspect.
+
+### Stage 1 — Send accounting (always tracked)
+
+```
+scheduled == dropped + admitted
+admitted  == succeeded + failed
+```
+
+| Term | Meaning |
+|------|---------|
+| `scheduled` | Every arrival tick the open-loop scheduler reaches at instant `t₀ + i/λ`. Not directly emitted; it is the sum of the right-hand side. |
+| `dropped` | Ticks shed because true in-flight was saturated (`maxInFlight` reached) at the scheduled instant. **Genuine load shed**, not buffer geometry. A dropped tick draws no generator and signs no tx. → `seiload_run_txs_dropped_total`. |
+| `admitted` | Ticks that acquired an in-flight permit and were generated + signed + enqueued. |
+| `succeeded` | Admitted txs whose synchronous RPC send returned nil error (accepted by the endpoint). → `seiload_txs_accepted_total`. |
+| `failed` | Admitted txs whose send returned an error. Counted, never lost. → `seiload_run_txs_failed_total` (and per-error `seiload_txs_rejected_total`). |
+
+**Shutdown boundary:** `admitted == succeeded + failed` holds exactly only on a clean
+drain (generator exhaustion). On `ctx` cancel (SIGTERM / `--duration` expiry), some
+admitted txs may still be buffered for a worker and exit uncounted — bounded by channel
+backlog. For latency/goodput claims prefer runs that drain cleanly or are long enough
+that the boundary undercount is negligible.
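+
+Because `scheduled` and `admitted` are not emitted as series, the offered load has to be
+rebuilt from the terms that are. A minimal sketch, assuming a single run is in scope and
+that the three run-summary gauges were scraped after shutdown:
+
+```promql
+# scheduled = dropped + admitted = dropped + (succeeded + failed)
+sum(seiload_run_txs_dropped_total)
+  + sum(seiload_run_txs_accepted_total)
+  + sum(seiload_run_txs_failed_total)
+```
+
+On a cancel-terminated run this sum sits slightly below the true `scheduled`, by the
+buffered-but-uncounted txs described above.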
+
+### Stage 2 — Inclusion accounting (only with `--track-receipts`)
+
+```
+registered == included + expired + inflight_at_shutdown
+registered ⊆ succeeded        (only successful sends are registered)
+```
+
+> Note: `dropped_at_cap` is **not** a term in this identity — it is excluded.
+> `registered = succeeded − dropped_at_cap` (sends rejected at the registry cap were
+> never registered, so they appear in neither side of the conservation balance).
+
+| Term | Meaning |
+|------|---------|
+| `registered` | Successful sends handed to the inclusion tracker. **Not its own series** — by design the denominator for inclusion rate is `succeeded` (`seiload_txs_accepted_total`), never a minted `registered` series. |
+| `included` | Txs observed on-chain (matched in an arriving block, `InclusionTime` stamped). → the `_count` of `seiload_inclusion_latency_seconds` **in open-loop only**; in closed-loop read it from the run-summary log line (`seiload_inclusion_outcome_total` does *not* carry it). See §3.1. |
+| `expired` | Registered txs reaped un-included after `reapAfter` (default 30s, `--inclusion-reap-after`). → `seiload_inclusion_outcome_total{outcome="expired"}`. |
+| `dropped_at_cap` | Successful sends rejected at the inclusion-registry cap (registry full). **Excluded from the inclusion denominator** — they were never registered. → `seiload_inclusion_outcome_total{outcome="dropped_at_cap"}`. |
+| `inflight_at_shutdown` | Registry size at run end, read after workers + tracker join. → `seiload_run_inflight_at_shutdown`. |
+
+**Conservative degradation (undercounts only, never miscounts):** WS head gaps
+(`seiload_block_gaps_total`), block-body fetch failures (`seiload_block_fetch_errors_total`),
+and late registrations all cause affected txs to reap as `expired` rather than be
+miscounted as included. A nonzero `seiload_block_gaps_total` or
+`seiload_block_fetch_errors_total` means your `included` is an **under**count — factor that
+into inclusion-rate claims.
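+
+One way to turn that into a single check is to sum both observer-loss counters. This is a
+sketch, not a query shipped with the tool; the `or vector(0)` fallback assumes a counter
+that never incremented may be absent from the scrape rather than exported as 0:
+
+```promql
+# > 0 means `included` is an undercount for this run
+(sum(seiload_block_gaps_total) or vector(0))
+  + (sum(seiload_block_fetch_errors_total) or vector(0))
+```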
+
+---
+
+## 2. The emitted-metric catalog
+
+All instruments are OTel, exported on the Prometheus `/metrics` endpoint
+(`--metricsListenAddr`, default `0.0.0.0:9090`; OpenMetrics enabled so exemplars
+survive).
+
+> **Wire names differ from the instrument base names.** The Prometheus exporter is
+> configured with `WithNamespace("seiload")` (configurable; `observability/setup.go`),
+> so every exported series is prefixed **`seiload_`**. OTel also appends **unit
+> suffixes** on scrape: an `s`-unit histogram becomes `…_seconds`, etc. Combined with
+> Prometheus's own suffixing (histograms expose `_bucket`/`_sum`/`_count`, counters end
+> in `_total`), the wire name can differ substantially from the base name an instrument is
+> declared with. The catalog and every PromQL below use the **real wire names**. (The
+> `{gas}`, `{height}`, `{transactions}`, `{count}` "annotation" units are dropped, not
+> suffixed; only real units like `s` / `/s` produce a suffix.)
+
+### 2.1 Block & gas signals (require `--track-blocks`)
+
+Emitted by the block collector from new-head subscriptions (`stats/block_collector.go`).
+`seiload_block_time_seconds` is **header-arrival-to-arrival wall clock**, not `header.Time`.
+
+| Metric | Type | Unit | Attributes | Meaning |
+|--------|------|------|-----------|---------|
+| `seiload_gas_used` | histogram | `{gas}` (dropped) | `chain_id` | Gas used per block (`_bucket`/`_sum`/`_count`). Buckets: 1, 1k, 10k, 50k, 100k, 200k, 300k, 400k, 500k, 600k, 700k, 800k, 1M. |
+| `seiload_block_time_seconds` | histogram | `s` | `chain_id` | Wall-clock interval between observed block headers (`_bucket`/`_sum`/`_count`). Buckets: 0.1…1.0 (0.1 step), 2, 5, 10, 20. |
+| `seiload_block_number` | gauge | `{height}` (dropped) | `chain_id` | Highest block height observed (monotonic). |
+
+### 2.2 Send-path signals (always on)
+
+Emitted from the worker send loop (`sender/worker.go`, `sender/metrics.go`).
+
+| Metric | Type | Unit | Attributes | Meaning |
+|--------|------|------|-----------|---------|
+| `seiload_send_latency_seconds` | histogram | `s` | `scenario`, `endpoint`, `chain_id`, `status` (`success`/`failure`) | RPC send round-trip latency (`_bucket`/`_sum`/`_count`). **NOT inclusion latency** — this is enqueue→RPC-return, the SUT-admission cost, not time-to-chain. Buckets: 0.1, 0.2, 0.3, 0.5, 1, 2, 3, 5, 10, 20. Carries trace exemplars. |
+| `seiload_txs_accepted_total` | counter | `{transactions}` (dropped) | `endpoint`, `scenario` | Sends accepted by the endpoint (`succeeded`). The inclusion-rate denominator. |
+| `seiload_txs_rejected_total` | counter | `{transactions}` (dropped) | `endpoint`, `scenario`, `reason` (currently only `rpc`) | Sends the target/client rejected (`failed`). |
+| `seiload_worker_queue_length` | observable gauge | `{count}` (dropped) | `endpoint`, `worker_id`, `chain_id` | Current depth of a worker's send channel. Saturation/backpressure signal. |
+| `seiload_tps_achieved_per_second` | observable gauge | `{transactions}/s` | `endpoint`, `chain_id`, `scenario` | Most recent sender-sampled TPS per endpoint/scenario. |
+
+### 2.3 Inclusion signals (require `--track-receipts`, not under `--dry-run`)
+
+Emitted by `stats.InclusionTracker` (`stats/inclusion_tracker.go`).
+
+| Metric | Type | Unit | Attributes | Meaning |
+|--------|------|------|-----------|---------|
+| `seiload_inclusion_latency_seconds` | histogram | `s` | `chain_id` | `InclusionTime − IntendedSendTime` (`_bucket`/`_sum`/`_count`). **Open-loop only** (in closed-loop `IntendedSendTime` is enqueue time, so the sample is omitted; counts still tracked). Its `_count` is the open-loop `included` total. Buckets: 0.5, 1, 2, 5, 10, 30, 60, 120. |
+| `seiload_inclusion_outcome_total` | counter | `{transactions}` (dropped) | `chain_id`, `outcome` (`expired` \| `dropped_at_cap`) | In-flight txs that left the registry un-included. |
+| `seiload_block_gaps_total` | counter | `{blocks}` (dropped) | `chain_id` | Missed head heights (no backfill). Nonzero ⇒ `included` is an undercount. |
+| `seiload_block_fetch_errors_total` | counter | `{blocks}` (dropped) | `chain_id` | Block-body fetches that failed (no retry); those txs reap as `expired`. Nonzero ⇒ `included` undercount. |
+| `seiload_inclusion_inflight` | observable gauge | `{transactions}` (dropped) | `chain_id` | Live size of the in-flight inclusion registry. |
+
+### 2.4 Run-summary gauges (emitted once at run end)
+
+Recorded by `Collector.EmitRunSummary` (`stats/run_summary.go`) at shutdown, then held
+for `--post-summary-flush-delay` (default 25s) so the final scrape catches them. One
+series per run via the OTel Resource (run-scope) join.
+
+| Metric | Type | Unit | Attributes | Meaning |
+|--------|------|------|-----------|---------|
+| `seiload_run_tps_final_per_second` | gauge | `{transactions}/s` | — | Peak observed overall TPS (10s sliding-window max) for the run. |
+| `seiload_run_duration_seconds` | gauge | `s` | — | Wall-clock run duration. |
+| `seiload_run_txs_accepted_total` | gauge | `{transactions}` (dropped) | — | Total txs accepted by endpoints over the run (collector's `totalTxs`). Gauge already named `…_total`; no extra suffix. |
+| `seiload_run_txs_dropped_total` | gauge | `{transactions}` (dropped) | `arrival_model` | Open-loop txs `dropped` on in-flight saturation. |
+| `seiload_run_txs_failed_total` | gauge | `{transactions}` (dropped) | `arrival_model` | Admitted txs whose send `failed`. |
+| `seiload_run_inflight_at_shutdown` | gauge | `{transactions}` (dropped) | — | Inclusion registry size at end (only emitted when `--track-receipts`). |
+
+> Note: `seiload_run_tps_final_per_second` is a **peak** (sliding-window max), not a
+> mean. For a mean, compute `seiload_run_txs_accepted_total / seiload_run_duration_seconds`.
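+
+As a query, that mean might look like the following. It is a sketch that assumes only one
+run's summary gauges are in scope; with several runs scraped, separate them by the
+run-scope join described in §3 first.
+
+```promql
+# mean accepted TPS over the whole run
+sum(seiload_run_txs_accepted_total) / sum(seiload_run_duration_seconds)
+```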
+
+---
+
+## 3. Verdicts are external — compute them yourself
+
+The tool deliberately emits **counts and histograms**, not rates/percentiles/verdicts.
+You derive those. Concrete recipes follow.
+
+Run-scope identity rides on the OTel **Resource**, not on per-sample labels, so it reaches
+PromQL via a join target (e.g. `target_info` / `seiload_target_info`) rather than as a label
+on each series. The run-scope join keys that *can* exist are
+`seiload_run_id`, `seiload_chain_id`, `seiload_commit_id`, `seiload_workload`,
+`service_instance_id`, and `service_version` (`observability/setup.go`). Each is
+**conditional on its `SEILOAD_*` env var being set** (`service_instance_id` falls back to
+hostname; the rest are omitted when empty) — adjust selectors to your environment. See
+[../observability/README.md](../observability/README.md) for the cardinality rationale and
+how the Resource is exported.
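+
+A sketch of such a join, assuming the join target is named `target_info`, that it shares the
+`job` and `instance` labels with the sample series, and that the relevant `SEILOAD_*` env var
+was set so `seiload_run_id` exists (all three depend on your exporter and scrape setup):
+
+```promql
+# per-run accepted-send rate, with seiload_run_id copied onto each series
+sum by (seiload_run_id) (
+  rate(seiload_txs_accepted_total[1m])
+    * on (job, instance) group_left (seiload_run_id) target_info
+)
+```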
+
+### 3.1 Inclusion rate
+
+`included / succeeded`. In **open-loop**, `included` is the `seiload_inclusion_latency_seconds`
+histogram count:
+
+```promql
+# open-loop inclusion rate over the run
+sum(seiload_inclusion_latency_seconds_count) / sum(seiload_txs_accepted_total)
+```
+
+In **closed-loop**, `seiload_inclusion_latency_seconds` is not recorded, so read `included` from the
+run-summary log line (`📦 Inclusion: included=…`) or compute the complement from outcomes:
+`included = registered − expired − inflight_at_shutdown`, where
+`registered = succeeded − dropped_at_cap` (the Stage-2 identity in §1). For rate claims that need a
+histogram count, **use open-loop** (see [05-reproducibility](05-reproducibility.md)).
+
+Subtract the un-included tail explicitly when you need the loss breakdown:
+
+```promql
+sum(seiload_inclusion_outcome_total{outcome="expired"})        # timed out un-included
+sum(seiload_inclusion_outcome_total{outcome="dropped_at_cap"}) # registry full (denominator excludes these)
+```
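+
+For a closed-loop run, the Stage-2 identity from §1 can be rearranged into a query for
+`included`. A sketch, assuming one run in scope, a final scrape taken after shutdown, and
+that an outcome series which never incremented may be absent (hence `or vector(0)`):
+
+```promql
+# included = (succeeded - dropped_at_cap) - expired - inflight_at_shutdown
+sum(seiload_txs_accepted_total)
+  - (sum(seiload_inclusion_outcome_total{outcome="dropped_at_cap"}) or vector(0))
+  - (sum(seiload_inclusion_outcome_total{outcome="expired"}) or vector(0))
+  - sum(seiload_run_inflight_at_shutdown)
+```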
+
+### 3.2 Latency percentiles (tail)
+
+Use `histogram_quantile` over the open-loop inclusion histogram for **time-to-chain**:
+
+```promql
+# p99 inclusion latency, open-loop only
+histogram_quantile(0.99, sum by (le) (rate(seiload_inclusion_latency_seconds_bucket[1m])))
+```
+
+For **admission latency** (send round-trip, any model):
+
+```promql
+histogram_quantile(0.99, sum by (le) (rate(seiload_send_latency_seconds_bucket[1m])))
+```
+
+Do **not** quote `seiload_inclusion_latency_seconds` percentiles from a closed-loop run — the histogram
+is empty there, and even where it exists closed-loop suffers coordinated omission
+(see [04-workload-model](04-workload-model.md)).
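+
+The send histogram also carries `scenario` and `status` attributes (§2.2), so the tail can be
+split per scenario. A sketch, assuming those attributes reach Prometheus as labels of the
+same name:
+
+```promql
+# p99 send latency per scenario, successful sends only
+histogram_quantile(0.99,
+  sum by (le, scenario) (rate(seiload_send_latency_seconds_bucket{status="success"}[1m])))
+```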
+
+### 3.3 Goodput (committed / offered)
+
+Goodput = on-chain commitments per second relative to what was offered:
+
+```promql
+# committed throughput (TPS)
+sum(seiload_inclusion_latency_seconds_count) / scalar(seiload_run_duration_seconds)
+
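+# goodput ratio: committed / offered, with offered = scheduled = accepted + failed + dropped (§1)
+sum(seiload_inclusion_latency_seconds_count) / (sum(seiload_run_txs_accepted_total) + sum(seiload_run_txs_failed_total) + sum(seiload_run_txs_dropped_total))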
+```
+
+Drop and failure fractions of offered load:
+
+```promql
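+sum(seiload_run_txs_dropped_total) / (sum(seiload_run_txs_accepted_total) + sum(seiload_run_txs_failed_total) + sum(seiload_run_txs_dropped_total))
+sum(seiload_run_txs_failed_total)  / (sum(seiload_run_txs_accepted_total) + sum(seiload_run_txs_failed_total) + sum(seiload_run_txs_dropped_total))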
+```
+
+### 3.4 Detecting a generator-bound (invalid) run — `schedule_lag`
+
+A run is only a valid load measurement if the generator **kept up with its own
+schedule**. The canonical gate is `schedule_lag = AttemptedSendTime − IntendedSendTime`
+(sends falling behind the arrival schedule even before any tx is shed).
+
+> **`schedule_lag` is a concept, NOT an emitted metric on main today** (the emitter
+> was punted as PLT-463). You cannot query it. Compute run validity externally from
+> the signals that *are* emitted:
+>
+> - **High `seiload_run_txs_dropped_total` with low SUT utilization** ⇒ suspect the
+>   generator (or `maxInFlight`) shed load before the SUT was saturated. Drops should
+>   track SUT saturation, not generator stalls.
+> - **`seiload_run_tps_final_per_second` ≪ configured `--tps`** ⇒ the generator never
+>   reached target rate; the run under-loaded the SUT and latency/throughput numbers are
+>   not at the intended λ.
+> - **Rising `seiload_worker_queue_length`** ⇒ workers are backing up; admission is the bottleneck.
+>
+> If you need a hard generator-validity gate, file an `/issue` requesting a
+> `schedule_lag` histogram (the query you want: `histogram_quantile(0.99,
+> schedule_lag_bucket)` to assert p99 lag < one inter-arrival gap). Until then, treat the
+> heuristics above as the validity check and state the assumption in your report.
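+
+The second and third heuristics can be approximated as queries. Both are sketches: `1000`
+stands in for whatever `--tps` you configured, and the `5m` window is an arbitrary choice.
+
+```promql
+# peak achieved TPS as a fraction of target; far below 1 suggests an under-loaded run
+sum(seiload_run_tps_final_per_second) / 1000
+
+# per-worker queue-depth slope; sustained positive values mean workers are backing up
+max by (worker_id) (deriv(seiload_worker_queue_length[5m]))
+```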
+
+---
+
+## 4. Reading the run-summary / final stats output
+
+Two surfaces report end-of-run state. **Both** are worth capturing.
+
+### 4.1 Run-summary gauges + log lines (authoritative for conservation)
+
+At shutdown sei-load logs the conservation tallies and records the §2.4 gauges:
+
+```
+⚠️  Open-loop dropped N txs (in-flight saturated; not throttled)
+⚠️  Open-loop N txs failed to send (admitted but errored; not lost)
+📦 Inclusion: included=… expired=… dropped_at_cap=… inflight_at_shutdown=…
+```
+
+This log line is the ground truth for the Stage-2 identity; cross-check it against your
+`inclusion_*` queries. The gauges persist on `/metrics` for `--post-summary-flush-delay`
+so a final scrape captures them — ensure your scrape interval is shorter than that delay.
+
+### 4.2 `--report-path` file / stdout final stats
+
+`Logger.LogFinalStats` (`stats/logger.go`) prints — and, with `--report-path`, writes —
+a **formatted text report** (not JSON, despite the JSON-tagged `FinalStats` struct).
+Schema-versioned run-summary JSON is future work (PLT-467); do not write a parser
+expecting JSON from `--report-path` today.
+It contains: runtime, total txs, avg/max TPS, per-endpoint P50/P99 (in-process
+percentiles over a 10k-sample ring buffer — coarse, not the histogram), per-scenario
+distribution, and block-time/gas P50/P99/max.
+
+> Caveat: the report's per-endpoint P50/P99 are computed in-process over a bounded
+> latency ring (`maxLatencyHistory = 10000`) and are **send latency**, not inclusion
+> latency. For trustworthy tail-latency claims use the `seiload_inclusion_latency_seconds` /
+> `seiload_send_latency_seconds` histograms via `histogram_quantile` (§3.2), not the report file.
+
+---
+
+## See also
+
+- [01-mental-model](01-mental-model.md) — what sei-load is and isn't.
+- [04-workload-model](04-workload-model.md) — open-loop arrival, λ, drops, coordinated omission.
+- [05-reproducibility](05-reproducibility.md) — fixed seed, open vs closed loop, fair A/B.
+- [07-experiment-playbook](07-experiment-playbook.md) — objective → knobs → interpretation.
+- [08-limits-boundaries](08-limits-boundaries.md) — what to rule out before trusting a result.
diff --git a/docs/07-experiment-playbook.md b/docs/07-experiment-playbook.md
new file mode 100644
index 0000000..c6d76ef
--- /dev/null
+++ b/docs/07-experiment-playbook.md
@@ -0,0 +1,177 @@
+# 07 — Experiment Playbook
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers / when an agent needs it: the reasoning layer on top of the metrics.
+> Given an objective, which knobs to turn, which signals to read, and what they mean for
+> your next move. Read this when you are about to *design* a run, not just interpret one.
+> Metric names and PromQL referenced here are defined in
+> [06-measurement-metrics](06-measurement-metrics.md) — do not re-derive them.
+
+---
+
+## 0. The autonomous run loop
+
+Every experiment is one turn of this loop. Run it deliberately; don't fire-and-forget.
+
+```
+1. OBJECTIVE   → what question am I answering? (capacity? tail latency? contention?)
+2. KNOBS       → set exactly the variables under test; FREEZE everything else (seed!).
+3. VALIDITY    → is this run a fair measurement? (§5 — check BEFORE trusting numbers)
+4. READ        → pull the specific signals the objective needs.
+5. INTERPRET   → what do they mean? does conservation balance?
+6. NEXT MOVE   → adjust one knob, or conclude. Record the seed + config for A/B.
+```
+
+**Cardinal rule for comparability:** change **one** independent variable per run, hold a
+**fixed seed** (set the top-level `seed` in the config file — there is **no `--seed` CLI
+flag**; see [05-reproducibility](05-reproducibility.md)), and use **open-loop**
+(`--arrival-model open_loop`; note closed-loop is the *default*) for any latency or
+capacity claim.
+
+> ⚠️ **StorageRW axes require PLT-465 (#54, unmerged as of writing).** Recipes 2 and 3
+> below sweep `keyDistribution`/`zipfian-θ`/`recordCount`/`sizeDistribution`/`sizeBuckets`/
+> `operations`. **On main these fields parse but do not affect generated transactions** —
+> StorageRW emits a fixed scaffold (slot 0, empty pad, all-`rmw`). Treat the contention and
+> size sweeps as runnable only once PLT-465 lands; see
+> [04-workload-model](04-workload-model.md).
+
+---
+
+## 1. Decision framework — objective → knobs
+
+| Objective | Primary knob(s) | Hold fixed | Read |
+|-----------|-----------------|-----------|------|
+| Key/state contention | scenario `StorageRW`, zipfian-θ over `recordCount` **(PLT-465 — no effect on main)** | seed, λ, tx mix, endpoints | `seiload_tps_achieved_per_second`, `seiload_inclusion_latency_seconds` p99, SUT block-stm abort rate (external) |
+| Tx-size scaling | size-distribution / `sizeBuckets` **(PLT-465 — no effect on main)** | seed, λ | `seiload_gas_used` per block, `seiload_send_latency_seconds`, `seiload_inclusion_latency_seconds` |
+| Trustworthy tail latency | fixed λ at/above suspected capacity, open-loop | seed, mix | `seiload_inclusion_latency_seconds` p99 via `histogram_quantile`; validity (§5) |
+| Throughput knee | λ sweep or `--ramp-up` | seed, mix | `seiload_run_txs_dropped_total`, inclusion rate, `seiload_inclusion_latency_seconds`, `seiload_block_time_seconds` |
+
+---
+
+## 2. Recipe: probe key/state contention
+
+**Goal:** find how concurrent reads/writes to a hot key-set degrade throughput — i.e.
+expose Sei's parallel-execution (block-stm) conflict/abort behavior (see
+[04-workload-model](04-workload-model.md) for the Sei mechanism).
+
+**Design:** `StorageRW` scenario; sweep zipfian skew **θ** at a fixed `recordCount`, then
+repeat the sweep at other `recordCount` values (smaller `recordCount` + higher θ = hotter
+contention). Hold seed, λ, endpoints, and tx mix constant across each sweep.
+
+**Read & interpret:**
+- Throughput vs θ: as θ rises, committed throughput (`seiload_inclusion_latency_seconds_count /
+  seiload_run_duration_seconds`) should fall if the SUT serializes on conflicts. A flat curve means you
+  haven't reached the contention regime — raise θ / shrink `recordCount`.
+- Pair with the **SUT's** block-stm conflict/abort rate (a Sei **node-side** signal,
+  not emitted by sei-load — on Sei this typically surfaces as `sei_occ_*`, but confirm
+  the exact series name exists on your SUT before relying on it; the node version under
+  test may not export it at all. File `/issue` if that signal isn't exposed where you
+  can query it).
+- `seiload_block_time_seconds` widening while `seiload_gas_used` holds steady ⇒ execution
+  is the bottleneck, not block fullness — a contention signature.
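+
+To compare the two trends on one time axis, the histogram `_sum`/`_count` pairs give a
+per-window mean. A sketch, with the `5m` window chosen arbitrarily:
+
+```promql
+# mean block interval (seconds)
+sum(rate(seiload_block_time_seconds_sum[5m])) / sum(rate(seiload_block_time_seconds_count[5m]))
+
+# mean gas used per block
+sum(rate(seiload_gas_used_sum[5m])) / sum(rate(seiload_gas_used_count[5m]))
+```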
+
+**Next move:** binary-search θ for the knee where throughput drops sharply; that θ is the
+contention threshold for this `recordCount`.
+
+---
+
+## 3. Recipe: probe tx-size scaling
+
+**Goal:** how does per-tx size/gas affect block packing and latency.
+
+**Design:** sweep the size distribution / `sizeBuckets`. Hold seed and λ fixed.
+
+**Read & interpret:**
+- `seiload_gas_used` histogram (per block) — does the SUT hit the gas ceiling
+  (`--target-gas`, default 10M)? `histogram_quantile(0.99, …seiload_gas_used_bucket…)` near
+  the ceiling ⇒ blocks are gas-bound.
+- `seiload_send_latency_seconds` and `seiload_inclusion_latency_seconds` — larger txs raise
+  both if execution/propagation cost scales with size.
+- `seiload_block_time_seconds` — rising with size ⇒ production cost is size-sensitive.
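+
+Written out, the gas-ceiling check might look like this (a sketch; the `5m` window is
+arbitrary). One caution: [06 §2.1](06-measurement-metrics.md) lists the highest finite
+`seiload_gas_used` bucket as 1M, and `histogram_quantile` returns the highest finite bucket
+bound once the quantile falls into the `+Inf` bucket, so a reading of 1M may mean "1M or more".
+
+```promql
+# p99 gas used per block
+histogram_quantile(0.99, sum by (le) (rate(seiload_gas_used_bucket[5m])))
+```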
+
+**Next move:** if blocks are gas-bound before λ saturates, you are measuring block-packing,
+not throughput — lower per-tx gas or raise `--target-gas` to isolate the variable.
+
+---
+
+## 4. Recipe: measure trustworthy tail latency
+
+**Goal:** a defensible p99 time-to-chain.
+
+**Design (load-bearing):**
+- `--arrival-model open_loop` — **mandatory**. Closed-loop suffers coordinated omission
+  and `seiload_inclusion_latency_seconds` is not even recorded there (see [06](06-measurement-metrics.md) §3.2).
+- Fixed λ (`--tps`) **at or above** suspected capacity — you want the schedule to expose
+  the slowdown, not avoid it.
+- `--track-receipts` enabled (inclusion histogram requires it).
+- Fixed seed.
+- Size `--max-in-flight` and `--inclusion-reap-after` so healthy txs aren't reaped:
+  registry cap auto-sizes from `TPS × reapAfter × 1.5`, but verify `dropped_at_cap == 0`.
+
+**Read:**
+```promql
+histogram_quantile(0.99, sum by (le) (rate(seiload_inclusion_latency_seconds_bucket[1m])))
+```
+
+**Interpret / validity gate:** the p99 is only trustworthy if the run wasn't
+generator-bound (§5). Confirm `seiload_run_tps_final_per_second ≈ --tps`,
+`dropped_at_cap == 0`, and `seiload_block_gaps_total == 0 && seiload_block_fetch_errors_total == 0`
+(else `included` is an undercount biasing the tail). If those hold, quote the p99;
+otherwise rerun.
+
+---
+
+## 5. Ensuring a run is VALID / comparable
+
+Run this checklist **before** trusting any number. A failing item invalidates the run.
+
+| Check | Query / signal | Pass condition | If it fails |
+|-------|----------------|----------------|-------------|
+| Fixed seed | config | identical seed across A/B | reseed; reruns aren't comparable |
+| Open-loop for latency | `--arrival-model` | `open_loop` | closed-loop → coordinated omission; rerun |
+| Generator kept up | `seiload_run_tps_final_per_second` vs `--tps` | within tolerance | under-loaded; raise workers/λ headroom |
+| Drops are real shedding | `seiload_run_txs_dropped_total` | tracks SUT saturation, not generator stalls | suspect generator/`maxInFlight`; see [06](06-measurement-metrics.md) §3.4 |
+| No registry starvation | `seiload_inclusion_outcome_total{outcome="dropped_at_cap"}` | `== 0` | raise `--inclusion-reap-after` / cap; inclusion undercounted |
+| No observer loss | `seiload_block_gaps_total`, `seiload_block_fetch_errors_total` | `== 0` | `included` undercounts; treat inclusion rate as a lower bound |
+| Sends not erroring en masse | `seiload_run_txs_failed_total`, `seiload_txs_rejected_total` | low / explained | investigate SUT/client rejection before reading throughput |
+| Conservation balances | run-summary log + queries | `registered == included + expired + inflight_at_shutdown` | accounting broken; do not trust derived rates |
+| Clean shutdown | drain vs SIGTERM/`--duration` | clean drain preferred for exact accounting | note the shutdown-boundary undercount in your report |
+
+`schedule_lag` is the ideal generator-validity gate but is a **concept, not an emitted
+metric on main** (emitter punted as PLT-463) — you cannot query it. Compute validity from
+the heuristics above and state the assumption. See
+[06-measurement-metrics](06-measurement-metrics.md#34-detecting-a-generator-bound-invalid-run--schedule_lag) §3.4.
+
+For fair A/B methodology see [05-reproducibility](05-reproducibility.md); for failure
+modes to rule out (what a bad number *isn't*) see [08-limits-boundaries](08-limits-boundaries.md).
+
+---
+
+## 6. Compact run → check → mean → move loop
+
+A drop-in autonomous sequence for a single run:
+
+| Run output | Metric to check | What it means | Next move |
+|------------|-----------------|---------------|-----------|
+| Run started | `seiload_tps_achieved_per_second`, `seiload_worker_queue_length` | Is the generator hitting λ? | Queue rising + TPS < λ ⇒ add `--workers` |
+| Mid-run | `seiload_block_time_seconds`, `seiload_gas_used` p99 | Is the SUT block-bound or gas-bound? | Gas-bound ⇒ adjust tx size / `--target-gas` |
+| End | `seiload_run_txs_dropped_total` nonzero (run-summary gauge, emitted only at run end) | In-flight saturated during the run | At or above capacity: good for tail latency; bad if you wanted under-capacity |
+| End | run-summary log line | Conservation balances? | If not, discard run |
+| End | inclusion rate (§3.1 of [06](06-measurement-metrics.md)) | Fraction reaching chain | < target ⇒ SUT shedding; investigate expired vs dropped_at_cap |
+| End | `seiload_inclusion_latency_seconds` p99 | Tail time-to-chain | Validity-gate it (§5), then record with seed + config |
+| End | `seiload_block_gaps_total`/`seiload_block_fetch_errors_total` | Observer integrity | Nonzero ⇒ inclusion is a lower bound; note it |
+
+**When a needed signal doesn't exist** (e.g. `schedule_lag`, SUT block-stm aborts where
+you can query them), do not paper over it: file an `/issue` naming the exact query you
+were trying to write and why, so the gap gets closed rather than guessed around.
+
+---
+
+## See also
+
+- [01-mental-model](01-mental-model.md) — what sei-load is and isn't.
+- [04-workload-model](04-workload-model.md) — arrival model, scenarios, the Sei contention mechanism.
+- [05-reproducibility](05-reproducibility.md) — fixed seed, open vs closed loop, fair A/B.
+- [06-measurement-metrics](06-measurement-metrics.md) — the metric catalog and PromQL.
+- [08-limits-boundaries](08-limits-boundaries.md) — what to rule out before trusting a result.
diff --git a/docs/08-limits-boundaries.md b/docs/08-limits-boundaries.md
new file mode 100644
index 0000000..99101ee
--- /dev/null
+++ b/docs/08-limits-boundaries.md
@@ -0,0 +1,87 @@
+# Limits & Accepted Boundaries
+
+> [← AGENTS.md index](../AGENTS.md)
+
+> What this covers: the known, accepted measurement boundaries in sei-load's send and inclusion paths — what each is, when it bites, why it's accepted, and the counter to check before trusting a run. When an agent needs it: interpreting results, especially deciding whether a non-zero counter invalidates a conclusion or is benign.
+
+Every boundary below is **accepted by design** and bounded. The contract is conservative: where the tooling can be wrong, it is wrong in a known direction (almost always *undercounting* inclusions, never inventing them). An agent reading a run should treat a non-zero boundary counter as a *confidence discount in a known direction*, not as silent corruption. Grounded in `sender/doc.go`.
+
+> **Metric names here are conceptual.** Names like `block_gaps`, `dropped_at_cap`, `dropped`, `failed` are the *concepts* to check; the exact queryable series carry the `seiload_` prefix + Prometheus suffixes (e.g. `seiload_block_gaps_total`, `seiload_inclusion_outcome_total{outcome="dropped_at_cap"}`). See [Measurement & Metrics §2](06-measurement-metrics.md) for the authoritative catalog before writing a query.
+
+## Send path
+
+### Open-loop shutdown boundary
+
+- **What:** On a clean drain (generator exhaustion), `admitted == succeeded + failed` holds exactly. On `ctx` cancel (SIGTERM or `--duration` expiry), txs already admitted and buffered for a worker can exit **uncounted** (`sender/doc.go:72-75`).
+- **When it bites:** Only on cancellation-terminated runs — duration-bounded or interrupted. Never on a run that ends because the workload drained.
+- **Why accepted:** The undercount is bounded by the worker channel backlog (a small fixed buffer), and the conservation identity is exact on clean completion.
+- **How to interpret:** If the run ended by duration/SIGTERM and `admitted ≠ succeeded + failed`, the gap is shutdown buffer, not lost load — bounded by backlog. For exact conservation, end runs by generator drain (finite workload) rather than by duration. Check the `dropped` and `failed` gauges in the run summary.
+
+Related send-path lenses (not boundaries):
+- `schedule_lag` (`AttemptedSendTime − IntendedSendTime`) — ⚠️ **a concept, NOT an emitted metric on main**: there is no `schedule_lag` series to query (emitter punted as PLT-463); judge it externally via the [06 §3.4](06-measurement-metrics.md#34-detecting-a-generator-bound-invalid-run--schedule_lag) heuristics. Conceptually it is the primary coordinated-omission gate: non-zero/growing lag means sends are falling behind the open-loop arrival schedule *before* any tx is shed, and latency conclusions are suspect once it's large (`sender/doc.go:119-124`).
+- `dropped` — genuine load shed once `maxInFlight` saturates (drop-and-count). This is real backpressure, not buffer geometry (`sender/doc.go:36-48`).
+- `failed` — sends that returned a non-nil error; counted, never lost (`sender/doc.go:62-70`).
+
+Conservation to assert per run: `scheduled == dropped + admitted` and `admitted == succeeded + failed` (the latter exact only on clean drain).
+
+## Inclusion tracking (`--track-receipts`)
+
+Inclusion is observed block-by-block by the `InclusionTracker`, not by per-tx receipt polling: it subscribes to new heads, fetches each arriving block body **once**, and stamps `InclusionTime` on matched in-flight txs (`sender/doc.go:88-97`). Conservation: `registered == included + expired + inflight_at_shutdown`, and `registered ⊆ succeeded` — only successful sends are registered, and the inclusion denominator is `succeeded` (`txs_accepted`), never a minted series (`sender/doc.go:99-103`).
+
+The six accepted boundaries (`sender/doc.go:105-117`):
+
+### 1. WebSocket head gaps
+
+- **What:** A missed new-head subscription event is counted (`block_gaps`) but **never backfilled**. Txs in the missed block are not matched and eventually reap as `expired`.
+- **When it bites:** Flaky WS connection, or head-arrival faster than the subscriber drains.
+- **Why accepted:** Degrades conservatively — an *undercount of inclusions*, never a miscount.
+- **Interpret:** Non-zero `block_gaps` ⇒ reported inclusion rate is a **lower bound**; true inclusion is ≥ reported. Don't read an inclusion shortfall as chain-side drops without first checking `block_gaps`.
+
+### 2. Reorg first-observation-wins
+
+- **What:** On a reorg the tracker uses first-observation-wins (stamp `InclusionTime` + delete from in-flight); there is no canonical-chain reconciliation.
+- **When it bites:** Chain reorgs during the run.
+- **Why accepted:** Inclusion-time error is bounded by `reorg_depth × block_time`.
+- **Interpret:** If the SUT reorged, inclusion-latency samples carry up to `reorg_depth × block_time` of error. On a stable chain this is zero. Treat inclusion *latency* (not the count) as the affected metric.
+
+### 3. Single fetch endpoint
+
+- **What:** Block bodies are fetched from one endpoint only — `Endpoints[0]`, shared with the block collector.
+- **When it bites:** Always present; it adds a small read load to that one node and ties inclusion observation to that node's view.
+- **Why accepted:** Small added load; single consistent view.
+- **Interpret:** `Endpoints[0]` is the inclusion oracle. If you multi-target sends across endpoints, inclusion is still judged from `Endpoints[0]`'s chain view. Note that contract scenarios also deploy/bind against `Endpoints[0]` (`generator/scenarios/*.go` `Attach`).
+
+### 4. Header-arrival clock
+
+- **What:** `InclusionTime` is the **header-arrival wall-clock** at the tracker — not fetch-completion time, and not `header.Time` (the block's own timestamp).
+- **When it bites:** Always; it's the definition of the inclusion timestamp.
+- **Why accepted:** It's the measurable instant closest to "the tracker learned this block exists."
+- **Interpret:** `inclusion_latency = InclusionTime − IntendedSendTime` includes network propagation to the tracker. It is **open-loop-only**: in closed-loop, `IntendedSendTime` is enqueue time, so the latency sample is omitted (counts still tracked) (`sender/doc.go:94-97`). Do not compare inclusion-latency across arrival models, and do not equate it with on-chain block timestamp deltas.
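+
+The definition and its open-loop-only rule can be written down as a small helper. This is an illustrative sketch under stated assumptions: `InclusionLatency` and its parameters are hypothetical names, not the tracker's API, and the timestamps are assumed to be the `IntendedSendTime` and header-arrival `InclusionTime` described above.
+
+```go
+package main
+
+import (
+	"fmt"
+	"time"
+)
+
+// InclusionLatency returns InclusionTime - IntendedSendTime for an open-loop run.
+// ok is false when no sample should be recorded: in closed-loop (where
+// IntendedSendTime is only enqueue time) or when the tx was never stamped as included.
+// Hypothetical helper for illustration only.
+func InclusionLatency(intended, included time.Time, openLoop bool) (d time.Duration, ok bool) {
+	if !openLoop || included.IsZero() {
+		return 0, false
+	}
+	return included.Sub(intended), true
+}
+
+func main() {
+	intended := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
+	included := intended.Add(450 * time.Millisecond) // header arrived 450ms after the intended send
+
+	if d, ok := InclusionLatency(intended, included, true); ok {
+		fmt.Println("open-loop sample:", d)
+	}
+	if _, ok := InclusionLatency(intended, included, false); !ok {
+		fmt.Println("closed-loop: sample omitted, count still tracked")
+	}
+}
+```
+
+Returning `ok == false` rather than a zero duration mirrors the rule that the closed-loop sample is omitted, so a missing sample is never mistaken for a zero-latency one.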
+
+### 5. Failed block fetch
+
+- **What:** A failed block-body fetch is counted (`block_fetch_errors`) and **not retried**; that block's txs reap as `expired`.
+- **When it bites:** Transient RPC errors fetching a body from `Endpoints[0]`.
+- **Why accepted:** Same conservative undercount as a WS gap (boundary 1).
+- **Interpret:** Non-zero `block_fetch_errors` ⇒ inclusion is again a lower bound. Sum it with `block_gaps` when judging how much of an inclusion shortfall is observational vs. real.
+
+### 6. Late register / dropped-at-cap
+
+- **What (late register):** A tx registered *after* its including block was already scanned is missed and reaps as `expired` — bounded by the microsecond register window vs. block time (a rare conservative undercount, same direction as a WS gap) (`sender/doc.go:115-117`).
+- **What (dropped-at-cap):** When the inclusion registry hits its cap, registrations are dropped and counted (`dropped_at_cap`); these txs are **excluded from the inclusion denominator** (`sender/doc.go:101-103`).
+- **When it bites:** Late-register is rare (register window ≪ block time). `dropped_at_cap` bites under sustained inclusion backlog (registry can't keep up).
+- **Why accepted:** Late-register undercount is microsecond-window-bounded; cap-drops are excluded from the denominator so they can't inflate or deflate the inclusion rate.
+- **Interpret:** Non-zero `dropped_at_cap` ⇒ the inclusion rate is computed over fewer txs than `succeeded`; it's still correct *for the registered subset* but doesn't cover the whole run. If `dropped_at_cap` is large, raise the registry cap or lower the rate before trusting inclusion as run-wide.
+
+### Inclusion summary
+
+- `inclusion_latency` is **open-loop-only** (omitted, not zero, in closed-loop).
+- `inflight_at_shutdown` is read only after both workers and tracker have joined (`sender/doc.go:103`), so it is a true terminal residual, not a race artifact.
+- Master identity to assert: `registered == included + expired + inflight_at_shutdown`, with `registered ⊆ succeeded`.
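+- **Direction rule:** boundaries 1, 5, and 6 (late register) all push the inclusion count *down*; `dropped_at_cap` shrinks the registered population instead. If your run shows fewer inclusions than expected, check `block_gaps`, `block_fetch_errors`, and `dropped_at_cap` **first**. The first two count blocks the tracker never scanned and the third counts txs it never registered, so any non-zero value means part of the shortfall may be observational; rule that out before you attribute any of it to the SUT.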
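+
+The master identity and the boundary counters can be checked together after a run. The sketch below is illustrative rather than sei-load code: `InclusionCounts`, its fields, and `CheckInclusion` are hypothetical names, and the values are assumed to have been read from the run summary. It separates hard violations (an identity is broken) from caveats (the numbers are consistent but need discounting).
+
+```go
+package main
+
+import "fmt"
+
+// InclusionCounts holds inclusion-tracking counters from a run summary.
+// Hypothetical type: field names follow this doc, not sei-load's schema.
+type InclusionCounts struct {
+	Succeeded, Registered, Included, Expired, InflightAtShutdown uint64
+	BlockGaps, BlockFetchErrors, DroppedAtCap                    uint64
+}
+
+// CheckInclusion returns identity violations and interpretation caveats.
+func CheckInclusion(c InclusionCounts) (violations, caveats []string) {
+	if sum := c.Included + c.Expired + c.InflightAtShutdown; c.Registered != sum {
+		violations = append(violations, fmt.Sprintf(
+			"registered (%d) != included + expired + inflight_at_shutdown (%d)", c.Registered, sum))
+	}
+	if c.Registered > c.Succeeded {
+		violations = append(violations, fmt.Sprintf(
+			"registered (%d) exceeds succeeded (%d)", c.Registered, c.Succeeded))
+	}
+	if c.BlockGaps > 0 || c.BlockFetchErrors > 0 {
+		caveats = append(caveats, fmt.Sprintf(
+			"%d blocks were never scanned: the inclusion count is a lower bound",
+			c.BlockGaps+c.BlockFetchErrors))
+	}
+	if c.DroppedAtCap > 0 {
+		caveats = append(caveats, fmt.Sprintf(
+			"%d txs were never registered: inclusion covers only the registered subset",
+			c.DroppedAtCap))
+	}
+	return violations, caveats
+}
+
+func main() {
+	// Illustrative numbers only.
+	c := InclusionCounts{
+		Succeeded: 950, Registered: 940, Included: 900, Expired: 30, InflightAtShutdown: 10,
+		BlockGaps: 1, BlockFetchErrors: 0, DroppedAtCap: 10,
+	}
+	violations, caveats := CheckInclusion(c)
+	fmt.Println("violations:", violations)
+	fmt.Println("caveats:", caveats)
+}
+```
+
+Late register has no counter in the list above, so the sketch cannot flag it; it only reports the boundaries that leave a countable trace.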
+
+## See also
+
+- [03-config-reference](03-config-reference.md) — `--track-receipts`, endpoints, registry cap settings.
+- [06-measurement-metrics](06-measurement-metrics.md) — the counter series named above.
+- [07-experiment-playbook](07-experiment-playbook.md) — how to design runs that keep these boundaries at zero.