43 changes: 43 additions & 0 deletions AGENTS.md
@@ -0,0 +1,43 @@
# sei-load — agent guide

sei-load drives synthetic transaction load at a Sei EVM endpoint and **emits measurements** (counters, histograms, a run summary) about how the system under test (SUT) responds. It is a load generator and a measurement instrument — **not** a judge: it computes no pass/fail verdicts, percentiles, or SLO compliance; you derive those externally from the signals it emits. These docs are the operating manual for an agent that designs, runs, and interprets sei-load experiments and acts on the results.

## Start here (reading order for a new agent)

Read linearly the first time:

1. [docs/01-mental-model.md](docs/01-mental-model.md) — the pipeline, open- vs closed-loop, coordinated omission, and the measure-not-judge philosophy. **Read this first.**
2. [docs/02-running.md](docs/02-running.md) — build the binary, every CLI flag, the run lifecycle.
3. [docs/03-config-reference.md](docs/03-config-reference.md) — the JSON config schema behind those flags.
4. [docs/04-workload-model.md](docs/04-workload-model.md) — scenarios and the StorageRW contention/size/op axes.
5. [docs/06-measurement-metrics.md](docs/06-measurement-metrics.md) — the authoritative metric catalog and the conservation model; the PromQL you query.
6. [docs/05-reproducibility.md](docs/05-reproducibility.md) — seed → sub-stream determinism and fair A/B setup. **Read before 07:** the playbook's cardinal rule depends on the seed/fair-A/B mechanics defined here.
7. [docs/07-experiment-playbook.md](docs/07-experiment-playbook.md) — objective → knobs → read → interpret recipes.

Keep [docs/08-limits-boundaries.md](docs/08-limits-boundaries.md) as a reference — pull it in when a non-zero boundary counter forces you to discount a result.

## Table of contents

| Doc | Covers / when you need it |
|-----|---------------------------|
| [01-mental-model.md](docs/01-mental-model.md) | The send pipeline, open-loop vs closed-loop arrival, coordinated omission, conservation identities, and why the tool emits signal not verdicts. The conceptual floor — read before anything else. |
| [02-running.md](docs/02-running.md) | Building/invoking `seiload`, every CLI flag, settings precedence, the metrics endpoint, copy-pasteable invocations, and the run lifecycle. Need it when starting/stopping/reproducing a run. |
| [03-config-reference.md](docs/03-config-reference.md) | The complete JSON config schema — `LoadConfig`, `settings`, `scenarios`, `accounts`, `funding`, gotchas. Need it when authoring or editing a config. |
| [04-workload-model.md](docs/04-workload-model.md) | The scenario set, what each stresses, and the StorageRW key-contention / tx-size / op-mix axes plus what they probe on Sei's parallel executor. Need it when choosing a scenario and shaping load. |
| [05-reproducibility.md](docs/05-reproducibility.md) | Seed → sub-stream derivation, the exact determinism guarantee, fair A/B setup, open-loop determinism under drops. Need it before comparing two runs. |
| [06-measurement-metrics.md](docs/06-measurement-metrics.md) | The authoritative 19-instrument catalog, the conservation model, and the PromQL recipes for rates/percentiles/goodput/validity. Need it before writing any query or trusting a number. |
| [07-experiment-playbook.md](docs/07-experiment-playbook.md) | The reasoning layer: objective → knobs → validity → read → interpret → next move, with recipes for contention, size, and tail-latency experiments. Need it when designing a run. |
| [08-limits-boundaries.md](docs/08-limits-boundaries.md) | The accepted measurement boundaries (WS gaps, reorgs, single fetch endpoint, header-arrival clock, cap drops) and the counter to check for each. Need it when deciding whether a non-zero counter invalidates a conclusion. |

## Fastest path to a first experiment

1. Build and validate offline: `make build`, then a `--dry-run` invocation — see [docs/02-running.md](docs/02-running.md#common-invocations).
2. Run an open-loop, fixed-λ measurement with receipt tracking and follow the trustworthy-tail-latency recipe — see [docs/07-experiment-playbook.md](docs/07-experiment-playbook.md) §4, then validity-gate it with §5 before quoting any number.

## Standing caveats (true on `main` today)

- **StorageRW distribution/size/op axes require PLT-465 (#54, unmerged).** `keyDistribution`, `sizeDistribution`, `sizeBuckets`, `recordCount`, and `operations` parse but **do not affect generated transactions** on main — StorageRW emits a fixed scaffold (slot 0, empty pad, all-`rmw`). See [docs/04-workload-model.md](docs/04-workload-model.md).
- **`schedule_lag` is a concept, not a queryable metric** (emitter punted as PLT-463). Judge generator validity externally via the [06 §3.4](docs/06-measurement-metrics.md) heuristics.
- **`--report-path` writes a formatted text dump, not JSON** (schema-versioned JSON is PLT-467). The seed is **config-file-only** (no `--seed` flag).
- **Exported series carry a `seiload_` prefix and unit suffixes.** The Prometheus exporter sets `WithNamespace("seiload")` (configurable), so every series is prefixed `seiload_`, and OTel appends unit suffixes (`s`-unit → `_seconds`, etc.); histograms expose `_bucket`/`_sum`/`_count` and counters end in `_total`. The wire names, not the instrument base names, are what you query; [docs/06](docs/06-measurement-metrics.md) §2 lists them.
- **The tool emits signal, not verdicts.** Every rate, percentile, and pass/fail is computed by you.
2 changes: 2 additions & 0 deletions README.md
@@ -1,3 +1,5 @@
> 🤖 For agent-driven experiments, start at [AGENTS.md](AGENTS.md) and docs/ — the authoritative, current operating docs. This README is a human quick-start and may lag.

# sei-load
[![Tests](https://github.com/sei-protocol/sei-load/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/sei-protocol/sei-load/actions/workflows/build-and-test.yml)

178 changes: 178 additions & 0 deletions docs/01-mental-model.md
@@ -0,0 +1,178 @@
# Mental Model

> [← AGENTS.md index](../AGENTS.md)

> **What this covers / when an agent needs it.** The conceptual foundation you must
> hold before designing, running, or interpreting a sei-load experiment: the send
> pipeline, the open-loop arrival model and the coordinated-omission problem it
> solves, where verdicts come from (not the tool), and the load-bearing
> vocabulary. Read this first; the config and metric specifics are in the sibling
> docs linked at the end.

## What sei-load is

sei-load drives synthetic transaction load at a Sei EVM endpoint and **emits
measurements** about how the system under test (SUT) responds. It is a load
generator and a measurement instrument. It is **not** a judge: it does not
compute pass/fail verdicts or SLO compliance (see [Measurement philosophy](#measurement-philosophy)).

## The send pipeline

A transaction flows through a fixed pipeline:

```
generator → dispatcher → sharded sender → per-endpoint workers → Sei RPC
```

- **Generator** (`generator.Generator`) produces `*types.LoadTx` values. Each
`Generate()` call draws from the seeded PRNG sub-streams (accounts, gas, key and
size distributions); this is the only place workload randomness is consumed.
- **Dispatcher** (`sender.Dispatcher`) owns the arrival timing. It runs in one of
two arrival models (below) and hands each tx to the sender.
- **Sharded sender** (`sender.ShardedSender`, satisfies `sender.TxSender`) routes
each tx to one of N per-endpoint workers by shard. `Send` enqueues into the
worker's channel and returns immediately — it is asynchronous.
- **Workers** (`sender.Worker`) each own one RPC client to one endpoint and run
`Tasks` send goroutines that drain a shared channel. Each send goroutine stamps
`AttemptedSendTime`, then issues `eth_sendRawTransaction` through the go-ethereum
client **synchronously**.
- **Sei RPC** is the SUT. The send returns nil (accepted) or an error (rejected).

A single shared `golang.org/x/time/rate.Limiter` is the one rate authority for
the whole pipeline. In closed-loop the worker gates on it; in open-loop the
scheduler reads it as a clock source (see below). When ramping is enabled, a
`Ramper` drives the limiter's limit up or down via `SetLimit`.

Optionally, when `--track-receipts` is set, successful sends are handed to a
block-indexed `stats.InclusionTracker` that observes on-chain inclusion by
scanning arriving blocks (O(blocks), not per-tx receipt polling). See
[06-measurement-metrics.md](06-measurement-metrics.md).

## The arrival model: why open-loop exists

The dispatcher supports two arrival models, selected by `arrivalModel`
(`sender.ArrivalModel`, values `"closed_loop"` / `"open_loop"`).

### Coordinated omission (the problem)

In the legacy **closed-loop** model the dispatcher generates the next tx only
once a sender is free (`runClosedLoop`: generate-then-send in lockstep). The
dequeue clock is therefore the SUT's clock: **when the SUT slows, the generator
slows with it and simply stops issuing the requests that would have observed the
slowdown.** The latency histogram under-reports, because the worst-affected
requests were never sent. This is **coordinated omission** — the closed-loop
model lies about latency precisely when the answer matters most (under stress).
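A toy simulation makes the effect concrete. It is a self-contained model, not sei-load code: one sender, hypothetical service times in milliseconds, and an unbounded queue (sei-load's open-loop mode would instead drop ticks once `maxInFlight` is reached). It keeps the request count fixed so the two latency series can be compared side by side; a real closed-loop run would also issue fewer requests during the stall.

```go
package main

import "fmt"

// latencies simulates one sender serving requests whose service times are given in ms.
// Closed-loop: the next request is issued only when the sender is free, so each
// recorded latency is just that request's service time.
// Open-loop: request i is intended at i*intervalMs and latency is measured from that
// intended instant, so time spent waiting behind a stall is counted.
func latencies(service []float64, intervalMs float64) (closed, open []float64) {
	free := 0.0 // instant at which the single sender becomes free (open-loop)
	for i, s := range service {
		closed = append(closed, s)

		intended := float64(i) * intervalMs
		start := intended
		if free > start {
			start = free // request waits behind the stalled one
		}
		free = start + s
		open = append(open, free-intended)
	}
	return closed, open
}

// countAbove returns how many samples exceed threshold.
func countAbove(xs []float64, threshold float64) int {
	n := 0
	for _, x := range xs {
		if x > threshold {
			n++
		}
	}
	return n
}

func main() {
	// One 100 ms stall among 1 ms requests, scheduled every 10 ms.
	service := []float64{1, 1, 1, 100, 1, 1, 1, 1, 1, 1}
	closed, open := latencies(service, 10)
	fmt.Println("closed-loop samples > 50ms:", countAbove(closed, 50))
	fmt.Println("open-loop samples   > 50ms:", countAbove(open, 50))
}
```

Run as written, the closed-loop series records one sample above 50 ms while the open-loop series records six (100, 91, 82, 73, 64, 55 ms): the requests stuck behind the stall are invisible to a latency clock that starts only when a sender is free.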

### Open-loop (the fix)

The **open-loop** model decouples the arrival clock from sender availability
(`sender.openLoopScheduler`). Transaction `i` is scheduled at a fixed instant
**`t₀ + i/λ`**, where `t₀` is the run start and `λ` is the target rate, regardless
of whether any sender is free.

Properties that make it honest:

- **Absolute-instant scheduling.** The scheduler sleeps until each absolute
instant (`SleepUntil(nextSend)`), not for a relative gap, so per-tx scheduling
slop cannot accumulate into clock drift over a long run.
- **λ as a clock, not a gate.** λ is sampled from the shared limiter on each step
(`limiter.Limit()`) and the next instant advances by `1/λ`, so a ramping rate is
honored; at fixed λ the running sum of gaps telescopes to exactly `t₀ + i/λ`. The
limiter is read here as a clock source: the schedule advances whether or not the
SUT keeps up.
- **Bounded in-flight + drop-and-count.** The arrival clock is **never throttled
by backpressure** (throttling would reintroduce coordinated omission). Instead
a counting semaphore bounds true in-flight sends to `maxInFlight`. At each
scheduled instant the scheduler does a non-blocking `TryAcquire`: if senders are
saturated the tick is **dropped and counted** and the clock moves on. The permit
is held across the full unacked-in-flight window (enqueue + RPC round-trip) and
released only after the synchronous send returns (via `tx.OnComplete`), so
`maxInFlight` bounds real in-flight work and the drop count measures genuine
load shed, not buffer geometry.
- **Admit before generate.** The permit is acquired **before** the generator is
drawn. A dropped tick draws no tx (no seeded-stream consumption, no signer CPU),
which makes admitted txs a deterministic prefix of the seeded sequence — see
[05-reproducibility.md](05-reproducibility.md).

Closed-loop is retained only as the **legacy regression baseline**. For any
experiment where tail latency under load matters, use open-loop.

To use open-loop: set `arrivalModel: "open_loop"` and a finite positive rate
(`tps > 0` or `rampUp: true`); validation rejects open-loop with no finite λ.
See [03-config-reference.md](03-config-reference.md).

### Conservation (how counts must add up)

Every scheduled tick reaches exactly one terminal state, and the dispatcher folds
these into the run summary:

```
scheduled = dropped + admitted
admitted = succeeded + failed
```

- **dropped** — shed because in-flight was saturated at the scheduled instant
(never admitted, never sent).
- **admitted** — took a permit and drew a tx.
- **succeeded** — admitted, send returned nil (`DispatcherStats.TotalSent`).
- **failed** — admitted, send returned an error. **Counted, never lost**
(`DispatcherStats.Failed`); a send error does not tear down the run.

In closed-loop, `Failed` and `Dropped` are always 0.

A finite workload ends when the generator drains; the terminal probe that
discovers this advances neither clock, index, nor counters. On a clean drain
`admitted == succeeded + failed` holds exactly. On `ctx` cancel (SIGTERM /
duration limit) some admitted txs may still be buffered for a worker and exit
uncounted — a bounded undercount that never affects a cleanly completed run.
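If you post-process the run summary, a validity check for these identities might look like the following. The `runCounts` struct is hypothetical: only `DispatcherStats.TotalSent` (succeeded) and `DispatcherStats.Failed` are named above, so map the other fields to whatever the summary actually reports. The check allows the bounded undercount only when the run did not end on a clean drain.

```go
package main

import (
	"errors"
	"fmt"
)

// runCounts is a hypothetical flattening of the run summary. Succeeded corresponds to
// DispatcherStats.TotalSent and Failed to DispatcherStats.Failed; other names are assumed.
type runCounts struct {
	Scheduled, Dropped, Admitted, Succeeded, Failed int64
}

// checkConservation verifies scheduled = dropped + admitted and admitted = succeeded + failed.
// On a clean drain the second identity must hold exactly; after a ctx cancel, admitted txs
// still buffered for a worker may exit uncounted, so a bounded undercount is allowed.
func checkConservation(c runCounts, cleanDrain bool) error {
	if c.Scheduled != c.Dropped+c.Admitted {
		return fmt.Errorf("scheduled %d != dropped %d + admitted %d", c.Scheduled, c.Dropped, c.Admitted)
	}
	terminal := c.Succeeded + c.Failed
	switch {
	case terminal > c.Admitted:
		return errors.New("more terminal outcomes than admitted txs")
	case cleanDrain && terminal != c.Admitted:
		return fmt.Errorf("clean drain but admitted %d != succeeded+failed %d", c.Admitted, terminal)
	}
	return nil
}

func main() {
	fmt.Println(checkConservation(runCounts{100, 10, 90, 85, 5}, true))
	fmt.Println(checkConservation(runCounts{100, 10, 90, 80, 5}, false))
	fmt.Println(checkConservation(runCounts{100, 10, 90, 80, 5}, true) != nil)
}
```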

## Measurement philosophy

**The generator emits measurements; it does not pronounce verdicts.** SLO
judgments, A/B comparisons, and pass/fail decisions are computed **externally**
via metric queries against the telemetry the tool emits — they are not owned by
sei-load. This shapes how you consume outputs:

- Treat sei-load output as raw signal (counters, histograms, the run summary),
not as a graded result.
- Build your verdict logic in your query/analysis layer, gating on the run-level
arrival model (see next point).
- **A tx cannot self-describe which model produced it.** An open-loop and a
closed-loop `LoadTx` are byte-identical; coordinated-omission safety is a
property of the run's arrival model, not of any per-tx field. Latency and
schedule-lag consumers **must gate on the run-level `arrivalModel`** before
trusting a latency or schedule-lag sample. In closed-loop, `IntendedSendTime`
is merely the back-pressured enqueue time, so any latency derived from it is
omitted or meaningless.

> **`schedule_lag` is a concept, not a metric on main today.** It is the
> coordinated-omission/validity quantity `AttemptedSendTime − IntendedSendTime`,
> computed and judged **externally** — there is no `schedule_lag` series on
> `/metrics` (the emitter was punted as PLT-463). Do not write a query against it;
> see [06-measurement-metrics.md](06-measurement-metrics.md#34-detecting-a-generator-bound-invalid-run--schedule_lag)
> for the external validity heuristics that stand in for it.

## Glossary

| Term | Meaning |
|---|---|
| **λ (lambda)** | Target arrival rate (tx/s). In open-loop, sampled from the shared limiter each step as a clock source; the inter-arrival gap is `1/λ`. |
| **t₀** | Run start instant; the anchor for the open-loop schedule. |
| **intended send time** | `IntendedSendTime` = `t₀ + i/λ`, the true scheduled instant (open-loop). In closed-loop it is the enqueue time instead — not a real schedule. |
| **attempted send time** | `AttemptedSendTime`, the wall clock when a worker actually called the RPC. |
| **inclusion time** | `InclusionTime`, the header-arrival wall clock of the block that included the tx (set only when `--track-receipts`). |
| **schedule_lag** | `AttemptedSendTime − IntendedSendTime`. The primary coordinated-omission gate: it shows sends falling behind the arrival schedule even before any tx is shed. Open-loop only. **A concept, not a metric on main** — computed/judged externally; not a queryable series (emitter punted as PLT-463). |
| **SequenceIndex** | The arrival-tick index `i`. Monotonic; under drops it is non-contiguous across admitted txs (dropped ticks advance `i` and the clock but consume no draw). |
| **admitted** | A tick that took an in-flight permit and drew a tx. |
| **dropped** | A tick shed because in-flight was saturated (drop-and-count). |
| **failed** | An admitted tx whose send returned an error (counted, not lost). |
| **in-flight** | Concurrent unacked sends, bounded by `maxInFlight` via the semaphore; a permit is held enqueue → RPC return. |
| **drop-and-count** | The open-loop overload policy: shed and tally overdue ticks rather than throttle the arrival clock. |

## See also

- [02-running.md](02-running.md) — invoking a run.
- [03-config-reference.md](03-config-reference.md) — every config/CLI setting.
- [04-workload-model.md](04-workload-model.md) — scenarios, distributions, accounts.
- [05-reproducibility.md](05-reproducibility.md) — seeds, sub-streams, A/B.
- [06-measurement-metrics.md](06-measurement-metrics.md) — emitted metrics and the run summary.
- [07-experiment-playbook.md](07-experiment-playbook.md) — recipes for common experiments.