Phase 6: validation & benchmark matrix #11
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Phase 6 (validation & release). Add the comparative JMH matrix in the raw-JMH `benchmarks-jvm` tier — our `LongLongMap`/`IntIntMap` versus boxed `HashMap` and the four specialist primitive-map libraries (fastutil, HPPC, Eclipse Collections, Agrona) — over lookup/insert/churn across sizes and dense/adversarial key sets, plus a load-factor sweep, a JOL retained-footprint report, a reproducibility runner (env metadata + fixed seeds + JSON), and the benchmarking / release-runbook docs. Measured: our maps retain 19/10 bytes per entry — 4.68x/7.30x less than boxed `HashMap` and ~1.9x/1.8x less than the specialist libraries — and stay flat from dense to adversarial keys where the competitors degrade ~5-6x. Absolute time numbers are deferred to a pinned-hardware run; Maven Central publication is documented as a human-gated runbook (irreversible; needs signing keys, a Portal token, and a non-SNAPSHOT version). The competitor libraries are benchmark-only and JVM-only, confined to the `jmh` source set and never a dependency of the published `elastic` module. A `@Setup` correctness gate rejects any miswired adapter before a number is taken. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerated for the version bump and the Phase 6 benchmark-only competitor dependencies (fastutil, HPPC, Eclipse Collections, Agrona, JOL) on the `benchmarks-jvm` `jmh` source set. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff
| Metric | master | #11 |
|---|---|---|
| Coverage | 96.98% | 96.98% |
| Complexity | 431 | 431 |
| Files | 22 | 22 |
| Lines | 1857 | 1857 |
| Branches | 285 | 285 |
| Hits | 1801 | 1801 |
| Misses | 26 | 26 |
| Partials | 30 | 30 |

Pull request overview
Adds the Phase 6 validation + release deliverables for elastic-hashing: a comparative JVM benchmark matrix (including major primitive-map competitors), a deterministic JOL-based footprint report, a reproducibility harness that bundles environment metadata + results, and accompanying benchmarking/publishing documentation. Also bumps the snapshot version and regenerates dependency reports to reflect the added benchmark-only dependencies.
Changes:
- Introduce a JMH comparative matrix (`LongLongMatrixBenchmark`, `IntIntMatrixBenchmark`, `LoadFactorBenchmark`) with monomorphic adapters for stdlib + competitor primitive-map libraries.
- Add a deterministic retained-heap footprint report (`FootprintReport.kt`) plus a reproducibility runner (`capture-env.sh`, `run-matrix.sh`) and commit an indicative results bundle.
- Add/refresh documentation for benchmarking methodology and the (human-gated) publishing/release runbook; bump snapshot version and dependency reports.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| version.gradle.kts | Bumps versionToPublish to 1.0.0-SNAPSHOT-011. |
| README.md | Updates Phase 6 status and summarizes benchmark/footprint findings with doc links. |
| docs/publishing.md | Adds a human-run publishing & release runbook (Maven Central + KMP specifics). |
| docs/project.md | Updates project/module overview with Phase 6 benchmarking + footprint report context. |
| docs/performance-goals.md | Adds Phase 6 validation matrix goals/results and reproducibility notes. |
| docs/dependencies/pom.xml | Updates version and adds benchmark-related dependency entries to the doc POM. |
| docs/dependencies/dependencies.md | Regenerates dependency/license report for the new snapshot and added deps. |
| docs/benchmarking.md | Adds detailed benchmarking methodology + reproducibility instructions. |
| benchmarks-jvm/build.gradle.kts | Adds benchmark-only competitor deps + JOL, and registers footprintReport. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/CompetitorAdapters.kt | Adds monomorphic adapters and per-impl sizing rules for the comparative matrix. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/MatrixKeys.kt | Adds shared dense/clustered key generation + deterministic shuffles. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/LongLongMatrixBenchmark.kt | Adds Long→Long comparative JMH benchmarks (hit/miss/insert/churn). |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/IntIntMatrixBenchmark.kt | Adds Int→Int comparative JMH benchmarks (hit/miss/insert/churn). |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/LoadFactorBenchmark.kt | Adds load-factor sweep benchmark for tunable competitors. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt | Adds deterministic JOL retained-heap footprint report generator (md/tsv). |
| benchmarks-jvm/capture-env.sh | Adds environment capture script for reproducibility bundles (JDK/OS/CPU/etc). |
| benchmarks-jvm/run-matrix.sh | Adds one-command runner to produce self-describing benchmark bundles. |
| benchmarks-jvm/results/README.md | Documents the layout and intent of committed results bundles. |
| benchmarks-jvm/results/m4max-indicative/environment.txt | Captured environment metadata for the indicative run. |
| benchmarks-jvm/results/m4max-indicative/footprint.md | Committed footprint report output for the indicative run. |
| benchmarks-jvm/results/m4max-indicative/footprint.tsv | Committed footprint report TSV output for the indicative run. |
| benchmarks-jvm/results/m4max-indicative/lookup-hit-1m.md | Committed indicative lookup-hit slice summary at 1M entries. |
| .agents/tasks/phase-6-validation-release.md | Phase 6 task doc update (not reviewed here per org policy for .agents/**). |
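
The `MatrixKeys.kt` row above mentions deterministic shuffles, and the PR relies on fixed seeds for reproducibility. A minimal sketch of that idea follows, assuming dense keys `1..n`; the helper name `shuffledDenseKeys` and the seed value are hypothetical and are not taken from the PR's sources.

```kotlin
import kotlin.random.Random

// Hypothetical helper (not from the PR): dense keys 1..n, shuffled with a
// fixed seed so every run and every implementation sees the same order.
fun shuffledDenseKeys(n: Int, seed: Long): LongArray {
    val keys = LongArray(n) { it + 1L }
    keys.shuffle(Random(seed)) // a seeded Random yields a repeatable permutation
    return keys
}

fun main() {
    val a = shuffledDenseKeys(10_000, seed = 42L)
    val b = shuffledDenseKeys(10_000, seed = 42L)
    // Same seed, same access order across invocations.
    println(a.contentEquals(b))
    // Shuffling permutes the key set but does not change it.
    println(a.sum() == 10_000L * 10_001L / 2)
}
```

Because the order depends only on the seed, two machines running the same benchmark visit keys in the same sequence, which removes one source of run-to-run variance.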
… from the LF sweep
- `@Threads(1)` on all three benchmarks: `churn` mutates the shared `Scope.Benchmark` map, so a CLI `-t` override would be a data race with invalid results.
- Correctness gate in `LoadFactorBenchmark`'s `@Setup` (hit-sum == key-sum), matching the matrix benchmarks, so a miswired adapter fails fast.
- Drop Agrona from the load-factor sweep: its load factor is capped at 0.9 (the constructor rejects 0.99), so it cannot reach the >=0.95 points this benchmark targets; a pre-merge smoke run caught the exception. Its lookup at its own load stays in the main matrix.
- `capture-env.sh`: on Linux, fall back to `/proc/cpuinfo` when `lscpu` has no "Model name:" line (an empty `cpu_model` otherwise slipped through), and always emit `os_product`.
- `FootprintReport`: reword "equal occupancy" to "same entry count, each map pre-sized in its own units" (loads differ by each map's policy; capacity slack is included); regenerate the committed `footprint.md`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b9849b366
`FootprintReport` floored bytes-per-entry with integer division before computing the compactness ratios, so the report — and the docs citing it as exact JOL results — overstated the memory win: it printed `89 / 19 = 4.68x` where the true `LongLongMap`-vs-`HashMap` retained-total ratio is 4.59x. Keep bytes-per-entry as an exact `Double` and compute every ratio from `totalBytes`. Corrected numbers propagated to README, benchmarking.md, performance-goals.md, project.md, the phase-6 task doc, and the committed footprint bundle: `LongLongMap` 19.4 B/entry (4.59x vs `HashMap` 89.1); `IntIntMap` 10.3 (7.11x vs 73.1); competitors 36.6 / 18.3 (2.44x / 4.00x vs `HashMap`). The ours-vs-competitor factor (~1.9x / ~1.8x) is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
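
The rounding bug described in this commit can be reproduced in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the retained totals are made-up values chosen so that the per-entry figures come out at the 89.1 and 19.4 B/entry quoted above. They are not the JOL measurements.

```kotlin
import java.util.Locale

// Buggy shape: integer division floors bytes-per-entry before the ratio is taken.
fun flooredRatio(totalA: Long, totalB: Long, entries: Long): Double =
    (totalA / entries).toDouble() / (totalB / entries).toDouble()

// Fixed shape: compute the ratio directly from the retained totals.
fun exactRatio(totalA: Long, totalB: Long): Double =
    totalA.toDouble() / totalB.toDouble()

fun main() {
    // Illustrative totals (assumed): 89.1 and 19.4 bytes per entry at 1M entries.
    val entries = 1_000_000L
    val hashMapTotal = 89_100_000L
    val longLongTotal = 19_400_000L
    // Floors to 89 / 19 before dividing, which overstates the ratio.
    println(String.format(Locale.ROOT, "%.2f", flooredRatio(hashMapTotal, longLongTotal, entries)))
    // Divides the totals, so no precision is lost before the ratio.
    println(String.format(Locale.ROOT, "%.2f", exactRatio(hashMapTotal, longLongTotal)))
}
```

Flooring shrinks the denominator proportionally more than the numerator (19.4 to 19 versus 89.1 to 89), which is why the floored ratio overstates the win.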
- Footprint report + docs: the displayed bytes-per-entry are rounded to 0.1; only the retained totals (and the ratios derived from them) are exact. Reword the report header and `benchmarking.md` so "exact" attaches to the totals, not the rounded per-entry display.
- `publishing.md`: Maven Central is not a Spine-convention destination (`PublishingRepos` defines only Cloud Artifact Registry and GitHub Packages), so there is no built-in `...ToMavenCentralRepository` task. Document wiring a Central destination as an explicit prerequisite and note the task name follows the configured repository/plugin rather than hard-coding one.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
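
As a rough consistency check on the rounded per-entry figures, the PR description attributes the footprint to packing key + value + one control byte at 7/8 load. The sketch below evaluates that back-of-envelope model. It is an assumption-laden estimate (it ignores object headers and any capacity rounding), not a substitute for the JOL totals, and both function names are hypothetical.

```kotlin
import kotlin.math.round

// Back-of-envelope model (an assumption, not the JOL measurement):
// bytes per entry = (key + value + one control byte) / load factor.
fun modelBytesPerEntry(keyBytes: Int, valueBytes: Int, load: Double): Double =
    (keyBytes + valueBytes + 1) / load

// Round to one decimal place, the precision used for the displayed per-entry values.
fun roundTo1(x: Double): Double = round(x * 10.0) / 10.0

fun main() {
    val load = 7.0 / 8.0
    println(roundTo1(modelBytesPerEntry(8, 8, load))) // Long -> Long model
    println(roundTo1(modelBytesPerEntry(4, 4, load))) // Int -> Int model
}
```

Under these assumptions the model gives 17 / 0.875 and 9 / 0.875 bytes per entry, which round to the 19.4 and 10.3 B/entry reported for `LongLongMap` and `IntIntMap`.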
… in #12
- `run-matrix.sh`: an optional core-list argument runs the single-threaded comparison matrix (LongLong/IntInt/LoadFactor) as the JMH jar directly under `taskset`, so no Gradle daemon shares the isolated cores; the pinned cores are recorded in `environment.txt`. The multi-threaded read-scaling / mixed-load benchmarks stay excluded from the pinned run (they need all cores).
- `docs/benchmarking.md`: the Linux governor / turbo / isolcpus prep recipe.
- Phase 6 task doc: reference issue #12, the step-by-step runbook for configuring a pinned Linux box and handing the results back to Claude Code.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
    for (key in keys) {
        map.put(key, key)
    }
    check(map.size() == n) { "Expected $n entries in `$impl`, got ${map.size()}." }
    // Correctness gate: a benchmark on a miswired adapter is worthless. Each key's
    // value is the key, so the sum of all hit lookups must equal the sum of keys;
    // this catches any silent get/put defect before a single number is measured.
    var hitSum = 0L
    for (key in keys) {
        hitSum += map.get(key)
    }
    check(hitSum == keys.sum()) {
        "Adapter `$impl` returned wrong values: hit sum $hitSum != ${keys.sum()}."
    }

    for (key in keys) {
        map.put(key, key)
    }
    check(map.size() == n) { "Expected $n entries in `$impl`, got ${map.size()}." }
    // Correctness gate: a benchmark on a miswired adapter is worthless. Each key's
    // value is the key, so the sum of all hit lookups must equal the sum of keys;
    // this catches any silent get/put defect before a single number is measured.
    // Widen to `Long` so a 1M-entry sum cannot overflow `Int`.
    var hitSum = 0L
    var expected = 0L
    for (key in keys) {
        hitSum += map.get(key).toLong()
        expected += key.toLong()
    }
    check(hitSum == expected) {
        "Adapter `$impl` returned wrong values: hit sum $hitSum != $expected."
    }

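
The widening to `Long` in the `Int`-keyed gate above matters at the 1M size. A small sketch of the arithmetic, assuming dense keys `1..n` (the actual key set in `MatrixKeys.kt` may differ) and using hypothetical function names:

```kotlin
// Sum the keys 1..n in Int arithmetic; Kotlin Int addition wraps silently on overflow.
fun intSum(n: Int): Int {
    var sum = 0
    for (key in 1..n) sum += key
    return sum
}

// Sum the same keys widened to Long, which is exact at this scale.
fun longSum(n: Int): Long {
    var sum = 0L
    for (key in 1..n) sum += key.toLong()
    return sum
}

fun main() {
    val n = 1_000_000
    println(longSum(n))                       // n * (n + 1) / 2 = 500000500000
    println(longSum(n) > Int.MAX_VALUE)       // true: the sum does not fit in an Int
    println(intSum(n).toLong() == longSum(n)) // false: the Int sum has wrapped
}
```

With `Long` accumulators, the values printed in a failing `check` message are the true sums rather than wrapped ones.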
    val map = createAtMaxLoad(impl, fill)
    for (key in keys) {
        map.put(key, key)
    }
    check(map.size() == fill) { "Expected $fill entries in `$impl`, got ${map.size()}." }
    // Correctness gate (mirrors the matrix benchmarks): each key's value is the key,
    // so the sum of all hit lookups must equal the sum of keys; fail fast on any
    // mis-constructed or miswired adapter rather than measuring wrong behavior.
    var hitSum = 0L
    for (key in keys) {
        hitSum += map.get(key)
    }
    check(hitSum == keys.sum()) {
        "Adapter `$impl` returned wrong values: hit sum $hitSum != ${keys.sum()}."
    }
    populated = map

Phase 6 (validation & release) of the elastic-hashing implementation plan. Ships the comparative benchmark matrix, the memory-footprint report, and the reproducibility + release documentation. Publication itself is intentionally not executed — it is human-gated (see below).
What changed
- Comparative JMH matrix (`benchmarks-jvm`, the raw-JMH JVM tier): our `LongLongMap`/`IntIntMap` vs boxed `HashMap` and four specialist primitive-map libraries (fastutil 8.5.18, HPPC 0.10.0, Eclipse Collections 13.0.0, Agrona 2.5.0) behind monomorphic adapters (`CompetitorAdapters.kt`). Ops: `lookupHit` (random-access), `lookupMiss`, `insertPresized`, `insertGrowing`, `churn`; swept over size (10k/1M) and key distribution (DENSE/CLUSTERED). Plus `LoadFactorBenchmark` (0.5–0.99 sweep for the load-factor-tunable competitors). A `@Setup` correctness gate rejects any miswired adapter before a number is taken.
- Footprint report (`FootprintReport.kt`, `./gradlew :benchmarks-jvm:footprintReport`): exact retained heap via JOL, deterministic.
- `capture-env.sh` (JDK/OS/CPU + whether CPU frequency was pinned), `run-matrix.sh` (env + footprint + authoritative JMH into a self-describing bundle), fixed seeds, and a committed `benchmarks-jvm/results/m4max-indicative/` bundle.
- `docs/benchmarking.md` (method + reproducibility), `docs/publishing.md` (release runbook), the Phase 6 task doc, and README / performance-goals / project updates.
- Version bumped to `1.0.0-SNAPSHOT-011`; dependency reports regenerated.

Results (measured)
- `LongLongMap` retains 19 B/entry (4.68× less than `HashMap`, ~1.9× less than every primitive competitor, all 36); `IntIntMap` 10 B/entry (7.30× vs `HashMap`, ~1.8× vs competitors, all 18). More compact than the specialist libraries too, from packing key + value + one control byte at 7/8 load.
- On `lookupHit`, `LongLongMap` is flat from dense (9.9 ms) to adversarial/clustered (10.2 ms) keys, while fastutil degrades ~5× and Eclipse ~6×. Our `fmix64` finalizer makes ours the fastest under adversarial keys; a dense-only benchmark would have inverted the conclusion, which is why the fairness gate mandates the clustered set.
- Absolute times are deferred to a pinned-hardware `run-matrix.sh` run; only the ratios/robustness that survive the unpinned-hardware caveat are cited.

Deliberately deferred (human-gated)
- Maven Central publication is documented as a human-gated runbook in `docs/publishing.md`; `elastic` currently registers no publish tasks, so wiring `kmp-publish` is the documented first release step.

Positioning
The four competitors are a reference ceiling, not a success gate (fastutil/HPPC are co-fastest among classic open-addressing libraries). The committed baseline to beat is the standard library; our differentiators are memory, distribution robustness, and true Kotlin Multiplatform (the competitors are all JVM-only). The competitor libraries are benchmark-only and confined to the `jmh` source set, never a dependency of the published module.

Testing & review
- `./gradlew build dokkaGenerate` green; the `jmh` benchmark sources compile and all 11 benchmark methods generate and run; the footprint report and a JMH smoke slice produce JSON.
- `kotlin-engineer`, `spine-code-review`, and `review-docs`: all APPROVE. A reviewer caught (and this PR fixes) a real fairness bug: Agrona's constructor takes raw slot capacity, not expected-entries, so it was silently rehashing mid-`insertPresized` at 1M.

Follow-ups (out of scope)
- `forEach` on the primitive-value maps (the one op the matrix cannot fairly measure today).

🤖 Generated with Claude Code
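
The robustness result in the Results section credits the `fmix64` finalizer. The sketch below shows why a mixing finalizer helps on clustered keys, assuming `fmix64` refers to the MurmurHash3 64-bit finalizer whose name it shares. The adversarial key set, the identity-hash baseline, and the helper `distinctSlots` are hypothetical illustrations, not the PR's CLUSTERED generator or any competitor's actual hash.

```kotlin
// MurmurHash3's 64-bit finalizer (fmix64). The multipliers exceed the signed
// Long literal range, so they are written as unsigned literals and converted.
fun fmix64(key: Long): Long {
    var k = key
    k = k xor (k ushr 33)
    k *= 0xff51afd7ed558ccdUL.toLong()
    k = k xor (k ushr 33)
    k *= 0xc4ceb9fe1a85ec53UL.toLong()
    k = k xor (k ushr 33)
    return k
}

// Count how many distinct slots a key set occupies in a power-of-two table.
fun distinctSlots(keys: LongArray, mask: Long, hash: (Long) -> Long): Int =
    keys.map { hash(it) and mask }.toSet().size

fun main() {
    // Hypothetical adversarial set: keys differ only above bit 20, so the low
    // bits that select a slot under identity hashing are all zero.
    val keys = LongArray(1024) { it.toLong() shl 20 }
    val mask = 1023L
    println(distinctSlots(keys, mask) { it })    // identity hash: every key collides in one slot
    println(distinctSlots(keys, mask, ::fmix64)) // mixed hash: keys spread over many slots
}
```

A table that indexes by low bits without mixing degenerates to one long probe chain on such keys, while the finalizer folds high-bit differences into the slot index.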