Phase 6: validation & benchmark matrix by alexander-yevsyukov · Pull Request #11 · SpineEventEngine/elastic

alexander-yevsyukov · 2026-07-04T20:27:46Z

Phase 6 (validation & release) of the elastic-hashing implementation plan. Ships the comparative benchmark matrix, the memory-footprint report, and the reproducibility + release documentation. Publication itself is intentionally not executed — it is human-gated (see below).

What changed

Comparative JMH matrix (benchmarks-jvm, the raw-JMH JVM tier): our LongLongMap/IntIntMap vs boxed HashMap and four specialist primitive-map libraries — fastutil 8.5.18, HPPC 0.10.0, Eclipse Collections 13.0.0, Agrona 2.5.0 — behind monomorphic adapters (CompetitorAdapters.kt). Ops: lookupHit (random-access), lookupMiss, insertPresized, insertGrowing, churn; swept over size (10k/1M) and key distribution (DENSE/CLUSTERED). Plus LoadFactorBenchmark (0.5–0.99 sweep for the load-factor-tunable competitors). A @Setup correctness gate rejects any miswired adapter before a number is taken.
Footprint report (FootprintReport.kt, ./gradlew :benchmarks-jvm:footprintReport): exact retained heap via JOL, deterministic.
Reproducibility harness: capture-env.sh (JDK/OS/CPU + whether CPU frequency was pinned), run-matrix.sh (env + footprint + authoritative JMH into a self-describing bundle), fixed seeds, and a committed benchmarks-jvm/results/m4max-indicative/ bundle.
Docs: docs/benchmarking.md (method + reproducibility), docs/publishing.md (release runbook), the Phase 6 task doc, and README / performance-goals / project updates.
Version bumped to 1.0.0-SNAPSHOT-011; dependency reports regenerated.
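
As an illustration of the adapter-plus-gate idea described in the list above, here is a minimal sketch. `LongLongAdapter`, `BoxedHashMapAdapter`, and `verifyAdapter` are hypothetical names chosen for this example, not the contents of `CompetitorAdapters.kt`. Only the "hit sum must equal key sum" check follows what the PR describes.

```kotlin
// Hypothetical adapter surface; the real CompetitorAdapters.kt may look different.
interface LongLongAdapter {
    fun put(key: Long, value: Long)
    fun get(key: Long): Long
    fun size(): Int
}

// One concrete class per implementation, so a benchmark that only ever
// instantiates this class sees a single receiver type at its call sites.
class BoxedHashMapAdapter(expectedEntries: Int) : LongLongAdapter {
    private val map = HashMap<Long, Long>(expectedEntries * 2)
    override fun put(key: Long, value: Long) { map[key] = value }
    override fun get(key: Long): Long = map[key] ?: -1L
    override fun size(): Int = map.size
}

// Setup-time gate: every key maps to itself, so the sum of all hit lookups
// must equal the sum of the keys. Throws IllegalStateException otherwise.
fun verifyAdapter(adapter: LongLongAdapter, keys: LongArray) {
    for (key in keys) adapter.put(key, key)
    check(adapter.size() == keys.size) {
        "Expected ${keys.size} entries, got ${adapter.size()}."
    }
    var hitSum = 0L
    for (key in keys) hitSum += adapter.get(key)
    check(hitSum == keys.sum()) {
        "Adapter returned wrong values: hit sum $hitSum != ${keys.sum()}."
    }
}

fun main() {
    val keys = LongArray(10_000) { it.toLong() }
    verifyAdapter(BoxedHashMapAdapter(keys.size), keys)
    println("adapter passed the gate")
}
```

A sum check is cheap and catches a wrong `get`/`put` wiring, although two compensating errors could in principle cancel out. It is a smoke test, not a proof of correctness.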

Results (measured)

Memory: decisive, against everyone. LongLongMap retains 19.4 B/entry (4.59× less than HashMap at 89.1, ~1.9× less than every primitive competitor, all at 36.6); IntIntMap retains 10.3 B/entry (7.11× less than HashMap at 73.1, ~1.8× less than the competitors, all at 18.3). The maps are more compact than the specialist libraries too, because they pack key + value + one control byte at 7/8 load.
Distribution robustness — the headline time result. On 1M out-of-cache random lookupHit, LongLongMap is flat from dense (9.9 ms) to adversarial/clustered (10.2 ms) keys, while fastutil degrades ~5× and Eclipse ~6×. Our fmix64 finalizer makes ours the fastest under adversarial keys — a dense-only benchmark would have inverted the conclusion, which is why the fairness gate mandates the clustered set.
Absolute time numbers are hardware-specific and deferred to a pinned-hardware run-matrix.sh run; only the ratios/robustness that survive the unpinned-hardware caveat are cited.
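
To make the distribution-robustness point concrete, the sketch below applies the standard MurmurHash3 64-bit finalizer (`fmix64`) to a clustered key set and counts the distinct slots in a power-of-two table. The key shape (stride 2^20) and the `distinctSlots` helper are assumptions made for this example. The PR does not show how the CLUSTERED set is generated or how the library applies the finalizer internally.

```kotlin
// MurmurHash3 64-bit finalizer (fmix64).
fun fmix64(key: Long): Long {
    var k = key
    k = k xor (k ushr 33)
    k *= 0xff51afd7ed558ccdUL.toLong()
    k = k xor (k ushr 33)
    k *= 0xc4ceb9fe1a85ec53UL.toLong()
    k = k xor (k ushr 33)
    return k
}

// Counts how many distinct slots the keys occupy in a power-of-two table
// when the slot index is the low bits of hash(key).
fun distinctSlots(keys: LongArray, tableSize: Int, hash: (Long) -> Long): Int {
    val mask = (tableSize - 1).toLong()
    return keys.map { hash(it) and mask }.toSet().size
}

fun main() {
    // Assumed "clustered" shape: every key has its low 20 bits equal to zero.
    val clustered = LongArray(1024) { it.toLong() shl 20 }
    println("identity hash: " + distinctSlots(clustered, 1024) { it } + " slot(s)")
    println("fmix64:        " + distinctSlots(clustered, 1024, ::fmix64) + " slot(s)")
}
```

With an identity hash, all 1024 keys land in slot 0 because their low 10 bits are identical. The finalizer spreads them across the table, which is the property that keeps lookups flat when keys are adversarial.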

Deliberately deferred (human-gated)

Maven Central publication — irreversible, and needs a signing key, a Central Portal token, a claimed namespace, and a non-SNAPSHOT version. Full runbook in docs/publishing.md; elastic currently registers no publish tasks, so wiring kmp-publish is the documented first release step.
The authoritative multi-hour benchmark run on pinned hardware — the harness is one command; this PR validates it and captures real memory + indicative time numbers.

Positioning

The four competitors are a reference ceiling, not a success gate (fastutil/HPPC are co-fastest among classic open-addressing libraries). The committed baseline to beat is the standard library; our differentiators are memory, distribution robustness, and true Kotlin Multiplatform (the competitors are all JVM-only). The competitor libraries are benchmark-only and confined to the jmh source set — never a dependency of the published module.

Testing & review

./gradlew build dokkaGenerate green; the jmh benchmark sources compile and all 11 benchmark methods generate and run; the footprint report and a JMH smoke slice produce JSON.
Reviewed by kotlin-engineer, spine-code-review, and review-docs — all APPROVE. A reviewer caught (and this PR fixes) a real fairness bug: Agrona's constructor takes raw slot capacity, not expected-entries, so it was silently rehashing mid-insertPresized at 1M.
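
The capacity-versus-entries mismatch behind that fairness bug can be sketched as plain arithmetic. The helper below is hypothetical: it assumes a table that rounds its slot count up to a power of two and resizes once the entry count exceeds `slots * loadFactor`. The load factor 0.65 is only an illustrative value, not a statement about any library's default.

```kotlin
import kotlin.math.ceil

// Slots required so that `expectedEntries` fit below the resize threshold,
// rounded up to a power of two (an assumption about the target table).
fun slotsFor(expectedEntries: Int, loadFactor: Double): Int {
    require(expectedEntries >= 0) { "expectedEntries must be non-negative" }
    require(loadFactor > 0.0 && loadFactor < 1.0) { "loadFactor must be in (0, 1)" }
    val minSlots = ceil(expectedEntries / loadFactor).toInt()
    var slots = 1
    while (slots < minSlots) slots = slots shl 1
    return slots
}

fun main() {
    val entries = 1_000_000
    val loadFactor = 0.65 // illustrative value only

    // Passing the entry count as raw capacity: rounds up to 2^20 slots,
    // whose resize threshold is below one million entries.
    val naiveSlots = 1 shl 20
    println("naive threshold:  " + (naiveSlots * loadFactor).toInt())

    // Converting entries to slots first avoids the mid-insert rehash.
    val slots = slotsFor(entries, loadFactor)
    println("presized slots:   $slots")
    println("presized threshold: " + (slots * loadFactor).toInt())
}
```

Under these assumptions, a table given `1_000_000` as raw capacity would cross its threshold well before the millionth insert, which is what makes an `insertPresized` measurement unfair to that implementation.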

Follow-ups (out of scope)

A non-boxing forEach on the primitive-value maps (the one op the matrix cannot fairly measure today).
The pinned-hardware authoritative run and the actual publication.

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Phase 6 (validation & release). Add the comparative JMH matrix in the raw-JMH `benchmarks-jvm` tier — our `LongLongMap`/`IntIntMap` versus boxed `HashMap` and the four specialist primitive-map libraries (fastutil, HPPC, Eclipse Collections, Agrona) — over lookup/insert/churn across sizes and dense/adversarial key sets, plus a load-factor sweep, a JOL retained-footprint report, a reproducibility runner (env metadata + fixed seeds + JSON), and the benchmarking / release-runbook docs. Measured: our maps retain 19/10 bytes per entry — 4.68x/7.30x less than boxed `HashMap` and ~1.9x/1.8x less than the specialist libraries — and stay flat from dense to adversarial keys where the competitors degrade ~5-6x. Absolute time numbers are deferred to a pinned-hardware run; Maven Central publication is documented as a human-gated runbook (irreversible; needs signing keys, a Portal token, and a non-SNAPSHOT version). The competitor libraries are benchmark-only and JVM-only, confined to the `jmh` source set and never a dependency of the published `elastic` module. A `@Setup` correctness gate rejects any miswired adapter before a number is taken. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov · 2026-07-04T20:31:03Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.98%. Comparing base (a306baa) to head (529bba5).

Additional details and impacted files

@@            Coverage Diff            @@
##             master      #11   +/-   ##
=========================================
  Coverage     96.98%   96.98%           
  Complexity      431      431           
=========================================
  Files            22       22           
  Lines          1857     1857           
  Branches        285      285           
=========================================
  Hits           1801     1801           
  Misses           26       26           
  Partials         30       30

Copilot

Pull request overview

Adds the Phase 6 validation + release deliverables for elastic-hashing: a comparative JVM benchmark matrix (including major primitive-map competitors), a deterministic JOL-based footprint report, a reproducibility harness that bundles environment metadata + results, and accompanying benchmarking/publishing documentation. Also bumps the snapshot version and regenerates dependency reports to reflect the added benchmark-only dependencies.

Changes:

Introduce a JMH comparative matrix (LongLongMatrixBenchmark, IntIntMatrixBenchmark, LoadFactorBenchmark) with monomorphic adapters for stdlib + competitor primitive-map libraries.
Add a deterministic retained-heap footprint report (FootprintReport.kt) plus a reproducibility runner (capture-env.sh, run-matrix.sh) and commit an indicative results bundle.
Add/refresh documentation for benchmarking methodology and the (human-gated) publishing/release runbook; bump snapshot version and dependency reports.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.

File	Description
version.gradle.kts	Bumps `versionToPublish` to `1.0.0-SNAPSHOT-011`.
README.md	Updates Phase 6 status and summarizes benchmark/footprint findings with doc links.
docs/publishing.md	Adds a human-run publishing & release runbook (Maven Central + KMP specifics).
docs/project.md	Updates project/module overview with Phase 6 benchmarking + footprint report context.
docs/performance-goals.md	Adds Phase 6 validation matrix goals/results and reproducibility notes.
docs/dependencies/pom.xml	Updates version and adds benchmark-related dependency entries to the doc POM.
docs/dependencies/dependencies.md	Regenerates dependency/license report for the new snapshot and added deps.
docs/benchmarking.md	Adds detailed benchmarking methodology + reproducibility instructions.
benchmarks-jvm/build.gradle.kts	Adds benchmark-only competitor deps + JOL, and registers `footprintReport`.
benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/CompetitorAdapters.kt	Adds monomorphic adapters and per-impl sizing rules for the comparative matrix.
benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/MatrixKeys.kt	Adds shared dense/clustered key generation + deterministic shuffles.
benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/LongLongMatrixBenchmark.kt	Adds `Long→Long` comparative JMH benchmarks (hit/miss/insert/churn).
benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/IntIntMatrixBenchmark.kt	Adds `Int→Int` comparative JMH benchmarks (hit/miss/insert/churn).
benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/LoadFactorBenchmark.kt	Adds load-factor sweep benchmark for tunable competitors.
benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt	Adds deterministic JOL retained-heap footprint report generator (md/tsv).
benchmarks-jvm/capture-env.sh	Adds environment capture script for reproducibility bundles (JDK/OS/CPU/etc).
benchmarks-jvm/run-matrix.sh	Adds one-command runner to produce self-describing benchmark bundles.
benchmarks-jvm/results/README.md	Documents the layout and intent of committed results bundles.
benchmarks-jvm/results/m4max-indicative/environment.txt	Captured environment metadata for the indicative run.
benchmarks-jvm/results/m4max-indicative/footprint.md	Committed footprint report output for the indicative run.
benchmarks-jvm/results/m4max-indicative/footprint.tsv	Committed footprint report TSV output for the indicative run.
benchmarks-jvm/results/m4max-indicative/lookup-hit-1m.md	Committed indicative lookup-hit slice summary at 1M entries.
.agents/tasks/phase-6-validation-release.md	Phase 6 task doc update (not reviewed here per org policy for `.agents/**`).

… from the LF sweep

- `@Threads(1)` on all three benchmarks: `churn` mutates the shared `Scope.Benchmark` map, so a CLI `-t` override would be a data race with invalid results.
- Correctness gate in `LoadFactorBenchmark`'s `@Setup` (hit-sum == key-sum), matching the matrix benchmarks, so a miswired adapter fails fast.
- Drop Agrona from the load-factor sweep: its load factor is capped at 0.9 (the constructor rejects 0.99), so it cannot reach the >=0.95 points this benchmark targets; a pre-merge smoke run caught the exception. Its lookup at its own load stays in the main matrix.
- `capture-env.sh`: on Linux, fall back to `/proc/cpuinfo` when `lscpu` has no "Model name:" line (an empty `cpu_model` otherwise slipped through), and always emit `os_product`.
- `FootprintReport`: reword "equal occupancy" to "same entry count, each map pre-sized in its own units" (loads differ by each map's policy; capacity slack is included); regenerate the committed `footprint.md`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b9849b366

`FootprintReport` floored bytes-per-entry with integer division before computing the compactness ratios, so the report — and the docs citing it as exact JOL results — overstated the memory win: it printed `89 / 19 = 4.68x` where the true `LongLongMap`-vs-`HashMap` retained-total ratio is 4.59x. Keep bytes-per-entry as an exact `Double` and compute every ratio from `totalBytes`. Corrected numbers propagated to README, benchmarking.md, performance-goals.md, project.md, the phase-6 task doc, and the committed footprint bundle: `LongLongMap` 19.4 B/entry (4.59x vs `HashMap` 89.1); `IntIntMap` 10.3 (7.11x vs 73.1); competitors 36.6 / 18.3 (2.44x / 4.00x vs `HashMap`). The ours-vs-competitor factor (~1.9x / ~1.8x) is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
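
The effect of flooring before dividing can be reproduced with a few lines of arithmetic. The retained totals below are hypothetical values chosen to match the rounded per-entry figures in the commit message (89.1 and 19.4 B/entry at one million entries). They are not the actual JOL totals from the report.

```kotlin
// Buggy variant: integer division floors bytes-per-entry before the ratio.
fun flooredRatio(baselineTotal: Long, oursTotal: Long, entries: Long): Double {
    val baselinePerEntry = baselineTotal / entries // Long division truncates
    val oursPerEntry = oursTotal / entries
    return baselinePerEntry.toDouble() / oursPerEntry.toDouble()
}

// Fixed variant: the ratio comes straight from the retained totals.
fun exactRatio(baselineTotal: Long, oursTotal: Long): Double =
    baselineTotal.toDouble() / oursTotal.toDouble()

fun main() {
    val entries = 1_000_000L
    val hashMapTotal = 89_100_000L // hypothetical total, i.e. 89.1 B/entry
    val oursTotal = 19_400_000L    // hypothetical total, i.e. 19.4 B/entry

    println("floored: " + flooredRatio(hashMapTotal, oursTotal, entries))
    println("exact:   " + exactRatio(hashMapTotal, oursTotal))
}
```

Flooring drops 0.1 from the numerator and 0.4 from the denominator. Because the denominator is the smaller number, it shrinks proportionally more, so the floored ratio (89 / 19) comes out higher than the ratio of the totals.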

Copilot

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

- Footprint report + docs: the displayed bytes-per-entry are rounded to 0.1; only the retained totals (and the ratios derived from them) are exact. Reword the report header and `benchmarking.md` so "exact" attaches to the totals, not the rounded per-entry display.
- `publishing.md`: Maven Central is not a Spine-convention destination (`PublishingRepos` defines only Cloud Artifact Registry and GitHub Packages), so there is no built-in `...ToMavenCentralRepository` task. Document wiring a Central destination as an explicit prerequisite and note the task name follows the configured repository/plugin rather than hard-coding one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
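
The rounded per-entry figures can be sanity-checked against the layout the PR description gives (key + value + one control byte at 7/8 load). The sketch below is a back-of-envelope estimate only. It ignores object and array headers and any power-of-two capacity slack, so agreement with the reported 19.4 and 10.3 B/entry is consistent with that layout but does not prove it. The function names are hypothetical.

```kotlin
import kotlin.math.round

// Idealized bytes per entry: payload bytes divided by the fraction of slots in use.
fun idealBytesPerEntry(keyBytes: Int, valueBytes: Int, controlBytes: Int, load: Double): Double =
    (keyBytes + valueBytes + controlBytes) / load

// Rounds to one decimal place, as the report's display does.
fun roundToTenth(value: Double): Double = round(value * 10) / 10.0

fun main() {
    val load = 7.0 / 8.0
    val longLong = idealBytesPerEntry(8, 8, 1, load) // 17 / 0.875
    val intInt = idealBytesPerEntry(4, 4, 1, load)   // 9 / 0.875
    println("LongLongMap estimate: " + roundToTenth(longLong) + " B/entry")
    println("IntIntMap estimate:   " + roundToTenth(intInt) + " B/entry")
}
```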

… in #12

- `run-matrix.sh`: an optional core-list argument runs the single-threaded comparison matrix (LongLong/IntInt/LoadFactor) as the JMH jar directly under `taskset`, so no Gradle daemon shares the isolated cores; the pinned cores are recorded in `environment.txt`. The multi-threaded read-scaling / mixed-load benchmarks stay excluded from the pinned run (they need all cores).
- `docs/benchmarking.md`: the Linux governor / turbo / isolcpus prep recipe.
- Phase 6 task doc: reference issue #12, the step-by-step runbook for configuring a pinned Linux box and handing the results back to Claude Code.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.

+        for (key in keys) {
+            map.put(key, key)
+        }
+        check(map.size() == n) { "Expected $n entries in `$impl`, got ${map.size()}." }
+        // Correctness gate: a benchmark on a miswired adapter is worthless. Each key's
+        // value is the key, so the sum of all hit lookups must equal the sum of keys —
+        // this catches any silent get/put defect before a single number is measured.
+        var hitSum = 0L
+        for (key in keys) {
+            hitSum += map.get(key)
+        }
+        check(hitSum == keys.sum()) {
+            "Adapter `$impl` returned wrong values: hit sum $hitSum != ${keys.sum()}."
+        }


+        for (key in keys) {
+            map.put(key, key)
+        }
+        check(map.size() == n) { "Expected $n entries in `$impl`, got ${map.size()}." }
+        // Correctness gate: a benchmark on a miswired adapter is worthless. Each key's
+        // value is the key, so the sum of all hit lookups must equal the sum of keys —
+        // this catches any silent get/put defect before a single number is measured.
+        // Widen to `Long` so a 1M-entry sum cannot overflow `Int`.
+        var hitSum = 0L
+        var expected = 0L
+        for (key in keys) {
+            hitSum += map.get(key).toLong()
+            expected += key.toLong()
+        }
+        check(hitSum == expected) {
+            "Adapter `$impl` returned wrong values: hit sum $hitSum != $expected."
+        }
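
The widening in the hunk above matters because `IntArray.sum()` returns an `Int` and wraps silently on overflow. A minimal sketch, assuming a dense key set of 0 until 1,000,000 (the actual DENSE generator lives in `MatrixKeys.kt` and is not shown here):

```kotlin
// Sum in Int arithmetic: wraps around silently once it exceeds Int.MAX_VALUE.
fun intSum(keys: IntArray): Int = keys.sum()

// Sum widened to Long: cannot overflow for one million Int keys.
fun longSum(keys: IntArray): Long {
    var total = 0L
    for (key in keys) total += key.toLong()
    return total
}

fun main() {
    val keys = IntArray(1_000_000) { it } // assumed dense keys 0..999_999
    println("Int sum (wrapped): " + intSum(keys))
    println("Long sum:          " + longSum(keys)) // 499999500000
    println("Int.MAX_VALUE:     " + Int.MAX_VALUE)
}
```

The wrapped `Int` sum would be the same on both sides of the comparison, so a gate that used it might still pass. Mixing a `Long` accumulator with an `Int` `keys.sum()` is where the check would go wrong, which is why both sides are accumulated as `Long`.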


+        val map = createAtMaxLoad(impl, fill)
+        for (key in keys) {
+            map.put(key, key)
+        }
+        check(map.size() == fill) { "Expected $fill entries in `$impl`, got ${map.size()}." }
+        // Correctness gate (mirrors the matrix benchmarks): each key's value is the key,
+        // so the sum of all hit lookups must equal the sum of keys — fail fast on any
+        // mis-constructed or miswired adapter rather than measuring wrong behavior.
+        var hitSum = 0L
+        for (key in keys) {
+            hitSum += map.get(key)
+        }
+        check(hitSum == keys.sum()) {
+            "Adapter `$impl` returned wrong values: hit sum $hitSum != ${keys.sum()}."
+        }
+        populated = map


alexander-yevsyukov and others added 3 commits July 4, 2026 01:30

Bump version -> 1.0.0-SNAPSHOT-011

a740468

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Update dependency reports

2b9849b

Regenerated for the version bump and the Phase 6 benchmark-only competitor dependencies (fastutil, HPPC, Eclipse Collections, Agrona, JOL) on the `benchmarks-jvm` `jmh` source set. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings July 4, 2026 20:27

Copilot started reviewing on behalf of alexander-yevsyukov July 4, 2026 20:28

Copilot AI reviewed Jul 4, 2026

chatgpt-codex-connector Bot reviewed Jul 4, 2026

Comment thread benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt Outdated

Copilot AI review requested due to automatic review settings July 4, 2026 20:55

Copilot started reviewing on behalf of alexander-yevsyukov July 4, 2026 20:55

Copilot AI reviewed Jul 4, 2026

Comment thread benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt

Comment thread docs/benchmarking.md Outdated

Comment thread benchmarks-jvm/results/m4max-indicative/footprint.md Outdated

Comment thread docs/publishing.md Outdated

alexander-yevsyukov mentioned this pull request Jul 4, 2026

Phase 7: authoritative pinned-hardware benchmark matrix (Linux) #12

Open

3 tasks

Copilot AI review requested due to automatic review settings July 5, 2026 17:16

Copilot started reviewing on behalf of alexander-yevsyukov July 5, 2026 17:16

Copilot AI reviewed Jul 5, 2026

alexander-yevsyukov wants to merge 7 commits into master from phase-6
