Phase 6: validation & benchmark matrix #11

Open
alexander-yevsyukov wants to merge 7 commits into master from phase-6

@alexander-yevsyukov

Contributor

Phase 6 (validation & release) of the elastic-hashing implementation plan. Ships the comparative benchmark matrix, the memory-footprint report, and the reproducibility + release documentation. Publication itself is intentionally not executed — it is human-gated (see below).

What changed

  • Comparative JMH matrix (benchmarks-jvm, the raw-JMH JVM tier): our LongLongMap/IntIntMap vs boxed HashMap and four specialist primitive-map libraries — fastutil 8.5.18, HPPC 0.10.0, Eclipse Collections 13.0.0, Agrona 2.5.0 — behind monomorphic adapters (CompetitorAdapters.kt). Ops: lookupHit (random-access), lookupMiss, insertPresized, insertGrowing, churn; swept over size (10k/1M) and key distribution (DENSE/CLUSTERED). Plus LoadFactorBenchmark (0.5–0.99 sweep for the load-factor-tunable competitors). A @Setup correctness gate rejects any miswired adapter before a number is taken.
  • Footprint report (FootprintReport.kt, ./gradlew :benchmarks-jvm:footprintReport): exact retained heap via JOL, deterministic.
  • Reproducibility harness: capture-env.sh (JDK/OS/CPU + whether CPU frequency was pinned), run-matrix.sh (env + footprint + authoritative JMH into a self-describing bundle), fixed seeds, and a committed benchmarks-jvm/results/m4max-indicative/ bundle.
  • Docs: docs/benchmarking.md (method + reproducibility), docs/publishing.md (release runbook), the Phase 6 task doc, and README / performance-goals / project updates.
  • Version bumped to 1.0.0-SNAPSHOT-011; dependency reports regenerated.

Results (measured)

  • Memory: decisive against every competitor. LongLongMap retains 19.4 B/entry (4.59× less than HashMap's 89.1, ~1.9× less than the primitive competitors, which all retain 36.6); IntIntMap retains 10.3 B/entry (7.11× less than HashMap's 73.1, ~1.8× less than the competitors' 18.3). The maps are more compact than the specialist libraries because they pack key + value + one control byte at 7/8 load.
  • Distribution robustness: the headline time result. On 1M out-of-cache random lookupHit, LongLongMap is flat from dense (9.9 ms) to adversarial/clustered (10.2 ms) keys, while fastutil degrades ~5× and Eclipse Collections ~6×. The fmix64 finalizer makes LongLongMap the fastest under adversarial keys. A dense-only benchmark would have inverted the conclusion, which is why the fairness gate mandates the clustered set.
  • Absolute time numbers are hardware-specific and deferred to a pinned-hardware run-matrix.sh run; only the ratios/robustness that survive the unpinned-hardware caveat are cited.
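
Why a finalizer matters for clustered keys can be sketched as follows. The `fmix64` body uses the published MurmurHash3 finalizer constants; how the library applies it internally is not shown in this PR, and `clusteredKeys` and `distinctSlots` are hypothetical helpers that do not come from `MatrixKeys.kt`.

```kotlin
// MurmurHash3's 64-bit finalizer (fmix64) with its published constants.
fun fmix64(key: Long): Long {
    var h = key
    h = h xor (h ushr 33)
    h *= 0xff51afd7ed558ccdUL.toLong()
    h = h xor (h ushr 33)
    h *= 0xc4ceb9fe1a85ec53UL.toLong()
    h = h xor (h ushr 33)
    return h
}

// Hypothetical clustered key set: every key is a multiple of 2^20, so the low
// 20 bits carry no information and a mask-only index collapses to one slot.
fun clusteredKeys(count: Int): LongArray = LongArray(count) { it.toLong() shl 20 }

// Number of distinct slots hit in a power-of-two table with `slots` entries.
fun distinctSlots(keys: LongArray, slots: Int, hash: (Long) -> Long): Int {
    val mask = (slots - 1).toLong()
    return keys.map { hash(it) and mask }.toSet().size
}

fun main() {
    val keys = clusteredKeys(1024)
    println("identity hash: ${distinctSlots(keys, 1024) { it }} distinct slots")
    println("fmix64:        ${distinctSlots(keys, 1024, ::fmix64)} distinct slots")
}
```

With an identity hash every clustered key maps to the same slot, while the mixed hash spreads the keys across the table. That is the kind of behavior a dense-only key set would not exercise.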

Deliberately deferred (human-gated)

  • Maven Central publication — irreversible, and needs a signing key, a Central Portal token, a claimed namespace, and a non-SNAPSHOT version. Full runbook in docs/publishing.md; elastic currently registers no publish tasks, so wiring kmp-publish is the documented first release step.
  • The authoritative multi-hour benchmark run on pinned hardware — the harness is one command; this PR validates it and captures real memory + indicative time numbers.

Positioning

The four competitors are a reference ceiling, not a success gate (fastutil/HPPC are co-fastest among classic open-addressing libraries). The committed baseline to beat is the standard library; our differentiators are memory, distribution robustness, and true Kotlin Multiplatform (the competitors are all JVM-only). The competitor libraries are benchmark-only and confined to the jmh source set — never a dependency of the published module.

Testing & review

  • ./gradlew build dokkaGenerate green; the jmh benchmark sources compile and all 11 benchmark methods generate and run; the footprint report and a JMH smoke slice produce JSON.
  • Reviewed by kotlin-engineer, spine-code-review, and review-docs — all APPROVE. A reviewer caught (and this PR fixes) a real fairness bug: Agrona's constructor takes raw slot capacity, not expected-entries, so it was silently rehashing mid-insertPresized at 1M.
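
The sizing mismatch behind that fairness bug can be illustrated with a small conversion helper. This is a sketch under stated assumptions: `slotCapacityFor` is a hypothetical name (not taken from `CompetitorAdapters.kt`), the power-of-two rounding is assumed, and the load factor in `main` is illustrative rather than any library's default.

```kotlin
import kotlin.math.ceil

// Hypothetical helper: converts an expected entry count into the raw slot
// capacity a capacity-based constructor needs, so that inserting
// `expectedEntries` keys never crosses the resize threshold.
fun slotCapacityFor(expectedEntries: Int, loadFactor: Double): Int {
    require(expectedEntries >= 0) { "expectedEntries must be non-negative" }
    require(loadFactor > 0.0 && loadFactor < 1.0) { "loadFactor must be in (0, 1)" }
    val needed = ceil(expectedEntries / loadFactor).toLong()
    // Assumption: the table rounds its capacity up to a power of two.
    var capacity = 1L
    while (capacity < needed) capacity = capacity shl 1
    require(capacity <= Int.MAX_VALUE) { "capacity overflows Int" }
    return capacity.toInt()
}

fun main() {
    val n = 1_000_000
    val lf = 0.65 // illustrative load factor, not a claim about any library
    println("entries=$n -> slots=${slotCapacityFor(n, lf)}")
}
```

Passing the entry count straight through as the slot capacity leaves the resize threshold below the entry count, which is how a "presized" insert can still rehash.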

Follow-ups (out of scope)

  • A non-boxing forEach on the primitive-value maps (the one op the matrix cannot fairly measure today).
  • The pinned-hardware authoritative run and the actual publication.

🤖 Generated with Claude Code

alexander-yevsyukov and others added 3 commits July 4, 2026 01:30
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Phase 6 (validation & release). Add the comparative JMH matrix in the raw-JMH
`benchmarks-jvm` tier — our `LongLongMap`/`IntIntMap` versus boxed `HashMap` and
the four specialist primitive-map libraries (fastutil, HPPC, Eclipse Collections,
Agrona) — over lookup/insert/churn across sizes and dense/adversarial key sets,
plus a load-factor sweep, a JOL retained-footprint report, a reproducibility
runner (env metadata + fixed seeds + JSON), and the benchmarking / release-runbook
docs.

Measured: our maps retain 19/10 bytes per entry — 4.68x/7.30x less than boxed
`HashMap` and ~1.9x/1.8x less than the specialist libraries — and stay flat from
dense to adversarial keys where the competitors degrade ~5-6x. Absolute time
numbers are deferred to a pinned-hardware run; Maven Central publication is
documented as a human-gated runbook (irreversible; needs signing keys, a Portal
token, and a non-SNAPSHOT version).
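
The per-entry figures can be sanity-checked against the slot layout the PR description gives (key + value + one control byte at 7/8 load). The sketch below is a back-of-envelope model only: `approxBytesPerEntry` is a hypothetical name, and the model ignores object headers, array headers, and capacity rounding, so it does not replace the JOL report.

```kotlin
// Payload bytes stored per slot divided by the fraction of slots occupied.
fun approxBytesPerEntry(keyBytes: Int, valueBytes: Int, load: Double = 7.0 / 8.0): Double {
    val controlBytes = 1 // one control byte per slot, as stated in the PR description
    return (keyBytes + valueBytes + controlBytes) / load
}

fun main() {
    println(approxBytesPerEntry(keyBytes = 8, valueBytes = 8)) // LongLongMap layout
    println(approxBytesPerEntry(keyBytes = 4, valueBytes = 4)) // IntIntMap layout
}
```

Both estimates land close to the 19 and 10 bytes per entry quoted above, which suggests the measured footprint is dominated by the packed slots rather than by per-object overhead.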

The competitor libraries are benchmark-only and JVM-only, confined to the `jmh`
source set and never a dependency of the published `elastic` module. A `@Setup`
correctness gate rejects any miswired adapter before a number is taken.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Regenerated for the version bump and the Phase 6 benchmark-only competitor
dependencies (fastutil, HPPC, Eclipse Collections, Agrona, JOL) on the
`benchmarks-jvm` `jmh` source set.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 4, 2026 20:27
@codecov

codecov Bot commented Jul 4, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.98%. Comparing base (a306baa) to head (529bba5).

Additional details and impacted files
@@            Coverage Diff            @@
##             master      #11   +/-   ##
=========================================
  Coverage     96.98%   96.98%           
  Complexity      431      431           
=========================================
  Files            22       22           
  Lines          1857     1857           
  Branches        285      285           
=========================================
  Hits           1801     1801           
  Misses           26       26           
  Partials         30       30           

Copilot AI left a comment

Pull request overview

Adds the Phase 6 validation + release deliverables for elastic-hashing: a comparative JVM benchmark matrix (including major primitive-map competitors), a deterministic JOL-based footprint report, a reproducibility harness that bundles environment metadata + results, and accompanying benchmarking/publishing documentation. Also bumps the snapshot version and regenerates dependency reports to reflect the added benchmark-only dependencies.

Changes:

  • Introduce a JMH comparative matrix (LongLongMatrixBenchmark, IntIntMatrixBenchmark, LoadFactorBenchmark) with monomorphic adapters for stdlib + competitor primitive-map libraries.
  • Add a deterministic retained-heap footprint report (FootprintReport.kt) plus a reproducibility runner (capture-env.sh, run-matrix.sh) and commit an indicative results bundle.
  • Add/refresh documentation for benchmarking methodology and the (human-gated) publishing/release runbook; bump snapshot version and dependency reports.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| version.gradle.kts | Bumps versionToPublish to 1.0.0-SNAPSHOT-011. |
| README.md | Updates Phase 6 status and summarizes benchmark/footprint findings with doc links. |
| docs/publishing.md | Adds a human-run publishing & release runbook (Maven Central + KMP specifics). |
| docs/project.md | Updates project/module overview with Phase 6 benchmarking + footprint report context. |
| docs/performance-goals.md | Adds Phase 6 validation matrix goals/results and reproducibility notes. |
| docs/dependencies/pom.xml | Updates version and adds benchmark-related dependency entries to the doc POM. |
| docs/dependencies/dependencies.md | Regenerates dependency/license report for the new snapshot and added deps. |
| docs/benchmarking.md | Adds detailed benchmarking methodology + reproducibility instructions. |
| benchmarks-jvm/build.gradle.kts | Adds benchmark-only competitor deps + JOL, and registers footprintReport. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/CompetitorAdapters.kt | Adds monomorphic adapters and per-impl sizing rules for the comparative matrix. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/MatrixKeys.kt | Adds shared dense/clustered key generation + deterministic shuffles. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/LongLongMatrixBenchmark.kt | Adds Long→Long comparative JMH benchmarks (hit/miss/insert/churn). |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/IntIntMatrixBenchmark.kt | Adds Int→Int comparative JMH benchmarks (hit/miss/insert/churn). |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/LoadFactorBenchmark.kt | Adds load-factor sweep benchmark for tunable competitors. |
| benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt | Adds deterministic JOL retained-heap footprint report generator (md/tsv). |
| benchmarks-jvm/capture-env.sh | Adds environment capture script for reproducibility bundles (JDK/OS/CPU/etc). |
| benchmarks-jvm/run-matrix.sh | Adds one-command runner to produce self-describing benchmark bundles. |
| benchmarks-jvm/results/README.md | Documents the layout and intent of committed results bundles. |
| benchmarks-jvm/results/m4max-indicative/environment.txt | Captured environment metadata for the indicative run. |
| benchmarks-jvm/results/m4max-indicative/footprint.md | Committed footprint report output for the indicative run. |
| benchmarks-jvm/results/m4max-indicative/footprint.tsv | Committed footprint report TSV output for the indicative run. |
| benchmarks-jvm/results/m4max-indicative/lookup-hit-1m.md | Committed indicative lookup-hit slice summary at 1M entries. |
| .agents/tasks/phase-6-validation-release.md | Phase 6 task doc update (not reviewed here per org policy for .agents/**). |

Comment thread benchmarks-jvm/capture-env.sh
Comment thread benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt Outdated
Comment thread benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt Outdated
Comment thread benchmarks-jvm/results/m4max-indicative/footprint.md Outdated
… from the LF sweep

- `@Threads(1)` on all three benchmarks — `churn` mutates the shared
  `Scope.Benchmark` map, so a CLI `-t` override would be a data race with invalid
  results.
- Correctness gate in `LoadFactorBenchmark`'s `@Setup` (hit-sum == key-sum),
  matching the matrix benchmarks, so a miswired adapter fails fast.
- Drop Agrona from the load-factor sweep: its load factor is capped at 0.9 (the
  constructor rejects 0.99), so it cannot reach the >=0.95 points this benchmark
  targets — a pre-merge smoke run caught the exception. Its lookup at its own load
  stays in the main matrix.
- `capture-env.sh`: on Linux, fall back to `/proc/cpuinfo` when `lscpu` has no
  "Model name:" line (an empty `cpu_model` otherwise slipped through), and always
  emit `os_product`.
- `FootprintReport`: reword "equal occupancy" to "same entry count, each map
  pre-sized in its own units" (loads differ by each map's policy; capacity slack is
  included); regenerate the committed `footprint.md`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b9849b366

Comment thread benchmarks-jvm/src/jmh/kotlin/io/spine/elastic/benchmark/jmh/FootprintReport.kt Outdated
`FootprintReport` floored bytes-per-entry with integer division before computing
the compactness ratios, so the report — and the docs citing it as exact JOL
results — overstated the memory win: it printed `89 / 19 = 4.68x` where the true
`LongLongMap`-vs-`HashMap` retained-total ratio is 4.59x. Keep bytes-per-entry as
an exact `Double` and compute every ratio from `totalBytes`.

Corrected numbers propagated to README, benchmarking.md, performance-goals.md,
project.md, the phase-6 task doc, and the committed footprint bundle:
`LongLongMap` 19.4 B/entry (4.59x vs `HashMap` 89.1); `IntIntMap` 10.3 (7.11x vs
73.1); competitors 36.6 / 18.3 (2.44x / 4.00x vs `HashMap`). The ours-vs-competitor
factor (~1.9x / ~1.8x) is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
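
The flooring effect this commit describes can be reproduced with a small sketch. The totals below are illustrative values chosen to match the rounded per-entry figures in the commit message (89.1 and 19.4 B/entry, assuming one million entries); they are not the measured JOL totals, and the function names are hypothetical rather than taken from `FootprintReport.kt`.

```kotlin
import java.util.Locale

// Illustrative retained totals, not measured values.
const val ENTRIES = 1_000_000L
const val HASH_MAP_TOTAL_BYTES = 89_100_000L
const val LONG_LONG_MAP_TOTAL_BYTES = 19_400_000L

// The bug: integer division floors bytes-per-entry before the ratio is taken.
fun flooredRatio(baselineTotal: Long, oursTotal: Long, entries: Long): Double =
    (baselineTotal / entries).toDouble() / (oursTotal / entries).toDouble()

// The fix: compute the ratio from the retained totals themselves.
fun exactRatio(baselineTotal: Long, oursTotal: Long): Double =
    baselineTotal.toDouble() / oursTotal.toDouble()

fun fmt(ratio: Double): String = String.format(Locale.ROOT, "%.2fx", ratio)

fun main() {
    println("floored: " + fmt(flooredRatio(HASH_MAP_TOTAL_BYTES, LONG_LONG_MAP_TOTAL_BYTES, ENTRIES)))
    println("exact:   " + fmt(exactRatio(HASH_MAP_TOTAL_BYTES, LONG_LONG_MAP_TOTAL_BYTES)))
}
```

Flooring hurts the smaller denominator proportionally more than the larger numerator, so the floored ratio overstates the win.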
Copilot AI review requested due to automatic review settings July 4, 2026 20:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comment thread docs/benchmarking.md Outdated
Comment thread benchmarks-jvm/results/m4max-indicative/footprint.md Outdated
Comment thread docs/publishing.md Outdated
- Footprint report + docs: the displayed bytes-per-entry are rounded to 0.1, only
  the retained totals (and the ratios derived from them) are exact. Reword the
  report header and `benchmarking.md` so "exact" attaches to the totals, not the
  rounded per-entry display.
- `publishing.md`: Maven Central is not a Spine-convention destination —
  `PublishingRepos` defines only Cloud Artifact Registry and GitHub Packages — so
  there is no built-in `...ToMavenCentralRepository` task. Document wiring a Central
  destination as an explicit prerequisite and note the task name follows the
  configured repository/plugin rather than hard-coding one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… in #12

- `run-matrix.sh`: an optional core-list argument runs the single-threaded
  comparison matrix (LongLong/IntInt/LoadFactor) as the JMH jar directly under
  `taskset`, so no Gradle daemon shares the isolated cores; the pinned cores are
  recorded in `environment.txt`. The multi-threaded read-scaling / mixed-load
  benchmarks stay excluded from the pinned run (they need all cores).
- `docs/benchmarking.md`: the Linux governor / turbo / isolcpus prep recipe.
- Phase 6 task doc: reference issue #12 — the step-by-step runbook for configuring
  a pinned Linux box and handing the results back to Claude Code.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings July 5, 2026 17:16

Copilot AI left a comment


Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.

Comment on lines +95 to +108
for (key in keys) {
    map.put(key, key)
}
check(map.size() == n) { "Expected $n entries in `$impl`, got ${map.size()}." }
// Correctness gate: a benchmark on a miswired adapter is worthless. Each key's
// value is the key, so the sum of all hit lookups must equal the sum of keys;
// this catches any silent get/put defect before a single number is measured.
var hitSum = 0L
for (key in keys) {
    hitSum += map.get(key)
}
check(hitSum == keys.sum()) {
    "Adapter `$impl` returned wrong values: hit sum $hitSum != ${keys.sum()}."
}
Comment on lines +87 to +103
for (key in keys) {
    map.put(key, key)
}
check(map.size() == n) { "Expected $n entries in `$impl`, got ${map.size()}." }
// Correctness gate: a benchmark on a miswired adapter is worthless. Each key's
// value is the key, so the sum of all hit lookups must equal the sum of keys;
// this catches any silent get/put defect before a single number is measured.
// Widen to `Long` so a 1M-entry sum cannot overflow `Int`.
var hitSum = 0L
var expected = 0L
for (key in keys) {
    hitSum += map.get(key).toLong()
    expected += key.toLong()
}
check(hitSum == expected) {
    "Adapter `$impl` returned wrong values: hit sum $hitSum != $expected."
}
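
The widening to `Long` in the gate above matters because an `Int` accumulator wraps at this scale. A minimal sketch, assuming the keys are simply `0 until 1_000_000` (the real key sets come from `MatrixKeys.kt`); `intSum` and `longSum` are hypothetical names:

```kotlin
// Accumulates in Int: the running sum wraps silently past Int.MAX_VALUE.
fun intSum(keys: IntArray): Int {
    var sum = 0
    for (key in keys) sum += key
    return sum
}

// Accumulates in Long: each key is widened first, as in the gate above.
fun longSum(keys: IntArray): Long {
    var sum = 0L
    for (key in keys) sum += key.toLong()
    return sum
}

fun main() {
    val keys = IntArray(1_000_000) { it } // stand-in key set, an assumption
    println("Int sum:  ${intSum(keys)}")
    println("Long sum: ${longSum(keys)}")
}
```

The true sum here is 499,999,500,000, far beyond `Int.MAX_VALUE`, so the `Int` accumulator reports only the low 32 bits of it.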
Comment on lines +101 to +116
val map = createAtMaxLoad(impl, fill)
for (key in keys) {
    map.put(key, key)
}
check(map.size() == fill) { "Expected $fill entries in `$impl`, got ${map.size()}." }
// Correctness gate (mirrors the matrix benchmarks): each key's value is the key,
// so the sum of all hit lookups must equal the sum of keys; fail fast on any
// mis-constructed or miswired adapter rather than measuring wrong behavior.
var hitSum = 0L
for (key in keys) {
    hitSum += map.get(key)
}
check(hitSum == keys.sum()) {
    "Adapter `$impl` returned wrong values: hit sum $hitSum != ${keys.sum()}."
}
populated = map