Skip to content

diskann-benchmark refactor pt2 - shared logic overhaul#1169

Open
JordanMaples wants to merge 7 commits into
mainfrom
jordanmaples/benchmarks-refactor-v3
Open

diskann-benchmark refactor pt2 - shared logic overhaul#1169
JordanMaples wants to merge 7 commits into
mainfrom
jordanmaples/benchmarks-refactor-v3

Conversation

@JordanMaples

@JordanMaples JordanMaples commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Follow-up to #1168 (module reorganization). This PR contains the behavioral changes that were previously combined in #1142.

Summary

Simplifies the streaming benchmark infrastructure, unifies bf-tree input types, and makes Managed ID translation opt-in — preparing for its removal with inmem 2.0.

Changes

Managed ID translation is now opt-in

  • run_streaming and the streaming stack are now generic over the stream type via a new StreamingOutput trait
  • bftree: implements Stream<DataArgs> directly on StreamRunner, using tags as slot IDs with no Managed wrapper (no ID translation overhead)
  • inmem: continues using Managed for tag-to-slot ID translation until inmem 2.0 handles this natively
  • Removed SlotReclaim::Immediate and recycle_tags (bftree-only dead code)
  • Added build_direct_streamer for non-managed paths alongside the existing build_managed_streamer

Shared streaming execution layer

  • Introduce StreamRunner, Managed, build_managed_streamer, and run_streaming — a reusable stack shared across in-mem and bf-tree streaming benchmarks, replacing duplicated per-backend logic
  • Add InmemMaintainer (DropDeleted + Release) extracted from the old inline implementation

Streaming search overhaul (addresses @magdalendobson feedback on #1142)

  • New StreamingSearchParams struct with scalar num_threads / search_l — streaming runs a single search config between runbook stages rather than sweeping a parameter matrix
  • Remove the vestigial groundtruth field that was validated but never read during streaming execution

Input validation

  • Validate search_l >= search_n in StreamingSearchParams and GraphSearch
  • Validate recall_k <= search_n in StreamingSearchParams and GraphSearch to fail fast before long benchmark runs

Input unification

  • Rename Dynamic → Streaming for runbook params and input types (graph-index-dynamicgraph-index-stream-run)
  • Unify BfTreeBuild and BfTreeStreamingRun to use a QuantConfig enum (none | spherical) instead of separate input structs per quantization mode
  • Extract quantizer_util for shared spherical quantizer construction
  • Restore InlineFilterSearchPhase naming (accidentally overwritten)

Housekeeping

  • Simplify PhantomData<fn() -> T> to PhantomData<T> on StreamRunner
  • Delete streaming_utils.rs and streaming/full_precision.rs (subsumed by the shared layer)
  • Add new example: graph-index-bftree-stream-spherical.json
  • Update README with current command syntax

Breaking changes to JSON inputs

Streaming input files need updating:

  • "type": "graph-index-dynamic""graph-index-stream-run"
  • "search_phase""search" (flat struct with queries, reps, num_threads, search_l, search_n, recall_k)
  • Bf-tree inputs now require "quantization": { "kind": "none" } or the spherical config

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors diskann-benchmark streaming benchmarks by introducing a shared streaming execution layer (runner + common helpers), unifying bf-tree streaming/build inputs (including quantization configuration), and updating examples/docs to the new JSON schema.

Changes:

  • Introduce shared streaming infrastructure (StreamRunner, Maintainer, build_streamer, run_streaming) and extract in-mem maintenance logic into InmemMaintainer.
  • Unify/rename streaming inputs (Dynamic → Streaming) and introduce scalar StreamingSearchParams (no parameter sweep matrix) for streaming runs.
  • Consolidate bf-tree quantization inputs behind a single QuantConfig enum and add a shared spherical quantizer helper.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
diskann-benchmark/src/main.rs Updates integration test coverage for renamed streaming example and adds bf-tree spherical streaming example test.
diskann-benchmark/src/inputs/graph_index.rs Renames dynamic→streaming types; introduces StreamingSearchParams; privatizes some input fields behind accessors.
diskann-benchmark/src/inputs/exhaustive.rs Makes TransformKind clonable (used by new shared quantization config).
diskann-benchmark/src/inputs/bftree.rs Unifies bf-tree build/streaming inputs around QuantConfig and streaming search/runbook types.
diskann-benchmark/src/index/streaming/runner.rs Adds generic shared streaming runner with pluggable maintenance.
diskann-benchmark/src/index/streaming/mod.rs Adds shared streamer construction + runbook execution helpers; re-exports shared pieces.
diskann-benchmark/src/index/streaming/full_precision.rs Removes legacy full-precision streaming implementation (superseded by shared runner).
diskann-benchmark/src/index/result.rs Tightens visibility of result structs/enums to the index module.
diskann-benchmark/src/index/inmem/streaming.rs Introduces InmemMaintainer (DropDeleted + Release) extracted from old inline logic.
diskann-benchmark/src/index/inmem/spherical.rs Updates to new accessor-based input APIs.
diskann-benchmark/src/index/inmem/scalar.rs Updates to new accessor-based input APIs.
diskann-benchmark/src/index/inmem/product.rs Updates to new accessor-based input APIs.
diskann-benchmark/src/index/inmem/mod.rs Exposes InmemMaintainer and adds inmem streaming module.
diskann-benchmark/src/index/build.rs Tightens visibility of internal build helpers/types.
diskann-benchmark/src/index/bftree/streaming_utils.rs Removes duplicated streaming runbook-driving helper (replaced by shared run_streaming).
diskann-benchmark/src/index/bftree/spherical.rs Switches spherical bf-tree build to unified BfTreeBuild + shared quantizer helper.
diskann-benchmark/src/index/bftree/spherical_streaming.rs Switches bf-tree spherical streaming to shared streamer + StreamRunner.
diskann-benchmark/src/index/bftree/quantizer_util.rs Adds shared spherical quantizer construction utility for bf-tree benchmarks.
diskann-benchmark/src/index/bftree/mod.rs Updates module docs and wires new shared quantizer utility.
diskann-benchmark/src/index/bftree/full_precision.rs Switches bf-tree full-precision build to unified BfTreeBuild (with QuantConfig::None).
diskann-benchmark/src/index/bftree/full_precision_streaming.rs Switches bf-tree full-precision streaming to shared streamer + StreamRunner.
diskann-benchmark/src/index/benchmarks.rs Refactors in-mem streaming benchmark to use shared run_streaming/build_streamer + StreamRunner.
diskann-benchmark/README.md Updates documented command/example to new streaming input filename and tag.
diskann-benchmark/example/graph-index-stream.json Updates streaming example JSON to new schema (type, search shape, no groundtruth).
diskann-benchmark/example/graph-index-bftree.json Updates bf-tree build example to unified graph-index-bftree + quantization config.
diskann-benchmark/example/graph-index-bftree-stream.json Updates bf-tree streaming example to unified tag + new search + quantization config.
diskann-benchmark/example/graph-index-bftree-stream-spherical.json Adds new example for bf-tree spherical streaming with unified quantization config.
diskann-benchmark/example/graph-index-bftree-spherical.json Updates spherical bf-tree build example to unified tag + nested quantization config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread diskann-benchmark/src/inputs/graph_index.rs
Comment thread diskann-benchmark/src/index/streaming/mod.rs Outdated
@codecov-commenter

codecov-commenter commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 71.72775% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.76%. Comparing base (b5ebac2) to head (ff5cff6).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
diskann-benchmark/src/index/streaming/runner.rs 55.63% 63 Missing ⚠️
diskann-benchmark/src/inputs/graph_index.rs 54.23% 27 Missing ⚠️
diskann-benchmark/src/index/streaming/mod.rs 88.88% 9 Missing ⚠️
diskann-benchmark/src/index/benchmarks.rs 87.03% 7 Missing ⚠️
diskann-benchmark/src/index/build.rs 75.00% 1 Missing ⚠️
diskann-benchmark/src/index/inmem/streaming.rs 97.43% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1169      +/-   ##
==========================================
- Coverage   90.99%   90.76%   -0.24%     
==========================================
  Files         489      490       +1     
  Lines       93130    93360     +230     
==========================================
- Hits        84746    84736      -10     
- Misses       8384     8624     +240     
Flag Coverage Δ
miri 90.76% <71.72%> (-0.24%) ⬇️
unittests 90.72% <71.72%> (-0.24%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark/src/index/inmem/product.rs 100.00% <ø> (ø)
diskann-benchmark/src/index/inmem/scalar.rs 100.00% <ø> (ø)
diskann-benchmark/src/index/inmem/spherical.rs 100.00% <ø> (ø)
diskann-benchmark/src/index/result.rs 49.03% <100.00%> (ø)
diskann-benchmark/src/index/streaming/managed.rs 97.56% <ø> (ø)
diskann-benchmark/src/inputs/exhaustive.rs 26.83% <ø> (ø)
diskann-benchmark/src/main.rs 91.36% <100.00%> (ø)
diskann-benchmark/src/utils/streaming.rs 91.38% <ø> (ø)
diskann-benchmark/src/index/build.rs 85.92% <75.00%> (ø)
diskann-benchmark/src/index/inmem/streaming.rs 97.43% <97.43%> (ø)
... and 4 more

... and 12 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread diskann-benchmark/src/inputs/bftree.rs Outdated
Comment thread diskann-benchmark/src/inputs/graph_index.rs Outdated
Comment thread diskann-benchmark/src/inputs/bftree.rs Outdated
JordanMaples and others added 4 commits June 22, 2026 08:32
Behavioral changes on top of the module reorganization:

- Introduce shared StreamRunner/Managed/build_streamer abstractions
  for streaming benchmarks, replacing duplicated per-backend logic
- Rename Dynamic -> Streaming for runbook params and input types
- Unify bftree build/streaming inputs with QuantConfig enum (none or
  spherical) instead of separate input structs
- Add StreamingSearchParams with scalar num_threads/search_l, removing
  the vestigial groundtruth file requirement and unnecessary parameter
  sweep arrays from streaming search
- Extract quantizer_util for shared spherical quantizer construction
- Add InmemMaintainer for inmem streaming maintenance (DropDeleted +
  Release)
- Privatize input struct fields behind accessor methods

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JordanMaples JordanMaples force-pushed the jordanmaples/benchmarks-refactor-v3 branch from c33c71c to 634b25c Compare June 22, 2026 15:35
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread diskann-benchmark/src/inputs/graph_index.rs
Comment thread diskann-benchmark/src/inputs/graph_index.rs

@magdalendobson magdalendobson left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks close, just a couple issues in comments.

… search_n validation

- Restore InlineFilterSearchPhase naming that was accidentally overwritten
- Add recall_k > search_n validation to GraphSearch::validate and
  StreamingSearchParams::validate to fail fast before running benchmarks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@hildebrandmw hildebrandmw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two big concerns here:

  1. Moving from pub(crate) members to pub(crate) accessors adds a lot of noise to the diff without meaningfully increasing protection. The real danger with sharing inputs is having backends ignore fields entirely, which per-field accessors doesn't really change.
  2. This PR centers around ManagedStream, but that is a crutch for providers not being able to manage their own ID translation. The upcoming inmem 2.0 (which can do everything except make it to a PR ready state), will not need this management layer. In which case, will a separate streaming implementation be needed anyways? Is there a way to share this infrastructure without requiring ManagedStream?

inplace_delete_num_to_replace: usize,
inplace_delete_method: InplaceDeleteMethod,
maintainer: M,
_marker: PhantomData<fn() -> T>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this invariant on T?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched it to PhantomData<T>

Introduce StreamingOutput trait and make run_streaming generic over
the stream type. This allows:

- bftree: implements Stream<DataArgs> directly on StreamRunner, using
  tags as slot IDs without translation (no Managed wrapper)
- inmem: continues using Managed for tag-to-slot ID translation

When inmem 2.0 lands with native ID management, Managed can be
deleted entirely with no infrastructure changes needed.

Also:
- Remove SlotReclaim::Immediate and recycle_tags (bftree-only paths)
- Simplify PhantomData<fn() -> T> to PhantomData<T> on StreamRunner

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@hildebrandmw hildebrandmw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something was bothering me about removing the ID translation layer for bf-tree, and I think I managed to figure out what it is. For some reason, bf-tree is pre-configure with a maximum capacity. The BAB streaming runs simulate the runbook to obtain the maximum number of active points, which could be less than the maximum tag. It's this maximum that is provided to the BfTreeProvider's constructor. So, for runbooks with an active set that is indeed less, then the untranslated IDs will trigger the assert linked earlier. The runbook in test_data doesn't hit this because it doesn't have the property of the active points being less than the maximum tag.

Did this latest iteration go through the wikipedia runbook.

A small example runbook that triggers this situation:

sift-small-256:
  max_pts: 128
  1:
    operation: "insert"
    start: 0
    end: 128
  2:
    operation: "search"
  3:
    operation: "delete"
    start: 0
    end: 64
  4:
    operation: "insert"
    start: 128
    end: 192
  5:
    operation: "search"
  6:
    operation: "replace"
    tags_start: 64
    tags_end: 128
    ids_start: 192
    ids_end: 256
  7:
    operation: "search"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants