
feat: CSR adjacency index for native graph traversal #160

Open
jja725 wants to merge 3 commits into lance-format:main from jja725:feat/csr-adjacency-index


jja725 (Collaborator) commented Jun 17, 2026

Summary

Adds a CSR (Compressed Sparse Row) adjacency index for graph traversal. Locating a vertex's neighbor list takes O(1) (one offset lookup followed by a contiguous read of the neighbor array), so expansion no longer has to go through SQL joins. This is the foundation for replacing the LanceNativePlanner placeholder with a real native execution path.

Inspired by GraphAr's CSR-in-Parquet approach (Apache incubator), adapted for Lance's columnar format.

What's included

  • CsrIndex — in-memory CSR structure with:

    • neighbors(vertex_id) — O(1) neighbor lookup via offset array
    • degree(vertex_id) — O(1) out-degree
    • bfs(start, max_hops) — k-hop BFS traversal returning vertices by distance
    • shortest_path(start, end) — BFS-based unweighted shortest path
    • to_record_batch() / neighbors_to_record_batch() — Arrow serialization for persisting as Lance datasets
  • CsrIndexBuilder — construct CSR from:

    • Individual add_edge(src, dst) calls
    • Arrow RecordBatch with src_id/dst_id columns via add_edges_from_batch()
    • Auto-inferred or explicit vertex count
  • build_bidirectional_index() — create both outgoing (CSR) and incoming (CSC) indices for undirected/reverse traversal
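The CSR layout described above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: `CsrSketch`, `from_edges`, and `build_out_and_in` are hypothetical names, vertex ids are assumed to be dense `u64` values smaller than `num_vertices`, and the Arrow `RecordBatch` paths are left out.

```rust
/// Illustrative CSR layout (hypothetical; not the PR's `CsrIndex`).
#[derive(Debug)]
pub struct CsrSketch {
    /// `offsets[v]..offsets[v + 1]` is the range of `targets` holding v's neighbors.
    offsets: Vec<usize>,
    /// Neighbor ids of all vertices, concatenated in vertex order.
    targets: Vec<u64>,
}

impl CsrSketch {
    /// Build with a counting sort over source ids: O(V + E).
    /// Assumes every source id is < `num_vertices` (panics otherwise).
    pub fn from_edges(num_vertices: usize, edges: &[(u64, u64)]) -> Self {
        // Pass 1: count the out-degree of each source vertex.
        let mut offsets = vec![0usize; num_vertices + 1];
        for &(src, _) in edges {
            offsets[src as usize + 1] += 1;
        }
        // Prefix sum turns the counts into start offsets.
        for v in 0..num_vertices {
            offsets[v + 1] += offsets[v];
        }
        // Pass 2: scatter each destination into its source's range.
        let mut cursor = offsets.clone();
        let mut targets = vec![0u64; edges.len()];
        for &(src, dst) in edges {
            let slot = &mut cursor[src as usize];
            targets[*slot] = dst;
            *slot += 1;
        }
        Self { offsets, targets }
    }

    /// O(1) to locate the slice; unknown vertex ids yield an empty slice.
    pub fn neighbors(&self, v: u64) -> &[u64] {
        let v = v as usize;
        if v >= self.offsets.len() - 1 {
            return &[];
        }
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }

    /// O(1) out-degree: the length of the neighbor slice.
    pub fn degree(&self, v: u64) -> usize {
        self.neighbors(v).len()
    }
}

/// Outgoing index plus an incoming one built from the flipped edge list
/// (the CSC view). Assumes destination ids are also < `num_vertices`.
pub fn build_out_and_in(num_vertices: usize, edges: &[(u64, u64)]) -> (CsrSketch, CsrSketch) {
    let flipped: Vec<(u64, u64)> = edges.iter().map(|&(s, d)| (d, s)).collect();
    (
        CsrSketch::from_edges(num_vertices, edges),
        CsrSketch::from_edges(num_vertices, &flipped),
    )
}
```

In this sketch parallel edges and self-loops need no special handling: they simply occupy additional slots in `targets`, and neighbors keep the order in which their edges were supplied.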

Why this matters

Currently lance-graph translates Cypher MATCH (a)-[:KNOWS]->(b) into SQL joins via DataFusion. For multi-hop queries, this means:

| Operation | Current (SQL Joins) | With CSR Index |
| --- | --- | --- |
| 1-hop neighbor lookup | O(N) filter scan | O(1) offset + sequential read |
| k-hop traversal | O(N^k) self-joins | O(Σ degrees) pointer-chasing |
| Shortest path | Recursive CTEs | Direct BFS on CSR |
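To make the "O(Σ degrees)" and "Direct BFS on CSR" entries concrete, here is a rough sketch of hop-limited BFS and unweighted shortest path over raw CSR arrays. It is not the implementation in this PR: the free functions `bfs_levels` and `shortest_path`, and the `offsets`/`targets` slice parameters, are assumptions made for illustration, and every id in `targets` is assumed to be a valid vertex id.

```rust
use std::collections::VecDeque;

/// Neighbors of `v` in a CSR given as raw offset/target arrays.
fn neighbors<'a>(offsets: &[usize], targets: &'a [usize], v: usize) -> &'a [usize] {
    &targets[offsets[v]..offsets[v + 1]]
}

/// Hop-limited BFS. `levels[d]` holds the vertices first reached at
/// distance `d` from `start`; an invalid `start` yields no levels.
/// Each visited vertex's neighbor slice is read once, hence O(sum of degrees).
pub fn bfs_levels(
    offsets: &[usize],
    targets: &[usize],
    start: usize,
    max_hops: usize,
) -> Vec<Vec<usize>> {
    let n = offsets.len().saturating_sub(1);
    if start >= n {
        return Vec::new();
    }
    let mut visited = vec![false; n];
    visited[start] = true;
    let mut levels = vec![vec![start]];
    for _ in 0..max_hops {
        let mut next = Vec::new();
        for &u in levels.last().unwrap() {
            for &w in neighbors(offsets, targets, u) {
                if !visited[w] {
                    visited[w] = true;
                    next.push(w);
                }
            }
        }
        if next.is_empty() {
            break; // frontier exhausted before reaching max_hops
        }
        levels.push(next);
    }
    levels
}

/// Unweighted shortest path via BFS with parent pointers.
/// Returns `None` if either endpoint is invalid or `end` is unreachable.
pub fn shortest_path(
    offsets: &[usize],
    targets: &[usize],
    start: usize,
    end: usize,
) -> Option<Vec<usize>> {
    let n = offsets.len().saturating_sub(1);
    if start >= n || end >= n {
        return None;
    }
    let mut parent: Vec<Option<usize>> = vec![None; n];
    let mut visited = vec![false; n];
    visited[start] = true;
    let mut queue = VecDeque::from([start]);
    while let Some(u) = queue.pop_front() {
        if u == end {
            // Walk parent pointers back to `start`, then reverse.
            let mut path = vec![end];
            let mut cur = end;
            while let Some(p) = parent[cur] {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &w in neighbors(offsets, targets, u) {
            if !visited[w] {
                visited[w] = true;
                parent[w] = Some(u);
                queue.push_back(w);
            }
        }
    }
    None
}
```

Both functions touch only the neighbor slices of vertices they actually visit, which is where the contrast with repeated self-joins over the full edge table comes from.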

Next steps (not in this PR)

  1. Wire CSR into LanceNativePlanner to handle LogicalOperator::Expand
  2. Persist CSR offset tables as Lance datasets alongside edge data
  3. Incremental CSR updates on edge inserts (adjacency list (AL) → CSR compaction, per the BACH paper)
  4. Combine graph traversal with Lance's vector search for hybrid queries

Test plan

  • 22 unit tests covering: basic lookups, degree, empty graphs, isolated vertices, self-loops, parallel edges, RecordBatch construction, Arrow serialization roundtrip, BFS traversal (limited hops, disconnected, invalid start), shortest path (direct, multi-hop, same vertex, unreachable, invalid), bidirectional index, auto-inferred vertex count
  • cargo clippy -p lance-graph -- -D warnings passes clean

Closes #159

jja725 and others added 3 commits June 16, 2026 23:02
Add a Compressed Sparse Row (CSR) index that enables O(1) neighbor
lookup for graph traversal, replacing SQL join-based expansion with
direct pointer-chasing. Inspired by GraphAr's CSR-in-Parquet approach.

Includes:
- CsrIndex: in-memory CSR with neighbor lookup, BFS, shortest path
- CsrIndexBuilder: construct CSR from edge pairs or Arrow RecordBatch
- build_bidirectional_index: create both outgoing (CSR) and incoming (CSC)
- Arrow serialization for persisting offset/neighbor tables as Lance datasets

Closes lance-format#159

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Limit coverage to --lib tests (unit tests only) and cap parallel build
jobs to 2 to prevent linker OOM (Bus error / signal 7) on the
ubuntu-24.04 CI runner. The coverage-instrumented binary grew past the
runner's memory limit with additional modules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov-commenter


Codecov Report

❌ Patch coverage is 94.81132% with 22 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/lance-graph/src/csr_index.rs | 94.81% | 22 Missing ⚠️ |



Development

Successfully merging this pull request may close these issues.

feat: CSR adjacency index for native graph traversal (inspired by DuckPGQ + icebug-format)

2 participants