Cursor-style semantic code search as an MCP plugin for Claude Code.
Instead of grepping 50 files and sending 30,000 tokens to Claude, VecGrep returns the top 8 semantically relevant code chunks (~1,600 tokens). That's a ~95% token reduction for codebase queries.
Measured on the VecGrep codebase itself (5 source files, ~26k tokens raw).
| Mode | Avg tokens returned | vs raw read | Savings |
|---|---|---|---|
| Raw file read (baseline) | 26,009 | — | — |
| `search_code` (top_k=8) | ~3,007 | 11.6% | 88% |
| `hybrid_search` (top_k=8) | ~3,324 | 12.8% | 87% |
| `search_graph` (limit=8) | ~47 | 0.2% | >99% |
search_graph returns structured node metadata only (name, kind, file, line range) — no source code — so it's ultra-cheap for structural questions ("where is X defined?", "what calls Y?").
| Mode | Latency |
|---|---|
| `search_graph` | ~3ms |
| `hybrid_search` | ~76ms |
| `search_code` | ~83ms |
search_graph is ~30× faster than vector search — pure in-memory graph traversal, no embedding model call.
For name-based structural queries, pure vector search can rank documentation (CHANGELOG, README) above source code. The graph index fixes this:
| Query | `search_code` #1 | `hybrid_search` #1 |
|---|---|---|
| "VectorStore search method" | [WRONG] CHANGELOG.md | [OK] store.py |
| "GraphStore build" | [WRONG] CHANGELOG.md | [OK] server.py |
| "embedding provider factory" | [OK] embedder.py | [OK] embedder.py |
| "AST chunking tree-sitter" | [OK] chunker.py | [OK] chunker.py |
The graph score (graph_score: 1.00) overrides a misleading vector match whenever the query directly names a known symbol.
Rule of thumb: use `search_code` for semantic/behaviour queries, `search_graph` for structural/navigation queries, `hybrid_search` when you need both.
- Chunk: parses source files with tree-sitter to extract semantic units (functions, classes, methods)
- Embed: encodes each chunk using the configured embedding provider:
  - Local (default): `all-MiniLM-L6-v2-code-search-512` via fastembed ONNX (~100ms startup, no API key) or PyTorch, with auto device detection (Apple Silicon, CUDA, CPU)
  - Cloud (BYOK): OpenAI, Voyage AI, or Google Gemini via your own API key (higher-quality embeddings, optional)
- Store: saves embeddings + metadata in LanceDB under `~/.vecgrep/<project_hash>/`; vector dimensions adapt automatically to the chosen provider
- Search: ANN index (IVF-PQ) for fast approximate search on large codebases
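
To make the chunking step concrete, here is a minimal sketch of extracting function and class spans from a Python file. It swaps in Python's standard-library `ast` module as a stand-in for tree-sitter, so it only handles Python source and is not how VecGrep itself parses files; the function name `extract_chunks` and the tuple layout are hypothetical.

```python
import ast


def extract_chunks(source: str) -> list[tuple[str, str, int, int]]:
    """Return (kind, name, start_line, end_line) for each function and class.

    Stand-in for tree-sitter: uses the stdlib `ast` module, Python source only.
    Methods are reported as "function" because parent tracking is omitted.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append(("function", node.name, node.lineno, node.end_lineno))
        elif isinstance(node, ast.ClassDef):
            chunks.append(("class", node.name, node.lineno, node.end_lineno))
    # Sort by starting line so chunks come out in file order.
    return sorted(chunks, key=lambda chunk: chunk[2])
```

Each tuple carries the line range that would later be embedded and reported alongside search results.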
Incremental re-indexing via mtime/size checks skips unchanged files.
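The mtime/size check can be sketched roughly as follows. This is an illustration under assumptions, not VecGrep's actual code: the helper names and the in-memory `seen` dictionary are hypothetical, and a real index would persist the signatures between runs.

```python
from pathlib import Path


def file_signature(path: Path) -> tuple[int, int]:
    """Return (modification time in nanoseconds, size in bytes) for a file."""
    stat = path.stat()
    return (stat.st_mtime_ns, stat.st_size)


def files_to_reindex(paths: list[Path], seen: dict[str, tuple[int, int]]) -> list[Path]:
    """Return the paths whose signature differs from the recorded one.

    `seen` maps a path string to the signature stored at the last index run
    (hypothetical storage); it is updated in place for every changed file.
    """
    changed = []
    for path in paths:
        signature = file_signature(path)
        if seen.get(str(path)) != signature:
            changed.append(path)
            seen[str(path)] = signature
    return changed
```

Comparing two cheap `stat` fields avoids reading or hashing file contents, at the cost of missing an edit that preserves both size and modification time.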
Requires Python 3.12 and uv.
Note: Python 3.12 is required because `tree-sitter-languages` does not yet have wheels for Python 3.13+.
pip install vecgrep # standard pip
uv tool install --python 3.12 vecgrep # uv tool (recommended)

Run once; it works for every project:

claude mcp add --scope user vecgrep -- vecgrep

This installs VecGrep as a persistent binary and registers it in your user config (~/.claude.json) so it's available globally across all projects. It starts instantly, with no download delay on Claude Code launch.
You don't trigger VecGrep manually; Claude decides when to call the tools based on what you ask.
| What you say to Claude | Tool invoked |
|---|---|
| "Index my project at /Users/me/myapp" | index_codebase |
| "How does authentication work in this codebase?" | search_code |
| "Find where database connections are set up" | search_code |
| "How many files are indexed?" | get_index_status |
| "Build a knowledge graph of my project" | index_graph |
| "What calls the VectorStore.search method?" | search_graph + graph_neighbors |
| "Find code structurally related to authentication" | hybrid_search |
Typical first-time flow:
You: "Search for how payments are handled in /Users/me/myapp"
Claude: [calls index_codebase automatically since no index exists]
Claude: [calls search_code with your query]
Claude: "Here's how payments work — in src/payments.py:42..."
After the first index, subsequent searches skip unchanged files automatically — no re-indexing needed unless your code changes.
Index a project directory. Skips unchanged files on subsequent calls.
index_codebase("/path/to/myproject")
# → "Indexed 142 file(s), 1847 chunk(s) added (0 file(s) skipped, unchanged)"
# Use OpenAI embeddings instead of local
index_codebase("/path/to/myproject", provider="openai")
Provider lock: once a project is indexed with a provider, re-indexing with a different provider requires force=True (this rebuilds the vector table with the new embedding dimensions).
Note: watch=True is only supported with the local provider — live sync with cloud providers would incur unbounded API costs.
Semantic search. Auto-indexes if no index exists.
search_code("how does user authentication work", "/path/to/myproject")
Returns formatted snippets with file paths, line numbers, and similarity scores:
[1] src/auth.py:45-72 (score: 0.87)
def authenticate_user(token: str) -> User:
...
[2] src/middleware.py:12-28 (score: 0.81)
...
Check index statistics, including the embedding provider used.
Index status for: /path/to/myproject
Files indexed: 142
Total chunks: 1847
Last indexed: 2026-02-22T07:20:31+00:00
Index size: 28.4 MB
Provider: local
Model: isuruwijesiri/all-MiniLM-L6-v2-code-search-512
Dimensions: 384
Build a structural knowledge graph from the codebase using tree-sitter AST extraction. No LLM required — extracts files, functions, classes, and methods as nodes; contains, calls, imports, and inherits as directed edges. Independent of the vector index.
index_graph("/path/to/myproject")
# → "Graph built: 496 nodes, 1251 edges, 35 files processed."
Keyword search over node labels (function names, class names, file names). Returns structural nodes with source location and connectivity degree. Ultra-cheap: ~47 tokens average, ~3ms latency.
search_graph("VectorStore", "/path/to/myproject")
# → [1] CLASS VectorStore (score: 1.00, degree: 39)
# src/vecgrep/store.py:49-352
Return the structural neighbourhood of any node — callers, callees, imports, contained methods, and inheritance edges. Use search_graph first to find the node ID.
graph_neighbors("VectorStore", "/path/to/myproject", depth=1)
# → Callers (18): _get_store, migrate_project, test fixtures...
# Contains (18): search, add_chunks, replace_file_chunks...
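As a rough illustration of what a depth-limited neighbourhood lookup involves, the sketch below walks a small edge list breadth-first, following edges in both directions. The edge list is made up for the example (`embed_query` in particular is a hypothetical name), and the traversal strategy is an assumption rather than VecGrep's implementation.

```python
from collections import deque

# Hypothetical edges in (source, relation, target) form.
EDGES = [
    ("migrate_project", "calls", "VectorStore"),
    ("_get_store", "calls", "VectorStore"),
    ("VectorStore", "contains", "search"),
    ("VectorStore", "contains", "add_chunks"),
    ("search", "calls", "embed_query"),
]


def neighborhood(node, edges, depth=1):
    """Collect every edge reachable within `depth` hops of `node`, in either direction."""
    visited = {node}
    frontier = deque([(node, 0)])
    collected = set()
    while frontier:
        current, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for edge in edges:
            src, _relation, dst = edge
            if current not in (src, dst):
                continue
            collected.add(edge)
            other = dst if src == current else src
            if other not in visited:
                visited.add(other)
                frontier.append((other, dist + 1))
    return collected


near = neighborhood("VectorStore", EDGES, depth=1)
# Group the one-hop edges the way the tool output does.
callers = sorted(s for s, rel, d in near if rel == "calls" and d == "VectorStore")
contains = sorted(d for s, rel, d in near if rel == "contains" and s == "VectorStore")
```

With `depth=1` only edges touching the node itself are returned; raising `depth` also pulls in edges of its neighbours, such as the call made from `search`.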
Vector similarity search re-ranked by graph proximity. Final score = alpha * vector_score + (1 - alpha) * graph_score. Fixes cases where documentation ranks above source code on pure embedding similarity.
hybrid_search("VectorStore search method", "/path/to/myproject", alpha=0.6)
# → [1] src/vecgrep/store.py:292-320 (blended: 0.70, vec: 0.49, graph: 1.00)
Requires both index_codebase and index_graph to have been run. Degrades gracefully to pure vector search if the graph index is absent.
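The blending formula is simple enough to check by hand. The sketch below applies it to two candidates: the `store.py` scores are the ones shown in the example output, while the `CHANGELOG.md` scores and the helper names `blended_score` and `rerank` are hypothetical, chosen only to show a documentation hit losing its lead after re-ranking.

```python
def blended_score(vector_score: float, graph_score: float, alpha: float = 0.6) -> float:
    """Final score = alpha * vector_score + (1 - alpha) * graph_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * graph_score


def rerank(candidates, alpha=0.6):
    """Sort (label, vector_score, graph_score) tuples by blended score, best first."""
    return sorted(
        candidates,
        key=lambda c: blended_score(c[1], c[2], alpha),
        reverse=True,
    )


candidates = [
    ("CHANGELOG.md", 0.55, 0.00),          # hypothetical: strong vector match, no graph match
    ("src/vecgrep/store.py", 0.49, 1.00),  # scores from the example output
]
ranked = rerank(candidates, alpha=0.6)
```

With `alpha=0.6`, `store.py` scores 0.6 × 0.49 + 0.4 × 1.00 = 0.694, while the hypothetical changelog entry scores 0.33. The 0.694 figure is computed from the rounded `vec: 0.49`, so it lands slightly below the displayed `blended: 0.70`.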
VecGrep can be tuned via environment variables:
| Variable | Default | Description |
|---|---|---|
| `VECGREP_BACKEND` | `onnx` | Local backend: onnx (fastembed, fast startup) or torch (sentence-transformers, any HF model) |
| `VECGREP_MODEL` | `isuruwijesiri/all-MiniLM-L6-v2-code-search-512` | HuggingFace model ID (local provider only) |
Backend comparison:
| Backend | Startup | PyTorch required | Custom HF models |
|---|---|---|---|
| `onnx` (default) | ~100ms | No | ONNX-exported models only |
| `torch` | ~2–3s | Yes | Any HuggingFace model |
VecGrep supports three cloud embedding providers. Each requires an API key environment variable and the corresponding optional dependency.
| Provider | Env var | Model | Dims | Install extra |
|---|---|---|---|---|
| `openai` | `VECGREP_OPENAI_KEY` | `text-embedding-3-small` | 1536 | vecgrep[openai] |
| `voyage` | `VECGREP_VOYAGE_KEY` | `voyage-code-3` | 1024 | vecgrep[voyage] |
| `gemini` | `VECGREP_GEMINI_KEY` | `gemini-embedding-exp-03-07` | 3072 | vecgrep[gemini] |
Install cloud extras:
# Single provider
uv tool install --python 3.12 'vecgrep[openai]'
pip install 'vecgrep[openai]'
# All cloud providers at once
pip install 'vecgrep[cloud]'

Use a cloud provider:
# Set your API key
export VECGREP_OPENAI_KEY=sk-...
# Index with OpenAI embeddings
index_codebase("/path/to/myproject", provider="openai")
# Or tell Claude to use it:
# "Index my project at /path/to/myproject using openai embeddings"

Switch providers (requires force re-index to rebuild the vector table):
index_codebase("/path/to/myproject", provider="voyage", force=True)
Local backend examples:
# Use a different model with the torch backend
VECGREP_BACKEND=torch VECGREP_MODEL=sentence-transformers/all-MiniLM-L6-v2 vecgrep
# Use a custom ONNX model
VECGREP_MODEL=my-org/my-onnx-model vecgrep

Supported languages: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, Ruby, Swift, Kotlin, C#
All other text files fall back to sliding-window line chunks.
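For readers unfamiliar with the fallback, a sliding-window line chunker might look like the sketch below. The window and overlap sizes are placeholder values and the function name is hypothetical; the source does not state which settings VecGrep uses.

```python
def sliding_window_chunks(text: str, window: int = 40, overlap: int = 10):
    """Yield (start_line, end_line, chunk_text) with 1-based, inclusive line numbers.

    `window` and `overlap` are illustrative defaults, not VecGrep's settings.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    lines = text.splitlines()
    step = window - overlap
    for start in range(0, len(lines), step):
        end = min(start + window, len(lines))
        yield (start + 1, end, "\n".join(lines[start:end]))
        if end == len(lines):
            break  # the last window already reached the end of the file
```

Overlapping windows keep a statement that straddles a boundary visible in at least one chunk, at the cost of embedding some lines twice.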
~/.vecgrep/<sha256-of-project-path>/index.db
Each project gets its own isolated index. Delete the directory to wipe the index.
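One plausible way to derive such a per-project directory is sketched below. How VecGrep normalises the path before hashing, and whether it uses the full hex digest, is not stated here, so both choices (and the function name `index_dir_for`) are assumptions.

```python
import hashlib
from pathlib import Path


def index_dir_for(project_path: str, root: Path = Path.home() / ".vecgrep") -> Path:
    """Map a project path to its own index directory under `root`.

    Assumption: the path is resolved to an absolute form and the full
    SHA-256 hex digest is used as the directory name.
    """
    resolved = str(Path(project_path).expanduser().resolve())
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()
    return root / digest
```

Hashing the resolved path means two spellings of the same directory (for example with and without a trailing slash) land in the same index, while distinct projects stay isolated.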
The embedding model used by VecGrep is all-MiniLM-L6-v2-code-search-512, a model fine-tuned specifically for semantic code search by @isuruwijesiri.
@misc{all_MiniLM_L6_v2_code_search_512,
author = {isuruwijesiri},
title = {all-MiniLM-L6-v2-code-search-512},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/isuruwijesiri/all-MiniLM-L6-v2-code-search-512}
}

| ? Questions | Start a Q&A discussion |
| + Ideas | Share an idea |
| > Show & Tell | Share how you use VecGrep |
| ! Bugs | Open an issue |
