A production-ready Retrieval-Augmented Generation system with Knowledge Graph capabilities
Features • Architecture • Installation • API Reference • Configuration
Advance RAG is an enterprise-grade RAG system built on HippoRAG, designed for high-accuracy question answering in multilingual environments. It combines knowledge graph-based retrieval, hybrid search strategies, and advanced query processing to deliver precise, source-cited responses.
Key Differentiators:
- Neurobiologically-inspired memory architecture for complex reasoning
- Native support for Bengali, English, and Banglish queries
- Multi-stage retrieval with cross-encoder reranking
- Intelligent query decomposition for multi-entity questions
| Capability | Description |
|---|---|
| Knowledge Graph Retrieval | Graph-based context understanding using entity relationships and semantic connections |
| Hybrid Search | Combines BM25 lexical matching with dense vector retrieval for comprehensive coverage |
| Cross-Encoder Reranking | Neural reranking using BAAI/bge-reranker-v2-m3 for precision optimization |
| Grounded QA | Source-cited responses with built-in hallucination prevention mechanisms |
| Feature | Description |
|---|---|
| Query Clarity Detection | Automatic detection of ambiguous or unclear queries |
| Query Rewriting | GPT-4o-mini powered query reformulation for improved retrieval |
| Context-Aware Expansion | Automatic query expansion with domain-relevant keywords |
| Multi-Entity Decomposition | Intelligent splitting of complex queries for parallel retrieval |
| University Chunk Tagging | Source-aware document tagging for accurate institutional filtering |
| Post-Retrieval Filtering | Entity-based filtering ensures results match the queried institution |
| Feature | Description |
|---|---|
| Answer Verification | Automated validation against source documents |
| Contextual Fallbacks | Intelligent not-found responses with relevant resource links |
| Multilingual Output | Native support for Bengali and English responses |
| Component | Model | Deployment |
|---|---|---|
| Query Processing | GPT-4o-mini | OpenAI API |
| Embeddings | multilingual-e5-large | Local (GPU) |
| Reranking | BAAI/bge-reranker-v2-m3 | Local (CPU) |
| Answer Generation | Qwen3-80B | Ollama (configurable) |
| Type | Options |
|---|---|
| LLM Providers | OpenAI, Google Gemini, Ollama, vLLM |
| Embedding Models | multilingual-e5-large, NV-Embed-v2, GritLM, Gemini Embeddings, OpenAI Embeddings |
- Python 3.10+
- CUDA-compatible GPU (recommended for embeddings)
- 16GB+ RAM
Create the environment and install the package:

    # Create environment
    conda create -n hipporag python=3.10
    conda activate hipporag

    # Install package
    pip install hipporag

Set the API keys:

    # Required API keys
    export OPENAI_API_KEY="your-openai-key"
    export GOOGLE_API_KEY="your-google-key"        # Optional: For Gemini
    export HF_HOME="/path/to/huggingface/cache"    # Optional: Custom cache path

Python usage:

    from hipporag import HippoRAG

    # Initialize
    rag = HippoRAG(
        save_dir='outputs',
        llm_model_name='gpt-4o-mini',
        embedding_model_name='intfloat/multilingual-e5-large'
    )

    # Index documents
    documents = [
        "Einstein developed the theory of relativity.",
        "The theory revolutionized modern physics.",
        "Einstein was born in Germany in 1879."
    ]
    rag.index(docs=documents)

    # Query
    results = rag.rag_qa(queries=["Where was Einstein born?"])

REST API usage:

    # Start server
    python api_server.py

    # Index documents
    curl -X POST "http://localhost:8000/index-folder" \
      -H "Content-Type: application/json" \
      -d '{"folder_path": "documents"}'

    # Query
    curl -X POST "http://localhost:8000/ask" \
      -H "Content-Type: application/json" \
      -d '{"question": "JnU B unit exam kobe?"}'

| Method | Endpoint | Description |
|---|---|---|
| POST | `/index` | Index document array |
| POST | `/index-folder` | Index documents from directory |
| POST | `/ask` | Submit question with full pipeline processing |
| POST | `/debug-retrieval` | Retrieve passages without answer generation |
| GET | `/health` | Service health check |
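The curl calls can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_request` and `post_json` are made up for illustration, and the response is returned as a raw dictionary because the response fields are not documented in this README. The long default timeout is an assumption chosen to cover slow queries.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # server address used in the curl examples


def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the server's POST endpoints."""
    # ensure_ascii=False keeps Bengali text readable in the request body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint: str, payload: dict, timeout: float = 180.0) -> dict:
    """Send the request and decode the JSON reply.

    Assumption: the server replies with a JSON object. Its fields are not
    specified here, so the parsed dictionary is returned unchanged.
    """
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls (these need a running server, so they are left commented out):
# post_json("/index-folder", {"folder_path": "documents"})
# print(post_json("/ask", {"question": "JnU B unit exam kobe?"}))
```

Separating request construction from sending makes the payload easy to inspect or unit test without a live server.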
Each request passes through the following stages:

    ┌─────────────────────────────────────────────────────────────────┐
    │                       REQUEST PROCESSING                        │
    ├─────────────────────────────────────────────────────────────────┤
    │  1. Query Clarity Check  → Detect ambiguous queries             │
    │  2. Query Rewrite        → GPT-4o-mini reformulation            │
    │  3. Entity Detection     → Identify institutions/entities       │
    │  4. Query Expansion      → Add domain keywords                  │
    │  5. Multi-Entity Split   → Decompose complex queries            │
    │  6. Hybrid Retrieval     → BM25 + Dense + KG traversal          │
    │  7. Cross-Encoder Rerank → Precision optimization               │
    │  8. Answer Generation    → Grounded response with citations     │
    │  9. Fallback Handling    → Contextual not-found responses       │
    └─────────────────────────────────────────────────────────────────┘
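Steps 5 and 6 (splitting a multi-entity query, then retrieving for each part) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation. The function names, the sub-query format, and the best-score merge rule are all hypothetical, and the retriever is passed in as a plain callable so that any backend can be substituted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Passage = tuple[str, float]  # (passage text, relevance score)


def split_by_entity(query: str, entities: list[str]) -> list[str]:
    """Produce one sub-query per detected entity (hypothetical strategy).

    Each sub-query keeps the full question and names a single entity, so the
    retriever can focus on one institution at a time.
    """
    if len(entities) <= 1:
        return [query]
    return [f"{query} ({entity})" for entity in entities]


def retrieve_parallel(
    sub_queries: list[str],
    retrieve: Callable[[str], list[Passage]],
    top_k: int = 5,
) -> list[Passage]:
    """Run the retriever on every sub-query concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(sub_queries))) as pool:
        batches = list(pool.map(retrieve, sub_queries))

    # Deduplicate by passage text, keeping the best score seen for each passage.
    best: dict[str, float] = {}
    for batch in batches:
        for text, score in batch:
            best[text] = max(score, best.get(text, float("-inf")))

    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

The merged list would then be handed to the reranking stage. Threads fit here because retrieval calls are typically I/O-bound or release the interpreter lock inside native libraries.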
When information is unavailable, the system provides category-specific guidance:
| Query Category | Fallback Resource |
|---|---|
| Internal platform queries | Platform helpdesk/website |
| Medical/Dental admission | DGHS official portal |
| Engineering universities | Individual institution websites |
| General university queries | Respective university portals |
| Cluster admission | GST Admission portal |
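The table above amounts to a lookup from query category to resource. A minimal sketch is shown below. The category keys, the default category, and the message wording are assumptions made for illustration. How the system classifies a query into a category is not described here and is left out.

```python
# Resource names are taken from the fallback table; the keys are hypothetical.
FALLBACK_RESOURCES = {
    "internal_platform": "Platform helpdesk/website",
    "medical_dental": "DGHS official portal",
    "engineering": "Individual institution websites",
    "general_university": "Respective university portals",
    "cluster": "GST Admission portal",
}

DEFAULT_CATEGORY = "general_university"  # assumed default for unknown categories


def fallback_message(category: str) -> str:
    """Return a not-found message that points to the category's resource."""
    resource = FALLBACK_RESOURCES.get(
        category, FALLBACK_RESOURCES[DEFAULT_CATEGORY]
    )
    return f"No answer was found in the indexed documents. Please check: {resource}."
```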
The API server can switch between LLM providers for answer generation. To change the model, edit `ANSWER_MODEL` in api_server.py (line 28):

    ANSWER_MODEL = "qwen3-80b"  # Change this to switch models

Available Presets:

| Preset | Model | Description |
|---|---|---|
| `gpt-4o-mini` | OpenAI GPT-4o-mini | Fast, cheap, good for testing |
| `gpt-4o` | OpenAI GPT-4o | Slower, expensive, better quality |
| `qwen3-80b` | Qwen3-next 80B (Ollama) | Local, free, 32K context |

Multi-Model Architecture:
- NER/Triple Extraction: GPT-4o (OpenAI), for accurate entity extraction
- Answer Generation: Configurable via `ANSWER_MODEL`
- Embeddings: multilingual-e5-large (local GPU)
- Reranking: bge-reranker-v2-m3 (local CPU)
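One way to organize such presets is a small registry keyed by preset name. The sketch below is an assumption about structure, not a copy of api_server.py. `ModelPreset`, `PRESETS`, and `resolve_preset` are hypothetical names, and the Ollama model tag is deliberately left as a placeholder because the exact tag is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPreset:
    provider: str  # which backend serves the model
    model: str     # identifier passed to that backend
    note: str      # short description from the presets table


PRESETS = {
    "gpt-4o-mini": ModelPreset("openai", "gpt-4o-mini", "Fast, cheap, good for testing"),
    "gpt-4o": ModelPreset("openai", "gpt-4o", "Slower, expensive, better quality"),
    # Placeholder tag: replace with the tag of the model pulled into Ollama.
    "qwen3-80b": ModelPreset("ollama", "<ollama-model-tag>", "Local, free, 32K context"),
}

ANSWER_MODEL = "qwen3-80b"  # Change this to switch models


def resolve_preset(name: str) -> ModelPreset:
    """Look up a preset and fail early with a readable error if it is unknown."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(
            f"Unknown preset {name!r}; choose one of {sorted(PRESETS)}"
        ) from None
```

Validating the preset name at startup surfaces a typo in `ANSWER_MODEL` before the first request is served.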
| Parameter | Type | Default | Description |
|---|---|---|---|
| `save_dir` | str | required | Directory for indexes and graphs |
| `llm_model_name` | str | required | LLM model identifier |
| `embedding_model_name` | str | required | Embedding model identifier |
| `llm_base_url` | str | None | Custom LLM endpoint |
| `embedding_base_url` | str | None | Custom embedding endpoint |

| Parameter | Description |
|---|---|
| `MIN_REFERENCE_SCORE` | Minimum score threshold for references (default: 0.4) |
| `hybrid_alpha` | Balance between dense and sparse retrieval (0-1) |
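To make the two retrieval parameters concrete, here is a sketch of a common way to blend dense and sparse scores. It is not taken from the project's code. It assumes that `hybrid_alpha` is the weight on the dense score, that both score sets are min-max normalized before mixing, and that `MIN_REFERENCE_SCORE` is applied to the blended score. The real system may apply the threshold at a different stage, for example to reranker scores.

```python
MIN_REFERENCE_SCORE = 0.4  # default from the table above


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, each maps to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def fuse(
    dense: dict[str, float],
    sparse: dict[str, float],
    hybrid_alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Blend scores as alpha * dense + (1 - alpha) * sparse, best first."""
    if not 0.0 <= hybrid_alpha <= 1.0:
        raise ValueError("hybrid_alpha must be between 0 and 1")
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    fused = {
        doc: hybrid_alpha * dense_n.get(doc, 0.0)
        + (1.0 - hybrid_alpha) * sparse_n.get(doc, 0.0)
        for doc in set(dense_n) | set(sparse_n)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def keep_references(
    ranked: list[tuple[str, float]],
    min_score: float = MIN_REFERENCE_SCORE,
) -> list[tuple[str, float]]:
    """Drop passages whose blended score falls below the threshold."""
    return [(doc, score) for doc, score in ranked if score >= min_score]
```

Under these assumptions, `hybrid_alpha=1.0` reduces to pure dense retrieval and `hybrid_alpha=0.0` to pure BM25. Normalizing first matters because BM25 scores are unbounded while embedding similarities usually lie in a narrow range.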
Project layout:

    advance-rag/
    ├── src/hipporag/
    │   ├── HippoRAG.py              # Core RAG implementation
    │   ├── embedding_model/         # Embedding backends
    │   ├── llm/                     # LLM provider integrations
    │   ├── prompts/templates/       # Prompt engineering
    │   └── retrieval/               # BM25, rerankers, hybrid search
    ├── api_server.py                # FastAPI REST server
    ├── visualize_kg_web.py          # Knowledge graph visualization
    └── documents/                   # Document store
Launch the interactive knowledge graph explorer:

    python visualize_kg_web.py

Features:
- Entity node visualization with relationship edges
- Query path highlighting
- Interactive graph exploration
- Document-to-entity mapping

Performance:
- Indexing: ~2-5 minutes per 100 documents (GPU recommended)
- Query latency: 30-120 seconds depending on complexity
- Memory: 8GB minimum, 16GB+ recommended for large indexes

This project is intended for educational and research purposes.

Built upon HippoRAG (Neurobiologically Inspired Long-Term Memory for Large Language Models) by the OSU NLP Group.


