AI backend platform built with FastAPI, featuring RAG pipelines, vector search, streaming AI responses, and scalable async architecture.
┌─────────────────────────────────────────────────────────┐
│                       FastAPI App                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│  │  Routes  │  │   Auth   │  │  Health  │               │
│  └────┬─────┘  └────┬─────┘  └──────────┘               │
│       │             │                                   │
│  ┌────▼─────────────▼───────┐                           │
│  │      Service Layer       │                           │
│  │ AuthService │ AIService  │                           │
│  │ FileService │ SubService │                           │
│  └────┬─────────────┬───────┘                           │
│       │             │                                   │
│  ┌────▼────┐   ┌────▼──────────────────┐                │
│  │  Repos  │   │      AI Pipeline      │                │
│  │  (DB)   │   │  Embed → Index → RAG  │                │
│  └────┬────┘   └────┬──────────────────┘                │
│       │             │                                   │
│  ┌────▼───────┐ ┌───▼─────────┐  ┌───────────────┐      │
│  │ PostgreSQL │ │   Qdrant    │  │  OpenAI API   │      │
│  └────────────┘ └─────────────┘  └───────────────┘      │
└─────────────────────────────────────────────────────────┘
         │
    ┌────▼──────────────────────┐
    │  Redis + Celery Workers   │
    │ Embedding │ Indexing │ GC │
    └───────────────────────────┘
| Layer | Technology |
|---|---|
| Framework | FastAPI + Uvicorn |
| Database | PostgreSQL 16 + SQLAlchemy 2.0 Async |
| Cache / Broker | Redis 7 |
| Vector DB | Qdrant |
| AI Provider | OpenAI (GPT-4o + text-embedding-3-small) |
| Background Tasks | Celery |
| Auth | JWT (access + refresh tokens) |
| Validation | Pydantic v2 |
| Migrations | Alembic |
| Testing | Pytest + pytest-asyncio + HTTPX |
| Containerization | Docker + Docker Compose |
- JWT access tokens (30-minute expiry)
- Refresh token rotation with secure hashing
- bcrypt password hashing
- Role-based access control (user / admin)
- Document Chat: RAG-powered Q&A over uploaded documents
- Resume Analyzer: Structured resume feedback with optional job description matching
- Code Review: Security, performance, and quality analysis
- Meeting Summarizer: Transcript summarization with action items
- Streaming Responses: Real-time SSE token streaming for all AI endpoints
Upload → Validate → Store → Queue Celery Task
→ Extract Text (PDF/DOCX/TXT/Code)
→ Chunk Text (configurable window + overlap)
→ Generate Embeddings (OpenAI batch)
→ Index to Qdrant
→ Update File Status → Done
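The "Chunk Text" step can be illustrated with a small word-window chunker. This is a sketch under assumptions: the function name `chunk_words` is hypothetical, and splitting on whitespace is a guess at what "word count" means for `CHUNK_SIZE` and `CHUNK_OVERLAP`; the project's own chunker may tokenize differently.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to chunk_size words, each sharing
    `overlap` words with the previous chunk."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks
```

For example, `chunk_words("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=1)` yields three chunks, each starting with the last word of the previous one.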
- Semantic similarity search via Qdrant cosine distance
- Per-user + per-file metadata filtering
- Configurable top-k retrieval with score threshold
- Context injection into structured prompts
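To make the retrieval steps above concrete, here is a pure-Python stand-in for what the vector store does: filter by user and file, score by cosine similarity, apply a threshold, keep the top k, and inject the hits into a prompt. It does not use the Qdrant client; `Chunk`, `retrieve`, `build_prompt`, and the prompt wording are all hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Chunk:
    user_id: str
    file_id: str
    text: str
    vector: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], chunks: Sequence[Chunk], user_id: str,
             file_id: Optional[str] = None, top_k: int = 3,
             score_threshold: float = 0.5) -> List[Tuple[float, Chunk]]:
    """Return up to top_k (score, chunk) pairs, best first."""
    scored = []
    for chunk in chunks:
        if chunk.user_id != user_id:
            continue  # per-user filter: never return another user's data
        if file_id is not None and chunk.file_id != file_id:
            continue  # optional per-file filter
        score = cosine(query_vec, chunk.vector)
        if score >= score_threshold:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, hits: Sequence[Tuple[float, Chunk]]) -> str:
    """Inject the retrieved chunk texts into a structured prompt."""
    context = "\n\n".join(chunk.text for _, chunk in hits)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

In the real service the filtering and scoring would be delegated to Qdrant; the sketch only shows the order of operations.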
- Free / Pro / Enterprise tiers
- Per-month request and token quotas
- Quota enforcement via dependency injection
- Stripe-ready schema (customer_id, subscription_id columns)
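A quota check of the kind described above might look like the following sketch. The limits in `PLANS` are invented placeholders, not the project's real numbers, and `check_quota` and `QuotaExceeded` are hypothetical names. In the application, a function like this would presumably be called from a FastAPI dependency so that every AI route enforces it before doing any work.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Plan:
    max_requests: Optional[int]  # per month; None means unlimited
    max_tokens: Optional[int]    # per month; None means unlimited


# Placeholder limits for illustration only.
PLANS: Dict[str, Plan] = {
    "free": Plan(max_requests=100, max_tokens=50_000),
    "pro": Plan(max_requests=5_000, max_tokens=2_000_000),
    "enterprise": Plan(max_requests=None, max_tokens=None),
}


class QuotaExceeded(Exception):
    """Raised when a user has used up the current period's quota."""


def check_quota(tier: str, requests_used: int, tokens_used: int) -> None:
    """Raise QuotaExceeded if the user may not make another request."""
    plan = PLANS[tier]
    if plan.max_requests is not None and requests_used >= plan.max_requests:
        raise QuotaExceeded("monthly request quota exhausted")
    if plan.max_tokens is not None and tokens_used >= plan.max_tokens:
        raise QuotaExceeded("monthly token quota exhausted")
```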
- `embeddings` queue: document processing & indexing
- `indexing` queue: vector operations
- `cleanup` queue: expired token purge, orphaned file cleanup
app/
├── api/v1/              # Thin route handlers
│   ├── auth.py
│   ├── ai.py
│   ├── files.py
│   ├── stream.py
│   └── subscriptions.py
├── ai/                  # OpenAI integration & RAG pipeline
│   ├── client.py
│   ├── completions.py
│   ├── embeddings.py
│   └── pipeline.py
├── core/                # Security, exceptions, logging
├── db/                  # SQLAlchemy engine & session
├── models/              # ORM models (7 tables)
├── schemas/             # Pydantic v2 request/response schemas
├── repositories/        # Data access layer (no business logic)
├── services/            # Business logic layer
├── tasks/               # Celery workers
├── vector/              # Qdrant client, indexer, retriever
├── streaming/           # SSE helpers
├── middleware/          # Request logging, rate limiting
├── dependencies/        # FastAPI DI (auth, db, quota)
├── utils/               # File extraction, text chunking
└── tests/               # pytest async test suite
cp .env.example .env
# Edit .env with your OPENAI_API_KEY and SECRET_KEY
docker compose up --build
The API will be available at http://localhost:8000.
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Fill in DATABASE_URL, REDIS_URL, OPENAI_API_KEY, SECRET_KEY
alembic upgrade head
uvicorn app.main:app --reload
POST /api/v1/auth/register Register new user
POST /api/v1/auth/login Obtain access + refresh tokens
POST /api/v1/auth/refresh Rotate refresh token
POST /api/v1/files/upload Upload document (PDF/DOCX/TXT/code)
GET /api/v1/files List user's files
DELETE /api/v1/files/{id} Delete file + vectors
POST /api/v1/ai/chat General AI chat
POST /api/v1/ai/document-chat RAG chat over uploaded document
POST /api/v1/ai/resume-analyze Resume analysis
POST /api/v1/ai/code-review Code review
POST /api/v1/ai/meeting-summary Meeting transcript summarizer
POST /api/v1/stream/chat Streaming general chat
POST /api/v1/stream/document-chat Streaming RAG document chat
SSE events: token, done, error
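The three event types can be produced by a small generator that formats Server-Sent Events frames. This is a sketch, not the project's `streaming/` helper: the function names, the use of the `event:` field, and the payload keys (`token`, `error`) are assumptions; only the event names come from the list above.

```python
import json
from typing import Dict, Iterable, Iterator


def sse_event(event: str, data: Dict[str, str]) -> str:
    """Format one SSE frame: an event name, a JSON data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield a `token` event per token, then `done`, or `error` on failure."""
    try:
        for token in tokens:
            yield sse_event("token", {"token": token})
        yield sse_event("done", {})
    except Exception as exc:  # surface upstream failures to the client
        yield sse_event("error", {"error": str(exc)})
```

A generator like this could be handed to a streaming HTTP response with the `text/event-stream` media type.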
GET /api/v1/subscriptions/plans Available plans
GET /api/v1/subscriptions/me Current subscription
POST /api/v1/subscriptions/upgrade Upgrade tier
GET /api/v1/subscriptions/usage Current period usage
GET /health Service health check
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL async URL | required |
| `REDIS_URL` | Redis URL | required |
| `OPENAI_API_KEY` | OpenAI secret key | required |
| `SECRET_KEY` | JWT signing secret | required |
| `QDRANT_URL` | Qdrant HTTP URL | http://localhost:6333 |
| `OPENAI_MODEL` | Chat model | gpt-4o |
| `OPENAI_EMBEDDING_MODEL` | Embedding model | text-embedding-3-small |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | Access token TTL (minutes) | 30 |
| `REFRESH_TOKEN_EXPIRE_DAYS` | Refresh token TTL (days) | 7 |
| `MAX_FILE_SIZE` | Max upload size (bytes) | 10485760 (10 MB) |
| `CHUNK_SIZE` | Embedding chunk size (words) | 512 |
| `CHUNK_OVERLAP` | Chunk overlap (words) | 50 |
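As an illustration of how these variables and defaults fit together, the sketch below loads them with the standard library only. It is a stand-in, not the project's settings module (which may well use Pydantic); `Settings` and `load_settings` are hypothetical names, while the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DATABASE_URL", "REDIS_URL", "OPENAI_API_KEY", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    openai_api_key: str
    secret_key: str
    qdrant_url: str
    openai_model: str
    openai_embedding_model: str
    access_token_expire_minutes: int
    refresh_token_expire_days: int
    max_file_size: int
    chunk_size: int
    chunk_overlap: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (os.environ by default), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        secret_key=env["SECRET_KEY"],
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o"),
        openai_embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
        refresh_token_expire_days=int(env.get("REFRESH_TOKEN_EXPIRE_DAYS", "7")),
        max_file_size=int(env.get("MAX_FILE_SIZE", "10485760")),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "50")),
    )
```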
# Requires a test PostgreSQL database: ai_saas_test
pytest -v
# Generate migration after model changes
alembic revision --autogenerate -m "description"
# Apply migrations
alembic upgrade head
# Roll back one
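alembic downgrade -1
// Node 18+ (run as an ES module for top-level await).
// `token` is the access token returned by POST /api/v1/auth/login.
const response = await fetch('http://localhost:8000/api/v1/stream/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`
  },
  body: JSON.stringify({ message: 'Explain async/await in Python' })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds a partial line until the rest of it arrives
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element may be an incomplete line
  for (const line of lines) {
    if (line.startsWith('data:')) {
      const data = JSON.parse(line.slice(5));
      if (data.token) process.stdout.write(data.token);
    }
  }
}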