🤖 AI Voice Assistant with PDF Analysis

A real-time voice assistant application that lets users upload PDF documents and hold natural voice conversations about their content. It is built with LiveKit, OpenAI, Redis, and Next.js.

Demo showcasing the real‑time Voice‑to‑Voice AI Agent built with LiveKit and Redis:

Demo.Realtime.Voice.Agentic.RAG.with.Livekit.and.Redis.mp4

✨ Features

🎯 Core Functionality

Real-time Voice Interaction: Natural voice conversations with AI assistant
PDF Document Analysis: Upload and analyze PDF documents through voice commands
Intelligent Q&A: Ask questions about uploaded documents and get accurate answers
Chat History Persistence: Conversation history saved and restored across sessions
Multi-user Support: Secure user authentication and personalized experiences

🎨 User Experience

Modern Chat Interface: Beautiful, responsive chat UI with real-time message streaming
Live Transcription: Real-time speech-to-text with typing indicators
Visual Feedback: Animated agent states (listening, thinking, speaking)
Drag & Drop Upload: Intuitive PDF upload with progress tracking
Dark/Light Mode: Adaptive theming for better user experience

🔧 Technical Features

RAG (Retrieval Augmented Generation): Advanced document retrieval for accurate responses
Vector Search: Semantic search through document content using embeddings
Real-time Audio Processing: Low-latency voice communication
Auto-scroll Chat: Smart scrolling to latest messages
File Validation: Secure PDF upload with size and type validation
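The file validation item above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `validate_pdf_upload`, the 10 MiB limit, and the error messages are all assumptions. It checks the file extension, the size, and the `%PDF-` header that PDF files begin with.

```python
PDF_MAGIC = b"%PDF-"  # header that PDF files begin with
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed limit (10 MiB), not taken from the project


def validate_pdf_upload(filename: str, data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the upload passed every check."""
    problems = []
    # Type check 1: the extension the user supplied.
    if not filename.lower().endswith(".pdf"):
        problems.append("file name does not end in .pdf")
    # Size check: reject empty and oversized uploads.
    if len(data) == 0:
        problems.append("file is empty")
    elif len(data) > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    # Type check 2: the content itself, since extensions are easy to fake.
    if not data.startswith(PDF_MAGIC):
        problems.append("content does not start with the %PDF- header")
    return problems
```

Returning a list of problems instead of raising on the first failure lets the upload UI show every issue at once.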

🏗️ Architecture

graph TB
    A[Frontend - Next.js] --> B[LiveKit Room]
    B --> C[AI Agent - Python]
    C --> D[OpenAI LLM]
    C --> E[Speech Services]
    C --> F[Vector Database - Redis]
    C --> G[PDF Processing]
    H[Supabase Auth] --> A
    I[User Uploads PDF] --> B
    B --> J[File Stream Processing]
    J --> K[Document Indexing]
    K --> F

🛠️ Technologies Used

Frontend

Next.js 14 - React framework with App Router
TypeScript - Type-safe development
Tailwind CSS - Utility-first CSS framework
Framer Motion - Animation library
LiveKit Components - Real-time communication
Supabase - Authentication and user management

Backend (AI Agent)

Python 3.9+ - Core runtime
LiveKit Agents - Real-time AI agent framework
OpenAI GPT-4 - Large Language Model
LangChain - LLM application framework
Redis Vector Store - Vector database for embeddings
PyMuPDF4LLM - PDF processing

AI Services

OpenAI Embeddings - Text embeddings
Deepgram STT - Speech-to-Text
OpenAI TTS - Text-to-Speech
Silero VAD - Voice Activity Detection

🚀 Getting Started

Prerequisites

Node.js 18+ and npm/yarn
Python 3.9+
Redis instance (local or cloud)
API keys for OpenAI, Deepgram, and LiveKit, plus a Supabase project URL and anon key

Configure environment variables

Create a .env.local file in the frontend directory:

NEXT_PUBLIC_LIVEKIT_URL=wss://your-livekit-instance.livekit.cloud
NEXT_PUBLIC_CONN_DETAILS_ENDPOINT=/api/connection-details
NEXT_PUBLIC_SUPABASE_URL=your-supabase-url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-supabase-anon-key

Set up the AI agent

Navigate to the agent directory and install the Python dependencies:

cd ../first_practice
pip install -r requirements.txt

Configure agent environment variables:

OPENAI_API_KEY=your-openai-api-key
DEEPGRAM_API_KEY=your-deepgram-api-key
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-secret
LIVEKIT_URL=wss://your-livekit-instance.livekit.cloud
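Before starting the agent, it can help to confirm that all five variables are actually set. The following is a small sketch of such a check; the helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os

# Variable names copied from the list above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` defaults to the process environment; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

A startup script could call `missing_env_vars()` and exit with a clear message if the returned list is not empty, instead of failing later on the first API call.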

Running the Application

Start the AI agent

cd first_practice
python agent.py

Start the frontend development server

cd voice-assistant-frontend
npm run dev

Access the application

Open http://localhost:3000 in your browser.

📱 Usage

Authentication: Sign in using the authentication system
Upload PDF: Drag and drop or select a PDF document
Start Conversation: Click "Start Conversation" to connect to the AI agent
Voice Interaction:
- Speak naturally to ask questions about your document
- The AI will process your speech and respond with relevant information
- View the conversation history in the chat interface
Document Analysis: Ask for summaries, specific information, or analysis of your PDF content

🎯 Key Features Showcase

Real-time Voice Processing

Low Latency: Sub-second response times for voice interactions
Natural Conversations: Context-aware responses with conversation memory
Live Transcription: Real-time speech-to-text with visual feedback
Typing Indicators: See when the AI is processing your request

PDF Document Handling

Drag & Drop Upload: Easy PDF upload with progress tracking
Document Indexing: Efficient indexing of PDF content for fast retrieval
Semantic Search: Ask questions in natural language and get accurate answers
Contextual Understanding: AI understands context and provides relevant responses
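The indexing and semantic search steps above follow the usual retrieval pattern: split the document into chunks, turn each chunk into a vector, and return the chunks closest to the question. The sketch below illustrates only that pattern. It deliberately swaps the project's OpenAI embeddings and Redis vector store for a toy word-count vector and an in-memory list, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In a real deployment the retrieved chunks would be passed to the LLM as context for the answer. Word counts only match exact terms, whereas learned embeddings also match paraphrases, which is why the project relies on an embedding model.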

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.
