RamziRebai/a-Realtime-Voice-to-Voice-Agentic-RAG-Application-using-LiveKit-and-Redis

🤖 AI Voice Assistant with PDF Analysis

A real-time voice assistant application that lets users upload PDF documents and hold natural voice conversations about their content. It combines a Next.js frontend, a Python AI agent running over LiveKit, and Redis for vector search.

AI Voice Assistant Demo TypeScript Next.js Python

Demo showcasing the real‑time Voice‑to‑Voice AI Agent built with LiveKit and Redis:

Demo.Realtime.Voice.Agentic.RAG.with.Livekit.and.Redis.mp4

✨ Features

🎯 Core Functionality

  • Real-time Voice Interaction: Natural voice conversations with AI assistant
  • PDF Document Analysis: Upload and analyze PDF documents through voice commands
  • Intelligent Q&A: Ask questions about uploaded documents and get accurate answers
  • Chat History Persistence: Conversation history saved and restored across sessions
  • Multi-user Support: Secure user authentication and personalized experiences

🎨 User Experience

  • Modern Chat Interface: Beautiful, responsive chat UI with real-time message streaming
  • Live Transcription: Real-time speech-to-text with typing indicators
  • Visual Feedback: Animated agent states (listening, thinking, speaking)
  • Drag & Drop Upload: Intuitive PDF upload with progress tracking
  • Dark/Light Mode: Adaptive theming for better user experience

🔧 Technical Features

  • RAG (Retrieval Augmented Generation): Advanced document retrieval for accurate responses
  • Vector Search: Semantic search through document content using embeddings
  • Real-time Audio Processing: Low-latency voice communication
  • Auto-scroll Chat: Smart scrolling to latest messages
  • File Validation: Secure PDF upload with size and type validation
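
To make the vector search idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: the project delegates similarity search to Redis, and the function names (`cosine_similarity`, `top_k`) and the three-dimensional toy vectors below are assumptions made purely for illustration. Real embeddings would come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy index: (chunk text, embedding) pairs.
chunks = [
    ("Redis stores the vectors.", [1.0, 0.0, 0.0]),
    ("LiveKit carries the audio.", [0.0, 1.0, 0.0]),
    ("Embeddings are stored in Redis.", [0.9, 0.1, 0.0]),
]

print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

In a RAG setup, the retrieved chunks are then passed to the LLM as context for answering the user's question.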

🏗️ Architecture

graph TB
    A[Frontend - Next.js] --> B[LiveKit Room]
    B --> C[AI Agent - Python]
    C --> D[OpenAI LLM]
    C --> E[Speech Services]
    C --> F[Vector Database - Redis]
    C --> G[PDF Processing]
    H[Supabase Auth] --> A
    I[User Uploads PDF] --> B
    B --> J[File Stream Processing]
    J --> K[Document Indexing]
    K --> F

🛠️ Technologies Used

Frontend

Backend (AI Agent)

AI Services

🚀 Getting Started

Prerequisites

  • Node.js 18+ and npm/yarn
  • Python 3.9+
  • Redis instance (local or cloud)
  • API keys for OpenAI, Deepgram, and LiveKit
  • Supabase project URL and anon key (used for authentication)
  1. Configure environment variables. Create a .env.local file in the frontend directory:

     NEXT_PUBLIC_LIVEKIT_URL=wss://your-livekit-instance.livekit.cloud
     NEXT_PUBLIC_CONN_DETAILS_ENDPOINT=/api/connection-details
     NEXT_PUBLIC_SUPABASE_URL=your-supabase-url
     NEXT_PUBLIC_SUPABASE_ANON_KEY=your-supabase-anon-key

  2. Set up the AI agent. Navigate to the agent directory and install the Python dependencies:

     cd ../first_practice
     pip install -r requirements.txt

Configure agent environment variables:

OPENAI_API_KEY=your-openai-api-key
DEEPGRAM_API_KEY=your-deepgram-api-key
LIVEKIT_API_KEY=your-livekit-api-key
LIVEKIT_API_SECRET=your-livekit-secret
LIVEKIT_URL=wss://your-livekit-instance.livekit.cloud
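
Before starting the agent, it can help to confirm that these variables are actually set. The snippet below is a small sketch of such a check, not part of the repository; the helper name `missing_env_vars` is hypothetical, and the variable names are taken from the list above.

```python
import os

# Variable names copied from the agent configuration listed above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
]


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
sample_env = {"OPENAI_API_KEY": "sk-example", "LIVEKIT_URL": ""}
print(missing_env_vars(sample_env))
```

Calling `missing_env_vars()` with no argument checks the real process environment, so it could be run at the top of a startup script to fail early with a clear message.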

Running the Application

  1. Start the AI agent:

     cd first_practice
     python agent.py

  2. Start the frontend development server:

     cd voice-assistant-frontend
     npm run dev

  3. Access the application. Open http://localhost:3000 in your browser.

📱 Usage

  1. Authentication: Sign in using the authentication system
  2. Upload PDF: Drag and drop or select a PDF document
  3. Start Conversation: Click "Start Conversation" to connect to the AI agent
  4. Voice Interaction:
    • Speak naturally to ask questions about your document
    • The AI will process your speech and respond with relevant information
    • View the conversation history in the chat interface
  5. Document Analysis: Ask for summaries, specific information, or analysis of your PDF content

🎯 Key Features Showcase

Real-time Voice Processing

  • Low Latency: Sub-second response times for voice interactions
  • Natural Conversations: Context-aware responses with conversation memory
  • Live Transcription: Real-time speech-to-text with visual feedback
  • Typing Indicators: See when the AI is processing your request

PDF Document Handling

  • Drag & Drop Upload: Easy PDF upload with progress tracking
  • Document Indexing: Efficient indexing of PDF content for fast retrieval
  • Semantic Search: Ask questions in natural language and get accurate answers
  • Contextual Understanding: AI understands context and provides relevant responses
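
Document indexing for retrieval usually starts by splitting the extracted PDF text into overlapping chunks, so that a sentence cut at a chunk boundary still appears whole in a neighbouring chunk. The following is a minimal sketch of that step under stated assumptions: the function name `chunk_text`, the character-based splitting, and the default sizes are illustrative choices, not the repository's actual implementation.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks of at most chunk_size, sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk would then be embedded and stored in the vector index, where semantic search can match it against a spoken question.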

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.

About

A Voice-to-Voice AI Agent that lets you naturally talk to documents in real time. Powered by LiveKit's ultra-low-latency STT → LLM → TTS pipeline, it uses RAG for instant document insights and Redis for persistent memory—delivering a fully immersive voice-first experience.
