RAG Model - Document Q&A System

A sophisticated Retrieval-Augmented Generation (RAG) system built with FastAPI that enables intelligent document-based question answering. This application combines the power of OpenAI's language models with vector search capabilities to provide contextually accurate responses based on uploaded documents.

🚀 Features

Core Functionality

  • Document Upload & Processing: Upload PDF documents that are automatically processed and vectorized
  • Intelligent Q&A: Ask questions about your documents and get contextually relevant answers
  • Vector Search: Advanced semantic search using Qdrant vector database
  • Chat History: Persistent conversation history with context awareness
  • User Authentication: Secure JWT-based authentication system
  • Real-time Processing: Efficient document chunking and embedding generation

Technical Highlights

  • RAG Architecture: Combines retrieval and generation for accurate responses
  • Vector Embeddings: Uses OpenAI's text-embedding-3-large model (3072 dimensions)
  • Scalable Database: PostgreSQL for relational data, Qdrant for vector storage
  • Modern API: RESTful API with automatic OpenAPI documentation
  • Production Ready: Comprehensive logging, error handling, and database migrations

📋 Table of Contents

  • Features
  • Architecture
  • Installation
  • Configuration
  • Usage
  • API Documentation
  • Project Structure
  • Development
  • Contributing
  • License
  • Acknowledgments

πŸ—οΈ Architecture

System Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI App   │    │   PostgreSQL    │    │     Qdrant      │
│                 │    │                 │    │   Vector DB     │
│  ┌───────────┐  │    │  ┌───────────┐  │    │  ┌───────────┐  │
│  │ Auth      │  │◄──►│  │   Users   │  │    │  │ Embeddings│  │
│  │ Routes    │  │    │  │ Documents │  │    │  │ Vectors   │  │
│  └───────────┘  │    │  │ Messages  │  │    │  │ Metadata  │  │
│  ┌───────────┐  │    │  └───────────┘  │    │  └───────────┘  │
│  │ Chat      │  │    └─────────────────┘    └─────────────────┘
│  │ Routes    │  │              │                       │
│  └───────────┘  │              │                       │
│  ┌───────────┐  │              │                       │
│  │ Doc       │  │              │                       │
│  │ Routes    │  │              │                       │
│  └───────────┘  │              │                       │
└─────────────────┘              │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   OpenAI API    │    │   SQLAlchemy    │    │ Qdrant Client   │
│                 │    │      ORM        │    │                 │
│ ┌─────────────┐ │    │                 │    │                 │
│ │ GPT-4.1     │ │    │                 │    │                 │
│ │ Embeddings  │ │    │                 │    │                 │
│ └─────────────┘ │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

RAG Pipeline

  1. Document Ingestion: PDF files are uploaded and processed
  2. Text Extraction: Content is extracted using LlamaIndex PDFReader
  3. Chunking: Text is split into manageable chunks (1,000 characters with a 100-character overlap; see the sketch after this list)
  4. Vectorization: Chunks are converted to embeddings using OpenAI
  5. Storage: Vectors stored in Qdrant with metadata
  6. Query Processing: User questions are vectorized and matched
  7. Context Retrieval: Relevant chunks are retrieved based on similarity
  8. Response Generation: OpenAI generates answers using retrieved context
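
A minimal sketch of steps 3–5, assuming a plain character-based splitter with the 1,000-character / 100-overlap settings above (the function name and loop are illustrative, not the project's actual helper):

# Illustrative chunking sketch, not the project's exact code.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size characters."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back by the overlap so neighbouring chunks share context
    return chunks

Each chunk would then be embedded and upserted into Qdrant together with its source metadata.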

Technology Stack

  • Backend Framework: FastAPI 0.116.1+
  • Language Model: OpenAI GPT-4.1
  • Embeddings: OpenAI text-embedding-3-large (3072D)
  • Vector Database: Qdrant 1.16.1
  • Relational Database: PostgreSQL with SQLAlchemy 2.0.44
  • Authentication: JWT with python-jose 3.5.0
  • Password Hashing: Argon2 via passlib 1.7.4
  • Document Processing: LlamaIndex for PDF parsing
  • Migration Management: Alembic 1.17.2
  • Environment Management: UV package manager

πŸ› οΈ Installation

Prerequisites

  • Python: 3.13+ (specified in pyproject.toml)
  • PostgreSQL: 12+ for relational data storage
  • Qdrant: Vector database (can run via Docker)
  • OpenAI API Key: For language model and embeddings

Quick Start

  1. Clone the Repository

    git clone <repository-url>
    cd Rag-Model
  2. Set Up Python Environment

    # Using UV (recommended)
    uv venv
    uv pip install -e .
    
    # Or using pip
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -e .
  3. Install Dependencies

    # Production dependencies
    uv pip install -r pyproject.toml
    
    # Development dependencies (optional)
    uv pip install -e .[dev]
  4. Set Up Databases

    # Start Qdrant (using Docker)
    docker run -p 6333:6333 qdrant/qdrant
    
    # Ensure PostgreSQL is running
    # Create database: rag_model
  5. Configure Environment

    cp .env.example .env
    # Edit .env with your configuration
  6. Run Database Migrations

    alembic upgrade head
  7. Start the Application

    uvicorn main:app --reload --host 0.0.0.0 --port 8000

βš™οΈ Configuration

Environment Variables

Create a .env file in the project root with the following variables:

# OpenAI Configuration
OPENAPI_API_KEY=<--value goes here-->               (required)

# Model Configuration
MODEL_NAME=<--value goes here-->                    (default: gpt-4.1)
EMBED_MODEL=<--value goes here-->                   (default: text-embedding-3-large)
EMBED_SIZE=<--value goes here-->                    (default: 3072)

# Database Configuration
DB_STRING=<--value goes here-->                     (required)
ECHO_SQL=<--value goes here-->                      (default: False)
DB_SCHEMA=<--value goes here-->                     (default: rag_model)

# Vector Database
VECTOR_DB_URL=<--value goes here-->                 (default: http://localhost:6333)
VECTOR_INCLUSION_THRESHOLD=<--value goes here-->    (default: 0.5)

# Authentication
JWT_SECRET_KEY=<--value goes here-->                (required)
JWT_ALGORITHM=<--value goes here-->                 (default: HS256)

# Application Settings
FALLBACK_MESSAGE=Sorry, Could not generate a message. Please try again later.
LOG_FILE=app.log
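
At startup these variables are read into an application settings object. A rough sketch of what that could look like with plain os.getenv (the real settings.py may be structured differently; the field names simply mirror the variables above):

# Sketch of a settings.py-style module (assumption: the real project may differ).
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ProjectSettings:
    openai_api_key: str = os.getenv("OPENAPI_API_KEY", "")
    model_name: str = os.getenv("MODEL_NAME", "gpt-4.1")
    embed_model: str = os.getenv("EMBED_MODEL", "text-embedding-3-large")
    embed_size: int = int(os.getenv("EMBED_SIZE", "3072"))
    db_string: str = os.getenv("DB_STRING", "")
    vector_db_url: str = os.getenv("VECTOR_DB_URL", "http://localhost:6333")
    vector_inclusion_threshold: float = float(os.getenv("VECTOR_INCLUSION_THRESHOLD", "0.5"))
    jwt_secret_key: str = os.getenv("JWT_SECRET_KEY", "")
    jwt_algorithm: str = os.getenv("JWT_ALGORITHM", "HS256")

project_settings = ProjectSettings()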

Configuration Details

Model Settings

  • MODEL_NAME: OpenAI model for chat completions (default: gpt-4.1)
  • EMBED_MODEL: Embedding model (default: text-embedding-3-large)
  • EMBED_SIZE: Embedding dimensions (3072 for text-embedding-3-large)

Database Settings

  • DB_STRING: PostgreSQL connection string
  • VECTOR_DB_URL: Qdrant server URL
  • VECTOR_INCLUSION_THRESHOLD: Minimum similarity score for including documents (0.0-1.0)

Security Settings

  • JWT_SECRET_KEY: Secret key for JWT token signing (use a strong, random key)
  • JWT_ALGORITHM: JWT signing algorithm (HS256 recommended)
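
With python-jose, token creation and validation follow roughly the pattern below (a hedged sketch; the helper names are illustrative and may differ from authentication/utils.py):

# Sketch of JWT creation/validation with python-jose (names are illustrative).
from datetime import datetime, timedelta, timezone
from jose import jwt, JWTError

SECRET_KEY = "change-me"   # JWT_SECRET_KEY from the environment
ALGORITHM = "HS256"        # JWT_ALGORITHM

def create_access_token(data: dict, expires_minutes: int = 60) -> str:
    payload = data.copy()
    payload["exp"] = datetime.now(timezone.utc) + timedelta(minutes=expires_minutes)
    return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)

def decode_access_token(token: str) -> dict | None:
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except JWTError:
        return None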

📖 Usage

Starting the Application

  1. Development Mode

    uvicorn main:app --reload --host 0.0.0.0 --port 8000
  2. Production Mode

    uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
  3. Access the Application

    The API is now available at http://localhost:8000 and serves automatically generated OpenAPI documentation.

Basic Workflow

  1. Register/Login

    # Register a new user
    curl -X POST "http://localhost:8000/auth/signup" \
         -H "Content-Type: application/json" \
         -d '{"name": "John Doe", "email": "john@example.com", "password": "securepassword123"}'
    
    # Login
    curl -X POST "http://localhost:8000/auth/login" \
         -H "Content-Type: application/json" \
         -d '{"email": "john@example.com", "password": "securepassword123"}'
  2. Upload Documents

    curl -X POST "http://localhost:8000/docs/upload-pdf-document" \
         -H "Authorization: Bearer YOUR_JWT_TOKEN" \
         -F "file=@document.pdf"
  3. Ask Questions

    curl -X POST "http://localhost:8000/chats/send-message" \
         -H "Authorization: Bearer YOUR_JWT_TOKEN" \
         -H "Content-Type: application/json" \
         -d '{"message": "What is the main topic of the document?"}'
  4. View Chat History

    curl -X GET "http://localhost:8000/chats/messages" \
         -H "Authorization: Bearer YOUR_JWT_TOKEN"

📚 API Documentation

Authentication Endpoints

POST /auth/signup

Register a new user account.

Request Body:

{
  "name": "John Doe",
  "email": "john@example.com",
  "password": "securepassword123"
}

Response:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer",
  "user_id": 1,
  "email": "john@example.com",
  "name": "John Doe"
}

POST /auth/login

Authenticate user and receive JWT token.

Request Body:

{
  "email": "john@example.com",
  "password": "securepassword123"
}

Response:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer",
  "user_id": 1,
  "email": "john@example.com",
  "name": "John Doe"
}

GET /auth/me

Get current authenticated user information.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "user_id": 1,
  "name": "John Doe",
  "email": "john@example.com",
  "is_active": true,
  "is_verified": false,
  "created_at": "2024-12-09T08:25:00Z"
}

Document Management Endpoints

POST /docs/upload-pdf-document

Upload and process a PDF document.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: multipart/form-data

Request Body:

file: (PDF file)

Response:

{
  "message": "Document 'example.pdf' uploaded and processed successfully",
  "doc_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "status": "success"
}

GET /docs/list-documents

List all uploaded documents for the current user.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "documents": [
    {
      "doc_uuid": "550e8400-e29b-41d4-a716-446655440000",
      "file_url": "Not Available",
      "file_size": 1048576,
      "original_filename": "example.pdf",
      "mime_type": "application/pdf",
      "created_at": "2024-12-09T08:25:00Z",
      "total_chunks": 42
    }
  ],
  "count": 1,
  "status": "success"
}

DELETE /docs/delete-document/{doc_uuid}

Delete a document and its associated vectors.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "message": "Document 'example.pdf' deleted successfully",
  "status": "success"
}

Chat Endpoints

POST /chats/send-message

Send a message and get an AI response based on uploaded documents.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: application/json

Request Body:

{
  "message": "What are the main points discussed in the document?",
  "message_history_count": 20
}

Response:

{
  "status": "success",
  "total_input_vectors": 1536,
  "total_query_hits": 5,
  "total_output_tokens": 150,
  "query_hit_doc_uuids": ["550e8400-e29b-41d4-a716-446655440000"],
  "model_response": "Based on the uploaded document, the main points discussed are..."
}

GET /chats/messages

Retrieve chat history for the current user.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "status": "success",
  "messages": [
    {
      "role": "user",
      "content": "What are the main points?",
      "model_used": "gpt-4.1",
      "tokens_used": 1536,
      "response_time_ms": 0,
      "ai_prompt": null,
      "context_document_uuid": null,
      "created_at": "2024-12-09T08:25:00Z"
    },
    {
      "role": "assistant",
      "content": "Based on the document...",
      "model_used": "gpt-4.1",
      "tokens_used": 150,
      "response_time_ms": 2500,
      "ai_prompt": "System: You are a helpful assistant...",
      "context_document_uuid": ["550e8400-e29b-41d4-a716-446655440000"],
      "created_at": "2024-12-09T08:25:02Z"
    }
  ]
}

DELETE /chats/clear-history

Clear all chat history for the current user.

Headers:

Authorization: Bearer YOUR_JWT_TOKEN

Response:

{
  "status": "success"
}

Health Check

GET /health

Check application health status.

Response:

{
  "status": "healthy",
  "message": "RAG API is running."
}

πŸ“ Project Structure

Rag-Model/
├── 📁 alembic/                   # Database migrations
│   ├── 📁 versions/              # Migration files
│   │   └── c938cf66d0f6_initial_setup.py
│   └── env.py                    # Alembic environment configuration
├── 📁 authentication/            # Authentication module
│   ├── __init__.py               # Module exports
│   ├── auth_models.py            # Authentication data models
│   └── utils.py                  # JWT and password utilities
├── 📁 database/                  # Database layer
│   ├── models.py                 # SQLAlchemy models
│   ├── postgres_db.py            # PostgreSQL connection
│   └── vector_db.py              # Qdrant vector database client
├── 📁 llm/                       # Language model integration
│   ├── models.py                 # LLM data models
│   └── openai_client.py          # OpenAI API client
├── 📁 log_config/                # Logging configuration
│   ├── __init__.py               # Logger factory
│   └── logging_config.py         # Logging setup
├── 📁 route_models/              # API request/response models
│   ├── auth_models.py            # Authentication models
│   ├── chat_models.py            # Chat endpoint models
│   └── doc_models.py             # Document endpoint models
├── 📁 routers/                   # FastAPI route handlers
│   ├── __init__.py               # Router exports
│   ├── auth_routes.py            # Authentication endpoints
│   ├── chat_routes.py            # Chat endpoints
│   └── doc_routes.py             # Document endpoints
├── 📁 utilities/                 # Utility functions
│   ├── __init__.py               # Utility exports
│   └── utility.py                # PDF processing and vectorization
├── 📁 logs/                      # Application logs (auto-created)
├── 📁 vector_db_storage/         # Qdrant data storage (auto-created)
├── .env                          # Environment variables
├── .gitignore                    # Git ignore rules
├── .python-version               # Python version specification
├── alembic.ini                   # Alembic configuration
├── main.py                       # FastAPI application entry point
├── pyproject.toml                # Project dependencies and metadata
├── settings.py                   # Application configuration
├── uv.lock                       # UV lock file
└── README.md                     # This file

Key Components

Core Application (main.py)

  • FastAPI application setup with CORS middleware
  • Application lifespan management
  • Database connection initialization
  • Router registration and API documentation
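
A stripped-down sketch of that wiring (illustrative only; the real main.py handles settings, logging, and database initialization in its lifespan hooks):

# Minimal main.py-style sketch: lifespan, CORS, and router registration.
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

@asynccontextmanager
async def lifespan(app: FastAPI):
    # startup: initialize database connections, Qdrant client, etc.
    yield
    # shutdown: dispose engines / close clients

app = FastAPI(title="RAG Model - Document Q&A System", lifespan=lifespan)
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

# Routers as described in the routers/ package (names here are illustrative):
# from routers import auth_router, chat_router, doc_router
# app.include_router(auth_router, prefix="/auth")
# app.include_router(chat_router, prefix="/chats")
# app.include_router(doc_router, prefix="/docs")

@app.get("/health")
def health() -> dict:
    return {"status": "healthy", "message": "RAG API is running."}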

Authentication System (authentication/)

  • JWT Token Management: Secure token creation and validation
  • Password Security: Argon2 hashing for password storage
  • Role-Based Access: User role authorization system
  • Dependency Injection: FastAPI dependencies for route protection
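
Password hashing with passlib's Argon2 backend typically looks like the following (sketch only; helper names are illustrative):

# Sketch of Argon2 password hashing via passlib (requires the argon2-cffi package).
from passlib.context import CryptContext

pwd_context = CryptContext(schemes=["argon2"], deprecated="auto")

def hash_password(plain: str) -> str:
    return pwd_context.hash(plain)

def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)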

Database Layer (database/)

  • PostgreSQL Models: User, Document, and Message entities
  • Vector Database: Qdrant integration for embeddings storage
  • Connection Management: Session handling and connection pooling
  • Migration Support: Alembic for database schema management

Language Model Integration (llm/)

  • OpenAI Client: GPT-4.1 and embedding model integration
  • RAG Pipeline: Document processing and context retrieval
  • Response Generation: Contextual answer generation
  • Token Management: Usage tracking and optimization
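
Putting retrieval and generation together, the query path can be sketched roughly as below, assuming the openai (>=1.x) and qdrant-client packages; the collection name, models, and 0.5 threshold mirror the configuration in this README, while the helper itself is illustrative:

# Sketch of the retrieval + generation path (not the project's exact implementation).
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()   # pass api_key=... from your settings; OpenAI() otherwise reads OPENAI_API_KEY
qdrant = QdrantClient(url="http://localhost:6333")

def answer_question(question: str) -> str:
    # 1. Vectorize the question
    emb = openai_client.embeddings.create(model="text-embedding-3-large", input=question)
    query_vector = emb.data[0].embedding

    # 2. Retrieve relevant chunks above the inclusion threshold
    hits = qdrant.search(collection_name="rag", query_vector=query_vector,
                         limit=5, score_threshold=0.5)
    context = "\n\n".join(hit.payload["text"] for hit in hits)

    # 3. Generate an answer grounded in the retrieved context
    completion = openai_client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content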

API Routes (routers/)

  • Authentication Routes: Login, signup, user management
  • Document Routes: Upload, list, delete PDF documents
  • Chat Routes: Message sending, history management
  • Error Handling: Comprehensive exception management

Utilities (utilities/)

  • PDF Processing: Document parsing and text extraction
  • Vectorization: Text-to-embedding conversion
  • Chunking: Intelligent text segmentation

Qdrant Vector Database

Collection: "rag"

  • Vector Dimension: 3072 (OpenAI text-embedding-3-large)
  • Distance Metric: Cosine similarity
  • Payload Schema:
    {
      "source": "document_filename.pdf",
      "text": "chunk_content_text",
      "uuid": "document_uuid"
    }
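
Creating the collection and storing one embedded chunk with this payload could look like the following sketch (qdrant-client API; the vector shown is a placeholder, not a real embedding):

# Sketch: create the "rag" collection and upsert one embedded chunk.
import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")

# 3072-dimensional vectors with cosine distance, as described above
client.create_collection(
    collection_name="rag",
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

client.upsert(
    collection_name="rag",
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=[0.0] * 3072,  # replace with a real embedding
            payload={"source": "document_filename.pdf",
                     "text": "chunk_content_text",
                     "uuid": "document_uuid"},
        )
    ],
)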

Relationships

  • Users β†’ Documents: One-to-many (cascade delete)
  • Users β†’ Messages: One-to-many (cascade delete)
  • Documents β†’ Vector Embeddings: One-to-many (via UUID)
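
In SQLAlchemy 2.0 terms, the cascade behaviour above can be expressed roughly as follows (an illustrative subset of columns, not the full models.py):

# Sketch of the User → Document / Message relationships with cascade deletes.
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    documents: Mapped[list["Document"]] = relationship(back_populates="owner",
                                                       cascade="all, delete-orphan")
    messages: Mapped[list["Message"]] = relationship(cascade="all, delete-orphan")

class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    doc_uuid: Mapped[str]  # also stored in the Qdrant payload, linking chunks to this row
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    owner: Mapped["User"] = relationship(back_populates="documents")

class Message(Base):
    __tablename__ = "messages"
    id: Mapped[int] = mapped_column(primary_key=True)
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))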

🔧 Development

Setting Up Development Environment

  1. Install Development Dependencies

    uv pip install -e .[dev]
  2. Code Formatting

    black .

Development Tools

Available Dependencies

  • black: Code formatting (25.12.0+)
  • icecream: Enhanced debugging (2.1.8+)

Database Migrations

Create a New Migration

alembic revision --autogenerate -m "Description of changes"

Apply Migrations

alembic upgrade head

Rollback Migration

alembic downgrade -1

Logging

The application uses a comprehensive logging system:

  • File Logging: Rotating logs in logs/app.log
  • Console Logging: Real-time output during development
  • Log Levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
  • Structured Format: Timestamp, module, level, file:line, message
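
A condensed sketch of such a setup (illustrative; log_config/logging_config.py is the authoritative version):

# Sketch: rotating file + console logging in the format described above.
import logging
from logging.handlers import RotatingFileHandler

def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if logger.handlers:          # avoid attaching duplicate handlers
        return logger
    logger.setLevel(logging.DEBUG)
    fmt = logging.Formatter(
        "%(asctime)s | %(name)s | %(levelname)s | %(filename)s:%(lineno)d | %(message)s"
    )
    file_handler = RotatingFileHandler("logs/app.log", maxBytes=5_000_000, backupCount=3)  # assumes logs/ exists
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger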

Code Style Guidelines

  1. Import Organization

    # 1st party imports
    import os
    from typing import List
    
    # 3rd party imports
    from fastapi import FastAPI
    from sqlalchemy import create_engine
    
    # local imports
    from settings import project_settings
    from database.models import User
  2. Function Documentation

    def example_function(param1: str, param2: int = 10) -> bool:
        """
        Brief description of the function.
        
        Args:
            param1 (str): Description of param1.
            param2 (int): Description of param2. Defaults to 10.
        
        Returns:
            bool: Description of return value.
        """
  3. Error Handling

    try:
        # operation
        result = perform_operation()
        logger.info("Operation successful")
        return result
    except SpecificException as e:
        logger.error(f"Specific error: {str(e)}")
        raise HTTPException(status_code=400, detail="Specific error message")
    except Exception as e:
        logger.exception(f"Unexpected error: {str(e)}")
        raise HTTPException(status_code=500, detail="Internal server error")

🤝 Contributing

Getting Started

  1. Fork the Repository

  2. Create a Feature Branch

    git checkout -b feature/your-feature-name
  3. Make Changes

    • Follow code style guidelines
    • Add tests for new functionality
    • Update documentation as needed
  4. Test Your Changes

    # Format code
    black .
    
    # Run tests (when available)
    pytest
    
    # Test API endpoints
    curl -X GET "http://localhost:8000/health"
  5. Submit a Pull Request

    • Provide clear description of changes
    • Reference any related issues
    • Ensure all checks pass

Development Guidelines

  1. Code Quality

    • Follow PEP 8 style guidelines
    • Use type hints consistently
    • Write comprehensive docstrings
    • Handle errors gracefully
  2. Testing

    • Write unit tests for new functions
    • Test API endpoints thoroughly
    • Verify database operations
    • Test error conditions
  3. Documentation

    • Update README for new features
    • Document API changes
    • Add inline code comments
    • Update configuration examples

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

πŸ™ Acknowledgments

  • OpenAI: For providing powerful language models and embeddings
  • Qdrant: For the excellent vector database solution
  • FastAPI: For the modern, fast web framework
  • LlamaIndex: For document processing capabilities
  • SQLAlchemy: For robust database ORM
  • Contributors: All developers who have contributed to this project

Built with ❤️ using FastAPI, OpenAI, and Qdrant
