A sophisticated Retrieval-Augmented Generation (RAG) system built with FastAPI that enables intelligent document-based question answering. This application combines the power of OpenAI's language models with vector search capabilities to provide contextually accurate responses based on uploaded documents.
- Document Upload & Processing: Upload PDF documents that are automatically processed and vectorized
- Intelligent Q&A: Ask questions about your documents and get contextually relevant answers
- Vector Search: Advanced semantic search using Qdrant vector database
- Chat History: Persistent conversation history with context awareness
- User Authentication: Secure JWT-based authentication system
- Real-time Processing: Efficient document chunking and embedding generation
- RAG Architecture: Combines retrieval and generation for accurate responses
- Vector Embeddings: Uses OpenAI's text-embedding-3-large model (3072 dimensions)
- Scalable Database: PostgreSQL for relational data, Qdrant for vector storage
- Modern API: RESTful API with automatic OpenAPI documentation
- Production Ready: Comprehensive logging, error handling, and database migrations
- Architecture
- Installation
- Configuration
- Usage
- API Documentation
- Project Structure
- Database Schema
- Development
- Deployment
- Contributing
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   FastAPI App   │    │   PostgreSQL    │    │     Qdrant      │
│                 │    │                 │    │   Vector DB     │
│  ┌───────────┐  │    │  ┌───────────┐  │    │  ┌───────────┐  │
│  │   Auth    │  │───▶│  │   Users   │  │    │  │ Embeddings│  │
│  │  Routes   │  │    │  │ Documents │  │    │  │  Vectors  │  │
│  └───────────┘  │    │  │ Messages  │  │    │  │  Metadata │  │
│  ┌───────────┐  │    │  └───────────┘  │    │  └───────────┘  │
│  │   Chat    │  │    └─────────────────┘    └─────────────────┘
│  │  Routes   │  │             │                      │
│  └───────────┘  │             │                      │
│  ┌───────────┐  │             │                      │
│  │   Doc     │  │             │                      │
│  │  Routes   │  │             │                      │
│  └───────────┘  │             │                      │
└─────────────────┘             │                      │
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   OpenAI API    │    │   SQLAlchemy    │    │  Qdrant Client  │
│                 │    │      ORM        │    │                 │
│  ┌───────────┐  │    │                 │    │                 │
│  │  GPT-4.1  │  │    │                 │    │                 │
│  │ Embeddings│  │    │                 │    │                 │
│  └───────────┘  │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
- Document Ingestion: PDF files are uploaded and processed
- Text Extraction: Content is extracted using LlamaIndex PDFReader
- Chunking: Text is split into manageable chunks (1000 chars, 100 overlap)
- Vectorization: Chunks are converted to embeddings using OpenAI
- Storage: Vectors stored in Qdrant with metadata
- Query Processing: User questions are vectorized and matched
- Context Retrieval: Relevant chunks are retrieved based on similarity
- Response Generation: OpenAI generates answers using retrieved context
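The chunking step above (1000-character chunks with 100-character overlap) can be sketched in plain Python. The project itself relies on LlamaIndex for parsing and chunking, so treat this as an illustration of the parameters rather than the actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks (simplified sketch of the
    1000-char / 100-overlap settings described above)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap  # advance by 900 chars per chunk
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

The overlap ensures that a sentence falling on a chunk boundary still appears intact in at least one chunk, which improves retrieval recall.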
- Backend Framework: FastAPI 0.116.1+
- Language Model: OpenAI GPT-4.1
- Embeddings: OpenAI text-embedding-3-large (3072D)
- Vector Database: Qdrant 1.16.1
- Relational Database: PostgreSQL with SQLAlchemy 2.0.44
- Authentication: JWT with python-jose 3.5.0
- Password Hashing: Argon2 via passlib 1.7.4
- Document Processing: LlamaIndex for PDF parsing
- Migration Management: Alembic 1.17.2
- Environment Management: UV package manager
- Python: 3.13+ (specified in pyproject.toml)
- PostgreSQL: 12+ for relational data storage
- Qdrant: Vector database (can run via Docker)
- OpenAI API Key: For language model and embeddings
- Clone the Repository

  ```bash
  git clone <repository-url>
  cd Rag-Model
  ```

- Set Up Python Environment

  ```bash
  # Using UV (recommended)
  uv venv
  uv pip install -e .

  # Or using pip
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -e .
  ```

- Install Dependencies

  ```bash
  # Production dependencies
  uv pip install -r pyproject.toml

  # Development dependencies (optional)
  uv pip install -e .[dev]
  ```

- Set Up Databases

  ```bash
  # Start Qdrant (using Docker)
  docker run -p 6333:6333 qdrant/qdrant

  # Ensure PostgreSQL is running
  # Create database: rag_model
  ```

- Configure Environment

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Run Database Migrations

  ```bash
  alembic upgrade head
  ```

- Start the Application

  ```bash
  uvicorn main:app --reload --host 0.0.0.0 --port 8000
  ```
Create a .env file in the project root with the following variables:
```bash
# OpenAI Configuration
OPENAPI_API_KEY=<--value goes here--> (required)

# Model Configuration
MODEL_NAME=<--value goes here--> (defaults: gpt-4.1)
EMBED_MODEL=<--value goes here--> (defaults: text-embedding-3-large)
EMBED_SIZE=<--value goes here--> (defaults: 3072)

# Database Configuration
DB_STRING=<--value goes here--> (required)
ECHO_SQL=<--value goes here--> (defaults: False)
DB_SCHEMA=<--value goes here--> (defaults: rag_model)

# Vector Database
VECTOR_DB_URL=<--value goes here--> (defaults: http://localhost:6333)
VECTOR_INCLUSION_THRESHOLD=<--value goes here--> (defaults: 0.5)

# Authentication
JWT_SECRET_KEY=<--value goes here--> (required)
JWT_ALGORITHM=<--value goes here--> (defaults: HS256)

# Application Settings
FALLBACK_MESSAGE=Sorry, Could not generate a message. Please try again later.
LOG_FILE=app.log
```

- MODEL_NAME: OpenAI model for chat completions (default: gpt-4.1)
- EMBED_MODEL: Embedding model (default: text-embedding-3-large)
- EMBED_SIZE: Embedding dimensions (3072 for text-embedding-3-large)
- DB_STRING: PostgreSQL connection string
- VECTOR_DB_URL: Qdrant server URL
- VECTOR_INCLUSION_THRESHOLD: Minimum similarity score for including documents (0.0-1.0)
- JWT_SECRET_KEY: Secret key for JWT token signing (use a strong, random key)
- JWT_ALGORITHM: JWT signing algorithm (HS256 recommended)
- Development Mode

  ```bash
  uvicorn main:app --reload --host 0.0.0.0 --port 8000
  ```

- Production Mode

  ```bash
  uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
  ```

- Access the Application
  - API Documentation: http://localhost:8000/docs
  - Alternative Docs: http://localhost:8000/redoc
  - Health Check: http://localhost:8000/health
- Register/Login

  ```bash
  # Register a new user
  curl -X POST "http://localhost:8000/auth/signup" \
    -H "Content-Type: application/json" \
    -d '{"name": "John Doe", "email": "john@example.com", "password": "securepassword123"}'

  # Login
  curl -X POST "http://localhost:8000/auth/login" \
    -H "Content-Type: application/json" \
    -d '{"email": "john@example.com", "password": "securepassword123"}'
  ```

- Upload Documents

  ```bash
  curl -X POST "http://localhost:8000/docs/upload-pdf-document" \
    -H "Authorization: Bearer YOUR_JWT_TOKEN" \
    -F "file=@document.pdf"
  ```

- Ask Questions

  ```bash
  curl -X POST "http://localhost:8000/chats/send-message" \
    -H "Authorization: Bearer YOUR_JWT_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"message": "What is the main topic of the document?"}'
  ```

- View Chat History

  ```bash
  curl -X GET "http://localhost:8000/chats/messages" \
    -H "Authorization: Bearer YOUR_JWT_TOKEN"
  ```
Register a new user account.
Request Body:
```json
{
  "name": "John Doe",
  "email": "john@example.com",
  "password": "securepassword123"
}
```

Response:
```json
{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer",
  "user_id": 1,
  "email": "john@example.com",
  "name": "John Doe"
}
```

Authenticate user and receive JWT token.
Request Body:
```json
{
  "email": "john@example.com",
  "password": "securepassword123"
}
```

Response:
```json
{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "token_type": "bearer",
  "user_id": 1,
  "email": "john@example.com",
  "name": "John Doe"
}
```

Get current authenticated user information.
Headers:
Authorization: Bearer YOUR_JWT_TOKEN
Response:
```json
{
  "user_id": 1,
  "name": "John Doe",
  "email": "john@example.com",
  "is_active": true,
  "is_verified": false,
  "created_at": "2024-12-09T08:25:00Z"
}
```

Upload and process a PDF document.
Headers:
Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: multipart/form-data
Request Body:
file: (PDF file)
Response:
```json
{
  "message": "Document 'example.pdf' uploaded and processed successfully",
  "doc_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "status": "success"
}
```

List all uploaded documents for the current user.
Headers:
Authorization: Bearer YOUR_JWT_TOKEN
Response:
```json
{
  "documents": [
    {
      "doc_uuid": "550e8400-e29b-41d4-a716-446655440000",
      "file_url": "Not Available",
      "file_size": 1048576,
      "original_filename": "example.pdf",
      "mime_type": "application/pdf",
      "created_at": "2024-12-09T08:25:00Z",
      "total_chunks": 42
    }
  ],
  "count": 1,
  "status": "success"
}
```

Delete a document and its associated vectors.
Headers:
Authorization: Bearer YOUR_JWT_TOKEN
Response:
```json
{
  "message": "Document 'example.pdf' deleted successfully",
  "status": "success"
}
```

Send a message and get an AI response based on uploaded documents.
Headers:
Authorization: Bearer YOUR_JWT_TOKEN
Content-Type: application/json
Request Body:
```json
{
  "message": "What are the main points discussed in the document?",
  "message_history_count": 20
}
```

Response:
```json
{
  "status": "success",
  "total_input_vectors": 1536,
  "total_query_hits": 5,
  "total_output_tokens": 150,
  "query_hit_doc_uuids": ["550e8400-e29b-41d4-a716-446655440000"],
  "model_response": "Based on the uploaded document, the main points discussed are..."
}
```

Retrieve chat history for the current user.
Headers:
Authorization: Bearer YOUR_JWT_TOKEN
Response:
```json
{
  "status": "success",
  "messages": [
    {
      "role": "user",
      "content": "What are the main points?",
      "model_used": "gpt-4.1",
      "tokens_used": 1536,
      "response_time_ms": 0,
      "ai_prompt": null,
      "context_document_uuid": null,
      "created_at": "2024-12-09T08:25:00Z"
    },
    {
      "role": "assistant",
      "content": "Based on the document...",
      "model_used": "gpt-4.1",
      "tokens_used": 150,
      "response_time_ms": 2500,
      "ai_prompt": "System: You are a helpful assistant...",
      "context_document_uuid": ["550e8400-e29b-41d4-a716-446655440000"],
      "created_at": "2024-12-09T08:25:02Z"
    }
  ]
}
```

Clear all chat history for the current user.
Headers:
Authorization: Bearer YOUR_JWT_TOKEN
Response:
```json
{
  "status": "success"
}
```

Check application health status.
Response:
```json
{
  "status": "healthy",
  "message": "RAG API is running."
}
```

```
Rag-Model/
├── 📁 alembic/                 # Database migrations
│   ├── 📁 versions/            # Migration files
│   │   └── c938cf66d0f6_initial_setup.py
│   └── env.py                  # Alembic environment configuration
├── 📁 authentication/          # Authentication module
│   ├── __init__.py             # Module exports
│   ├── auth_models.py          # Authentication data models
│   └── utils.py                # JWT and password utilities
├── 📁 database/                # Database layer
│   ├── models.py               # SQLAlchemy models
│   ├── postgres_db.py          # PostgreSQL connection
│   └── vector_db.py            # Qdrant vector database client
├── 📁 llm/                     # Language model integration
│   ├── models.py               # LLM data models
│   └── openai_client.py        # OpenAI API client
├── 📁 log_config/              # Logging configuration
│   ├── __init__.py             # Logger factory
│   └── logging_config.py       # Logging setup
├── 📁 route_models/            # API request/response models
│   ├── auth_models.py          # Authentication models
│   ├── chat_models.py          # Chat endpoint models
│   └── doc_models.py           # Document endpoint models
├── 📁 routers/                 # FastAPI route handlers
│   ├── __init__.py             # Router exports
│   ├── auth_routes.py          # Authentication endpoints
│   ├── chat_routes.py          # Chat endpoints
│   └── doc_routes.py           # Document endpoints
├── 📁 utilities/               # Utility functions
│   ├── __init__.py             # Utility exports
│   └── utility.py              # PDF processing and vectorization
├── 📁 logs/                    # Application logs (auto-created)
├── 📁 vector_db_storage/       # Qdrant data storage (auto-created)
├── .env                        # Environment variables
├── .gitignore                  # Git ignore rules
├── .python-version             # Python version specification
├── alembic.ini                 # Alembic configuration
├── main.py                     # FastAPI application entry point
├── pyproject.toml              # Project dependencies and metadata
├── settings.py                 # Application configuration
├── uv.lock                     # UV lock file
└── README.md                   # This file
```
- FastAPI application setup with CORS middleware
- Application lifespan management
- Database connection initialization
- Router registration and API documentation
- JWT Token Management: Secure token creation and validation
- Password Security: Argon2 hashing for password storage
- Role-Based Access: User role authorization system
- Dependency Injection: FastAPI dependencies for route protection
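For illustration, HS256 signing (as performed here by python-jose) amounts to an HMAC-SHA256 over the base64url-encoded header and payload. The sketch below uses only the standard library and is not the project's actual code; claim names like `user_id` are assumptions:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, per RFC 7515."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def create_jwt(claims: dict, secret: str, ttl_seconds: int = 3600) -> str:
    """Sign an HS256 JWT with an expiry claim (illustrative only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(time.time()) + ttl_seconds}
    signing_input = (
        f"{_b64url(json.dumps(header).encode())}."
        f"{_b64url(json.dumps(payload).encode())}"
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"

def verify_jwt(token: str, secret: str) -> dict:
    """Verify signature and expiry, returning the claims."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), signature):
        raise ValueError("Invalid signature")
    payload_b64 = signing_input.split(".")[1]
    pad = "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64 + pad))
    if payload["exp"] < time.time():
        raise ValueError("Token expired")
    return payload
```

In production, prefer the library implementation: it also handles algorithm pinning and the full set of registered claims.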
- PostgreSQL Models: User, Document, and Message entities
- Vector Database: Qdrant integration for embeddings storage
- Connection Management: Session handling and connection pooling
- Migration Support: Alembic for database schema management
- OpenAI Client: GPT-4.1 and embedding model integration
- RAG Pipeline: Document processing and context retrieval
- Response Generation: Contextual answer generation
- Token Management: Usage tracking and optimization
- Authentication Routes: Login, signup, user management
- Document Routes: Upload, list, delete PDF documents
- Chat Routes: Message sending, history management
- Error Handling: Comprehensive exception management
- PDF Processing: Document parsing and text extraction
- Vectorization: Text-to-embedding conversion
- Chunking: Intelligent text segmentation
- Vector Dimension: 3072 (OpenAI text-embedding-3-large)
- Distance Metric: Cosine similarity
- Payload Schema:
  ```json
  {
    "source": "document_filename.pdf",
    "text": "chunk_content_text",
    "uuid": "document_uuid"
  }
  ```
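Cosine similarity, the collection's distance metric, and the `VECTOR_INCLUSION_THRESHOLD` cutoff can be illustrated in a few lines. This is a simplified sketch: Qdrant performs this scoring server-side, and `filter_hits` is a hypothetical helper:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def filter_hits(hits: list[tuple[float, dict]], threshold: float = 0.5) -> list[dict]:
    """Keep only payloads whose score clears the inclusion threshold,
    mirroring how low-relevance chunks are excluded from context."""
    return [payload for score, payload in hits if score >= threshold]
```

Identical directions score 1.0, orthogonal vectors 0.0; with the default 0.5 threshold, only chunks at least moderately aligned with the query embedding reach the prompt.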
- Users → Documents: One-to-many (cascade delete)
- Users → Messages: One-to-many (cascade delete)
- Documents → Vector Embeddings: One-to-many (via UUID)
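The cascade-delete behavior can be demonstrated with the stdlib sqlite3 module. The project itself uses SQLAlchemy on PostgreSQL; the table and column names here are illustrative only:

```python
import sqlite3

# Deleting a user removes that user's documents automatically
# thanks to ON DELETE CASCADE on the foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
        filename TEXT
    )
""")
conn.execute("INSERT INTO users VALUES (1, 'John Doe')")
conn.execute("INSERT INTO documents VALUES (1, 1, 'example.pdf')")

conn.execute("DELETE FROM users WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(remaining)  # → 0: the document is gone along with its owner
```

Note that the Documents → Vector Embeddings link is maintained by UUID in application code, so deleting a document must also delete its Qdrant points explicitly; only the relational side cascades.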
- Install Development Dependencies

  ```bash
  uv pip install -e .[dev]
  ```

- Code Formatting

  ```bash
  black .
  ```
- black: Code formatting (25.12.0+)
- icecream: Enhanced debugging (2.1.8+)
Create a New Migration

```bash
alembic revision --autogenerate -m "Description of changes"
```

Apply Migrations

```bash
alembic upgrade head
```

Rollback Migration

```bash
alembic downgrade -1
```

The application uses a comprehensive logging system:
- File Logging: Rotating logs in logs/app.log
- Console Logging: Real-time output during development
- Log Levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
- Structured Format: Timestamp, module, level, file:line, message
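A minimal logger factory matching this description might look like the following. This is a sketch: the real log_config module may differ in its format string, rotation size, and handler levels:

```python
import logging
from logging.handlers import RotatingFileHandler

def get_logger(name: str, log_file: str = "logs/app.log") -> logging.Logger:
    """Build a logger with rotating-file and console output in the
    documented format: timestamp, module, level, file:line, message."""
    logger = logging.getLogger(name)
    if logger.handlers:  # avoid attaching duplicate handlers on re-import
        return logger
    logger.setLevel(logging.DEBUG)
    fmt = logging.Formatter(
        "%(asctime)s | %(name)s | %(levelname)s | %(filename)s:%(lineno)d | %(message)s"
    )
    file_handler = RotatingFileHandler(log_file, maxBytes=5_000_000, backupCount=3)
    file_handler.setFormatter(fmt)
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)  # console stays quieter than the file
    console.setFormatter(fmt)
    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```

Rotation caps disk usage while keeping a few generations of history for debugging.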
- Import Organization

  ```python
  # standard library imports
  import os
  from typing import List

  # 3rd party imports
  from fastapi import FastAPI
  from sqlalchemy import create_engine

  # local imports
  from settings import project_settings
  from database.models import User
  ```
- Function Documentation

  ```python
  def example_function(param1: str, param2: int = 10) -> bool:
      """
      Brief description of the function.

      Args:
          param1 (str): Description of param1.
          param2 (int): Description of param2. Defaults to 10.

      Returns:
          bool: Description of return value.
      """
  ```
- Error Handling

  ```python
  try:
      # operation
      result = perform_operation()
      logger.info("Operation successful")
      return result
  except SpecificException as e:
      logger.error(f"Specific error: {str(e)}")
      raise HTTPException(status_code=400, detail="Specific error message")
  except Exception as e:
      logger.exception(f"Unexpected error: {str(e)}")
      raise HTTPException(status_code=500, detail="Internal server error")
  ```
- Fork the Repository

- Create a Feature Branch

  ```bash
  git checkout -b feature/your-feature-name
  ```
- Make Changes
  - Follow code style guidelines
  - Add tests for new functionality
  - Update documentation as needed
- Test Your Changes

  ```bash
  # Format code
  black .

  # Run tests (when available)
  pytest

  # Test API endpoints
  curl -X GET "http://localhost:8000/health"
  ```
- Submit a Pull Request
  - Provide clear description of changes
  - Reference any related issues
  - Ensure all checks pass
- Code Quality
  - Follow PEP 8 style guidelines
  - Use type hints consistently
  - Write comprehensive docstrings
  - Handle errors gracefully

- Testing
  - Write unit tests for new functions
  - Test API endpoints thoroughly
  - Verify database operations
  - Test error conditions

- Documentation
  - Update README for new features
  - Document API changes
  - Add inline code comments
  - Update configuration examples
This project is licensed under the MIT License. See the LICENSE file for details.
- OpenAI: For providing powerful language models and embeddings
- Qdrant: For the excellent vector database solution
- FastAPI: For the modern, fast web framework
- LlamaIndex: For document processing capabilities
- SQLAlchemy: For robust database ORM
- Contributors: All developers who have contributed to this project
Built with ❤️ using FastAPI, OpenAI, and Qdrant