
RAG API Agent

GPU-Accelerated Document Processing System


A powerful, modular RAG system that processes documents from URLs and provides intelligent question-answering capabilities using GPU-accelerated Llama 3.3-70B-Instruct via vLLM.


Features

Core Capabilities

  • Universal URL Processing - PDFs, HTML, JSON APIs, text files
  • Multilingual Support - Questions in multiple languages
  • GPU Acceleration - AMD GPU support for embeddings and LLM
  • Modular Architecture - Clean, maintainable code structure

Advanced Features

  • vLLM Integration - High-performance Llama 3.3-70B-Instruct
  • Docker Deployment - Complete containerized setup
  • Vector Store Caching - Efficient FAISS-based storage (sketched just after this list)
  • Parallel Processing - Concurrent question handling
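
The vector-store caching called out above rests on a simple idea: embed a document's chunks once, persist the FAISS index to disk keyed by the source URL, and reuse it on later requests. Below is a minimal sketch of that idea in Python; it is illustrative only, the cache directory, embedding model, and helper names are assumptions, and the repo's actual implementation presumably lives under src/rag/.

# Minimal vector-store caching sketch (illustrative; not the repo's actual code).
# Assumes faiss-cpu/faiss-gpu, sentence-transformers, and numpy are installed.
import hashlib
import os

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

CACHE_DIR = "vector_cache"          # assumed cache location
EMBED_MODEL = "all-MiniLM-L6-v2"    # assumed embedding model

_model = SentenceTransformer(EMBED_MODEL)

def _cache_path(url: str) -> str:
    return os.path.join(CACHE_DIR, hashlib.sha256(url.encode()).hexdigest() + ".faiss")

def get_or_build_index(url: str, chunks: list[str]) -> faiss.Index:
    """Load a cached FAISS index for this URL, or embed the chunks and build one."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = _cache_path(url)
    if os.path.exists(path):
        return faiss.read_index(path)              # cache hit: skip re-embedding
    vectors = _model.encode(chunks, convert_to_numpy=True).astype(np.float32)
    index = faiss.IndexFlatL2(vectors.shape[1])    # exact L2 search over chunk vectors
    index.add(vectors)
    faiss.write_index(index, path)                 # persist for later requests
    return index

On a cache hit the document is never re-embedded, which is what keeps repeat questions against the same URL fast.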

Tech Stack


Python 3.10+
Core Language

vLLM
LLM Inference

FastAPI
Web Framework

Docker
Containerization

PyTorch
ML Framework

HuggingFace
Model Hub

LangChain
RAG Framework

FAISS
Vector Search

ROCm
GPU Compute

Docling
Document Processing

Architecture Overview

RAG System Architecture

🌐 Live Demo & API Endpoints

🚀 Working URL

Live API Endpoint: https://4145182fdfba.ngrok-free.app/api/v1/hackrx/run

Quick Test

# Test the live endpoint (request body and auth follow the API Reference below)
curl -X POST "https://4145182fdfba.ngrok-free.app/api/v1/hackrx/run" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": "https://example.com/document.pdf",
    "questions": ["What is this document about?"]
  }'

Quick Start

Option 1: Docker Setup (Recommended)

Prerequisites

Hardware

  • AMD MI300X GPU (192GB HBM3)
  • 32GB+ RAM
  • 100GB+ storage

Software

  • Docker & Docker Compose
  • ROCm Docker runtime
  • HuggingFace account & token

One-Command Setup

git clone <repository-url>
cd Bajaj-Hackrx
./setup.sh

Manual Setup

# 1. Environment setup
cp env.example .env
nano .env  # Add your HuggingFace token

# 2. Start services
docker-compose up -d

# 3. Monitor startup
docker-compose logs -f vllm-server

Option 2: Local Development

# Clone and setup
git clone <repository-url>
cd Bajaj-Hackrx
python -m venv venv

# Activate environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure and run
cp env.example .env
python main.py

API Reference

Authentication

All API requests require authentication using a Bearer token:

Authorization: Bearer 82e98b40bb2546d8eea6db9bed3c61ef6cafdf3b2a22c0d16edcf3f795e679cf

Endpoints

Endpoint | Method | Description
/ | GET | Health check and system status
/api/v1/hackrx/run | POST | Process a document and answer questions
/api/v1/validate-file | GET | Validate a URL before processing
/api/v1/documents | GET | List processed documents
/api/v1/llama/status | GET | Check vLLM server status
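
Before sending work, the two GET status endpoints can be polled. A short sketch using the Python requests library (the base URL and token are placeholders, and the response payloads are not documented here, so they are simply printed):

# Quick status check against the RAG API (sketch; response fields not documented above).
import requests

BASE_URL = "http://localhost:8000"                  # RAG API port, see Service Ports below
HEADERS = {"Authorization": "Bearer <your-token>"}  # all API requests need the Bearer token

for path in ("/", "/api/v1/llama/status"):
    resp = requests.get(BASE_URL + path, headers=HEADERS, timeout=30)
    print(path, resp.status_code, resp.text[:200])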

Example Request

curl -X POST "http://localhost:8000/api/v1/hackrx/run" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": "https://example.com/document.pdf",
    "questions": [
      "What is the main topic?",
      "What are the key findings?"
    ]
  }'

Response Format

{
  "answers": [
    "The document discusses advanced AI techniques for document processing...",
    "Key findings include improved accuracy and reduced processing time..."
  ]
}
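
The same call from Python, as a minimal client sketch using the requests library; the endpoint, headers, and payload mirror the curl example above.

# Minimal Python client for /api/v1/hackrx/run, mirroring the curl example above.
import requests

url = "http://localhost:8000/api/v1/hackrx/run"
headers = {
    "Authorization": "Bearer <your-token>",
    "Content-Type": "application/json",
}
payload = {
    "documents": "https://example.com/document.pdf",
    "questions": [
        "What is the main topic?",
        "What are the key findings?",
    ],
}

resp = requests.post(url, headers=headers, json=payload, timeout=300)
resp.raise_for_status()
for question, answer in zip(payload["questions"], resp.json()["answers"]):
    print(f"Q: {question}\nA: {answer}\n")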

Configuration

Environment Variables

Variable | Default | Description
HF_TOKEN | (required) | HuggingFace token for model access
API_KEY | generated key | Authentication key for the RAG API
LLAMA_API_URL | localhost:8001 | vLLM server endpoint
USE_EXTERNAL_LLAMA | true | Enable vLLM integration
MAX_CONCURRENT_REQUESTS | 10 | Parallel request limit
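
For reference, a minimal .env sketch using these variables; the values are placeholders mirroring the defaults in the table above, and the repo's env.example is the authoritative template.

# .env (sketch)
HF_TOKEN=hf_xxxxxxxxxxxxxxxx        # required for model access
API_KEY=<your-api-key>              # falls back to a generated key if unset
LLAMA_API_URL=localhost:8001
USE_EXTERNAL_LLAMA=true
MAX_CONCURRENT_REQUESTS=10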

Service Ports

Service | Port | Purpose
RAG API | 8000 | Main application interface
vLLM Server | 8001 | Llama model inference

Performance Metrics

  • 2-10x - Faster inference vs. standard implementations
  • 3-8 sec - PDF processing + questions (with GPU)
  • 192GB - HBM3 memory on the MI300X for optimal performance
  • 2-3 min - Initial model loading time

System Requirements

Production Deployment

Minimum Requirements

  • 16GB RAM
  • AMD MI300X GPU (192GB HBM3)
  • Docker with ROCm runtime
  • 100GB storage

Recommended Setup

  • 64GB+ RAM
  • AMD MI300X (192GB HBM3)
  • NVMe SSD storage
  • High-speed internet

GPU Compatibility

GPU Model | Memory | Status
AMD MI210 | 64GB HBM2e | ⚠️ Limited
AMD MI250X | 128GB HBM2e | ✅ Supported
AMD MI300A | 128GB HBM3 | ✅ Good
AMD MI300X | 192GB HBM3 | ✅ Optimal

Troubleshooting

Common Issues & Solutions

vLLM Server Issues

  • Won't start: Check AMD GPU memory and HuggingFace token
  • Slow loading: Monitor with docker-compose logs -f vllm-server
  • Out of memory: Reduce tensor-parallel-size in docker-compose.yml
  • ROCm issues: Verify ROCm installation and GPU visibility

Connection Issues

  • Connection refused: Wait for health checks to pass
  • API errors: Verify authentication token
  • Network issues: Check service status with docker-compose ps

Useful Commands

# Service management
docker-compose ps                    # Check status
docker-compose logs -f              # View logs
docker-compose restart rag-api      # Restart service
docker-compose down && docker-compose up -d  # Clean restart

# System monitoring
rocm-smi                            # AMD GPU usage
docker info | grep rocm            # ROCm runtime check

Development

Project Structure

src/
├── core/          # Configuration and models
├── api/           # FastAPI application
├── rag/           # RAG system components
├── document/      # Document processing
└── utils/         # Utility functions

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-feature)
  3. Make your changes following the modular structure
  4. Test your changes thoroughly
  5. Submit a pull request

License

This project is licensed under the MIT License.


Made with ❤️ for intelligent document processing

About

A RAG system that takes a document as input and answers queries pertaining to that document.
