A powerful, modular RAG system that processes documents from URLs and provides intelligent question-answering capabilities using GPU-accelerated Llama 3.3-70B-Instruct via vLLM.

Live API Endpoint: `https://4145182fdfba.ngrok-free.app/api/v1/hackrx/run`

```bash
# Test the live endpoint
curl -X POST "https://4145182fdfba.ngrok-free.app/api/v1/hackrx/run" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": "https://example.com/document.pdf",
    "questions": ["What is this document about?"]
  }'
```
Docker-based setup:

```bash
# Clone the repository and run the automated setup
git clone <repository-url>
cd Bajaj-Hackrx
./setup.sh
```

```bash
# 1. Environment setup
cp env.example .env
nano .env  # Add your HuggingFace token

# 2. Start services
docker-compose up -d

# 3. Monitor startup
docker-compose logs -f vllm-server
```

Local setup instructions:

```bash
# Clone and setup
git clone <repository-url>
cd Bajaj-Hackrx
python -m venv venv

# Activate environment
# Windows:
venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure and run
cp env.example .env
python main.py
```

All API requests require authentication using a Bearer token:

```
Authorization: Bearer 82e98b40bb2546d8eea6db9bed3c61ef6cafdf3b2a22c0d16edcf3f795e679cf
```

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Health check and system status |
| `/api/v1/hackrx/run` | POST | Process document and answer questions |
| `/api/v1/validate-file` | GET | Validate URL before processing |
| `/api/v1/documents` | GET | List processed documents |
| `/api/v1/llama/status` | GET | Check vLLM server status |
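
Before sending a large document for processing, `/api/v1/validate-file` can be used to check that a URL is reachable and supported. A minimal sketch, assuming a local deployment and that the endpoint takes the target URL as a `url` query parameter (the parameter name is an assumption, not confirmed by the endpoint table above):

```python
import requests

BASE_URL = "http://localhost:8000"   # assumed local deployment
TOKEN = "<your-token>"

# "url" as the query parameter name is an assumption for illustration only.
resp = requests.get(
    f"{BASE_URL}/api/v1/validate-file",
    params={"url": "https://example.com/document.pdf"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
print(resp.status_code, resp.json())
```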

Example request to `/api/v1/hackrx/run`:

```bash
curl -X POST "http://localhost:8000/api/v1/hackrx/run" \
  -H "Authorization: Bearer <your-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": "https://example.com/document.pdf",
    "questions": [
      "What is the main topic?",
      "What are the key findings?"
    ]
  }'
```

Example response:

```json
{
  "answers": [
    "The document discusses advanced AI techniques for document processing...",
    "Key findings include improved accuracy and reduced processing time..."
  ]
}
```
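
The same request can also be issued from Python. A minimal client sketch using `requests`, assuming a local deployment on port 8000 and the Bearer token from your `.env`:

```python
import requests

BASE_URL = "http://localhost:8000"   # assumed local deployment
API_KEY = "<your-token>"             # the API_KEY value from .env

payload = {
    "documents": "https://example.com/document.pdf",
    "questions": [
        "What is the main topic?",
        "What are the key findings?",
    ],
}

resp = requests.post(
    f"{BASE_URL}/api/v1/hackrx/run",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=300,  # document download + inference can take several seconds
)
resp.raise_for_status()

for question, answer in zip(payload["questions"], resp.json()["answers"]):
    print(f"Q: {question}\nA: {answer}\n")
```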

Configuration is controlled through environment variables in `.env`:

| Variable | Default | Description |
|---|---|---|
| `HF_TOKEN` | (required) | HuggingFace token for model access |
| `API_KEY` | Generated key | Authentication key for RAG API |
| `LLAMA_API_URL` | `localhost:8001` | vLLM server endpoint |
| `USE_EXTERNAL_LLAMA` | `true` | Enable vLLM integration |
| `MAX_CONCURRENT_REQUESTS` | `10` | Parallel request limit |
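
Inside the application these variables are typically read once at startup. A rough sketch of how the documented variables could be loaded; the real configuration code lives in `src/core` and may be structured differently:

```python
import os

# Illustrative mirror of the documented variables; the actual settings
# objects in src/core are not shown in this README.
HF_TOKEN = os.environ["HF_TOKEN"]                                    # required, no default
API_KEY = os.getenv("API_KEY", "")                                   # generated key if unset
LLAMA_API_URL = os.getenv("LLAMA_API_URL", "http://localhost:8001")  # vLLM endpoint
USE_EXTERNAL_LLAMA = os.getenv("USE_EXTERNAL_LLAMA", "true").lower() == "true"
MAX_CONCURRENT_REQUESTS = int(os.getenv("MAX_CONCURRENT_REQUESTS", "10"))
```
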
| Service | Port | Purpose |
|---|---|---|
| RAG API | 8000 | Main application interface |
| vLLM Server | 8001 | Llama model inference |
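
After `docker-compose up -d`, the status endpoints from the table above can be polled to confirm both services are ready. A small sketch, assuming the default ports and the Bearer token from `.env`:

```python
import requests

HEADERS = {"Authorization": "Bearer <your-token>"}

# RAG API health check (port 8000)
health = requests.get("http://localhost:8000/", headers=HEADERS, timeout=10)
print("RAG API:", health.json())

# vLLM server status, reported through the RAG API
status = requests.get(
    "http://localhost:8000/api/v1/llama/status", headers=HEADERS, timeout=10
)
print("vLLM:", status.json())
```
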
Performance highlights:

- 2-10x faster inference vs. standard implementations
- 3-8 sec to process a PDF and answer questions (with GPU)
- 192 GB HBM3 memory on MI300X for optimal performance
- 2-3 min initial model loading time

Minimum requirements and recommended setup:

| GPU Model | Memory | Status |
|---|---|---|
| AMD MI210 | 64GB HBM2e | |
| AMD MI250X | 128GB HBM2e | Supported |
| AMD MI300A | 128GB HBM3 | Good |
| AMD MI300X | 192GB HBM3 | Optimal |

Common Issues & Solutions

- Won't start: Check AMD GPU memory and HuggingFace token
- Slow loading: Monitor with `docker-compose logs -f vllm-server`
- Out of memory: Reduce `tensor-parallel-size` in docker-compose.yml
- ROCm issues: Verify ROCm installation and GPU visibility
- Connection refused: Wait for health checks to pass
- API errors: Verify authentication token
- Network issues: Check service status with `docker-compose ps`

```bash
# Service management
docker-compose ps                             # Check status
docker-compose logs -f                        # View logs
docker-compose restart rag-api                # Restart service
docker-compose down && docker-compose up -d   # Clean restart

# System monitoring
rocm-smi                                      # AMD GPU usage
docker info | grep rocm                       # ROCm runtime check
```

Project structure:

```
src/
├── core/       # Configuration and models
├── api/        # FastAPI application
├── rag/        # RAG system components
├── document/   # Document processing
└── utils/      # Utility functions
```
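
As an illustration of how these modules might fit together, an entry point would wire the FastAPI app from `src/api` and serve it on the documented port 8000; the function and module names below are hypothetical, not the repository's actual identifiers:

```python
# Hypothetical wiring sketch; actual names in main.py and src/ may differ.
import uvicorn

from src.api import create_app   # assumed FastAPI app factory in src/api

app = create_app()

if __name__ == "__main__":
    # Serve the RAG API on the documented default port.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```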

To contribute:

- Fork the repository
- Create a feature branch (`git checkout -b feature/new-feature`)
- Make your changes following the modular structure
- Test your changes thoroughly
- Submit a pull request
