Agentic RAG System for PDF documents with multimodal capabilities

📌 Overview

Demo

Agentic RAG (Agentic Retrieval-Augmented Generation) is an advanced AI architecture that extends traditional RAG by integrating autonomous AI agents into the pipeline. This project implements an Agentic RAG system specifically designed for PDF documents with multimodal capabilities.

Core Problem Addressed: Traditional RAG systems are limited to text-only inputs and lack intelligent reasoning capabilities. This system overcomes those limitations by:

  • Multimodal support: Processes not only text but also images, tables, and other elements within PDF documents. It can also handle queries that combine both textual and visual inputs.
  • Agentic approach with LangGraph: Enables intelligent document retrieval and reasoning.
  • Comprehensive PDF processing: Includes image extraction and descriptive analysis.
  • User-friendly web interface: Provides seamless interaction for end users.

Target Users: Researchers, data scientists, and developers who require intelligent document analysis and question answering across PDF collections containing both textual and visual content.

Key Differentiators:

  • Graph-based agentic reasoning with LangGraph for sophisticated query processing.
  • Multimodal capabilities that combine text queries with image analysis.
  • Advanced PDF processing powered by Docling for deep document understanding.
  • Hybrid search functionality using the Milvus vector database integrated with BM25 for enhanced retrieval accuracy.

🧑‍💻 Tech stack

LangChain, LangGraph, Docling, Milvus, Attu, Docker, OpenAI, HuggingFace, PIL, OpenCV, PyTorch, Flask

  • Core Framework: LangChain for LLM orchestration, LangGraph for agentic workflows and stateful reasoning
  • Web Framework: Flask as the lightweight backend serving the chat interface
  • Vector Database: Milvus with BM25 hybrid search
  • PDF Processing: Docling for advanced document conversion and chunking
  • ML Models:
    • HuggingFace Transformers (BAAI/bge-m3 embeddings)
    • PyTorch for GPU acceleration
    • OpenAI GPT models for vision and language processing
  • Image Processing: PIL, OpenCV for image handling
  • Infrastructure: Docker, Attu for Milvus deployment

⭐ Key features

  1. Agentic RAG Architecture (src/agent.py)

    • Graph-based reasoning using LangGraph with state management
    • Intelligent document grading and query rewriting
    • Multi-step reasoning workflow with conversation history tracking
    • Support for multiple vector store retrievers as tools

    Agentic RAG System Overview
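Conceptually, the workflow cycles through retrieve → grade → (rewrite or generate) until the retrieved documents pass the relevance check. The following plain-Python sketch illustrates that control flow only; the `retrieve`, `grade`, `rewrite`, and `generate` helpers are toy stand-ins, not the LLM-backed LangGraph nodes in `src/agent.py`:

```python
# Sketch of the agentic retrieve -> grade -> rewrite/generate loop.
# All helpers below are illustrative stand-ins for LLM-backed steps.

MAX_REWRITES = 2

def retrieve(query, corpus):
    # Stand-in retriever: return documents sharing a word with the query.
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def grade(query, docs):
    # Stand-in grader: treat any non-empty retrieval as relevant.
    return len(docs) > 0

def rewrite(query):
    # Stand-in query rewriter: drop a leading filler word.
    return query.split(" ", 1)[-1]

def generate(query, docs):
    return f"Answer to {query!r} based on {len(docs)} document(s)."

def agentic_rag(query, corpus):
    for _ in range(MAX_REWRITES + 1):
        docs = retrieve(query, corpus)
        if grade(query, docs):           # relevant -> generate an answer
            return generate(query, docs)
        query = rewrite(query)           # not relevant -> rewrite and retry
    return "I could not find relevant documents."

corpus = ["milvus stores vectors", "docling parses pdfs"]
print(agentic_rag("please milvus details", corpus))
```

The real implementation replaces each helper with a graph node that calls the LLM and tracks conversation history in the graph state.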

  2. Advanced PDF Processing (src/pdf_loader.py)

    • Comprehensive document conversion using Docling
    • Intelligent chunking with HybridChunker
    • Image extraction and automatic description generation
    • Metadata preservation for enhanced retrieval

    Data Processing Pipeline
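The idea behind hybrid chunking is to respect document structure first and size limits second. This toy sketch splits on markdown-style headings, then caps each section by word count; Docling's HybridChunker is token-aware and far more sophisticated, so this only conveys the concept:

```python
# Toy sketch of hybrid chunking: split on headings first, then cap each
# section at max_words. Not Docling's actual algorithm.

def hybrid_chunk(text, max_words=50):
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:   # a new heading closes a section
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        words = section.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks

doc = "# Intro\nshort intro text\n# Methods\n" + "word " * 120
for chunk in hybrid_chunk(doc, max_words=50):
    print(len(chunk.split()))   # chunk sizes: 5, 50, 50, 22
```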

  3. Multimodal Vision Capabilities (src/vision_model.py)

    • OpenAI integration for image analysis
    • Multiple analysis types (comprehensive, technical, contextual)
    • Image preprocessing and encoding for optimal analysis
    • Structured output for embedding and retrieval
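Before an image reaches the vision model, it has to be base64-encoded and wrapped into a chat message. A minimal stdlib-only sketch of that encoding step, using fake image bytes and the data-URL message shape OpenAI's vision-capable chat models accept:

```python
import base64

# Sketch of preparing an image for an OpenAI-style vision request.
# `image_bytes` would normally come from an uploaded file; this uses fake data.

def build_vision_message(image_bytes, prompt, mime="image/png"):
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = build_vision_message(b"\x89PNG fake bytes", "Describe this figure.")
print(msg["content"][1]["image_url"]["url"][:22])  # "data:image/png;base64,"
```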
  4. Hybrid Vector Search (src/vector_store.py)

    • Milvus vector database with BM25 built-in functions
    • Namespace-based document organization
    • Async operation support with proper event loop management
    • Scalable database and collection management
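Hybrid search fuses a dense-vector ranking with a BM25 lexical ranking into one result list, commonly via Reciprocal Rank Fusion (RRF). Milvus exposes this through its hybrid-search API; the toy sketch below implements RRF directly over two hand-made rankings to show the scoring idea:

```python
# Toy sketch of hybrid retrieval via Reciprocal Rank Fusion (RRF):
# each document scores sum(1 / (k + rank + 1)) over the rankings it appears in.
# The rankings here are fabricated for illustration.

def rrf(rankings, k=60):
    # rankings: list of ordered doc-id lists (best first)
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc1", "doc2", "doc3"]   # by embedding similarity
bm25_ranking  = ["doc2", "doc4", "doc1"]   # by lexical BM25 score
print(rrf([dense_ranking, bm25_ranking]))  # ['doc2', 'doc1', 'doc4', 'doc3']
```

doc2 wins because it ranks highly in both lists, which is exactly the behavior that makes hybrid search more robust than either signal alone.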
  5. Web Interface (templates/index.html, app.py)

    • Modern responsive UI with a real-time chat interface, built with Flask and HTML
    • Drag-and-drop file upload for PDFs and images
    • Live agent workflow visualization
    • Support for both text and multimodal queries

⚙️ Installation & Usage

Prerequisites

  • Python 3.10
  • CUDA-compatible GPU (optional, for accelerated inference)
  • Docker for Milvus deployment
  • OpenAI API key for vision capabilities

Installation

  1. Clone the repository

    git clone https://github.com/YuITC/Agentic-RAG.git
    cd Agentic-RAG
  2. Set up Python environment

    # Create virtual environment (recommended)
    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # macOS/Linux
    source venv/bin/activate
    
    # Install dependencies
    pip install -r requirements.txt
  3. Install and Start Milvus

    # Option 1: Use the provided Windows batch script
    standalone_embed.bat start
    
    # Option 2: Download and run the official script (Linux/macOS)
    curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
    bash standalone_embed.sh start
    
    # Either script pulls the Milvus image and starts a standalone Docker
    # container listening on ports 19530 (gRPC) and 9091 (HTTP)
  4. Optional: Set up Attu (Milvus GUI)

    # Get your machine's IPv4 address (Attu runs inside a container,
    # so "localhost" would resolve to the container itself)
    ipconfig
    
    # Start the Attu interface, pointing it at your Milvus instance
    docker run -p 8000:3000 -e MILVUS_URL={your_IPv4}:19530 zilliz/attu:v2.6
    
    # Access Attu at http://localhost:8000
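Before starting the app, it can be useful to confirm that Milvus is actually reachable. A stdlib-only check against Milvus's default gRPC port (19530 is an assumption that you kept the default mapping):

```python
import socket

# Quick check that something is listening on Milvus's default gRPC port.
def port_open(host="localhost", port=19530, timeout=1.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("Milvus reachable:", port_open())
```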

Usage

  1. Start the application

    python app.py
  2. Access the web interface

    • Open your browser and navigate to http://localhost:5000
    • The interface provides:
      • API key configuration
      • Model selection (from OpenAI)
      • PDF document upload and processing
      • Real-time chat with multimodal support
      • Agent workflow visualization
  3. Using the system

    • Upload PDFs: Drag and drop PDF files to process and index them
    • Text queries: Ask questions about your documents
    • Image queries: Upload images along with text for multimodal analysis
    • Monitor progress: Watch the agent workflow steps in real-time

API Usage Example

from src.agent        import AgenticRAG
from src.vector_store import MilvusVectorStore
from src.pdf_loader   import process_dir, get_embeddings_model

# Initialize components
embeddings   = get_embeddings_model()
vector_store = MilvusVectorStore(namespace="my_docs", embeddings_model=embeddings)

# Process documents
docs = process_dir("path/to/pdfs", namespace="my_docs")
vector_store.add_documents(docs)

# Create agent
agent = AgenticRAG(
    vector_stores = [{"store": vector_store, "name": "document_retriever"}],
    llm           = "gpt-4",
    api_key       = "your_api_key"
)

# Query the system
response = agent.query("What are the main findings in the research papers?")

📂 Project structure

Agentic-RAG/
├── src/
│   ├── utils/
│   │   └── custom_logger.py      # Logging configuration and utilities
│   ├── agent.py                  # Agentic RAG implementation with LangGraph
│   ├── pdf_loader.py             # PDF processing and document chunking
│   ├── vector_store.py           # Milvus vector database integration
│   └── vision_model.py           # OpenAI vision model for image analysis
├── templates/
│   └── index.html                # Web interface template with chat UI
├── app.py                        # Flask web application entry point
├── requirements.txt              # Python dependencies specification
├── standalone_embed.bat          # Windows script for Milvus deployment
├── .gitignore                    # Git ignore patterns
└── README.md                     # Project documentation

📫 Contact

If you find this project useful, consider ⭐️ starring the repository or contributing to further improvements!

For any questions, feature requests, or collaboration opportunities, feel free to reach out: tainguyenphu2502@gmail.com
