Agentic RAG (Agentic Retrieval-Augmented Generation) is an advanced AI architecture that extends traditional RAG by integrating autonomous AI agents into the pipeline. This project implements an Agentic RAG system specifically designed for PDF documents with multimodal capabilities.
Core Problem Addressed: Traditional RAG systems are limited to text-only inputs and lack intelligent reasoning capabilities. This system overcomes those limitations by:
- Multimodal support: Processes not only text but also images, tables, and other elements within PDF documents. It can also handle queries that combine both textual and visual inputs.
- Agentic approach with `LangGraph`: Enables intelligent document retrieval and reasoning.
- Comprehensive PDF processing: Includes image extraction and descriptive analysis.
- User-friendly web interface: Provides seamless interaction for end users.
Target Users: Researchers, data scientists, and developers who require intelligent document analysis and question answering across PDF collections containing both textual and visual content.
Key Differentiators:
- Graph-based agentic reasoning with `LangGraph` for sophisticated query processing.
- Multimodal capabilities that combine text queries with image analysis.
- Advanced PDF processing powered by `Docling` for deep document understanding.
- Hybrid search functionality using the `Milvus` vector database integrated with BM25 for enhanced retrieval accuracy.
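Milvus handles the BM25 + dense fusion internally, but the idea behind hybrid retrieval can be sketched in a few lines of plain Python. The snippet below uses Reciprocal Rank Fusion (one common fusion scheme) with made-up document IDs; `rrf_fuse` is illustrative and not part of this project's API:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc IDs.

    Each document's score is the sum of 1 / (k + rank) over every list
    it appears in, so items ranked highly by either the BM25 (keyword)
    retriever or the dense-vector (semantic) retriever float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers
bm25_hits  = ["doc3", "doc1", "doc7"]   # keyword match order
dense_hits = ["doc1", "doc5", "doc3"]   # semantic similarity order

fused = rrf_fuse([bm25_hits, dense_hits])
# → ["doc1", "doc3", "doc5", "doc7"]: doc1 wins because both lists rank it highly
```

Documents found by both retrievers accumulate score from each list, which is why hybrid search tends to beat either keyword or semantic search alone.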
LangChain, LangGraph, Docling, Milvus, Attu, Docker, OpenAI, HuggingFace, PIL, OpenCV, PyTorch, Flask
- Core Framework: `LangChain` for LLM orchestration, `LangGraph` for agentic workflows and stateful reasoning.
- Web Framework: `Flask` as the lightweight backend web framework serving the chat interface.
- Vector Database: `Milvus` with BM25 hybrid search.
- PDF Processing: `Docling` for advanced document conversion and chunking.
- ML Models:
  - HuggingFace Transformers (`BAAI/bge-m3` embeddings)
  - `PyTorch` for GPU acceleration
  - OpenAI GPT models for vision and language processing
- Image Processing: `PIL`, `OpenCV` for image handling.
- Infrastructure: `Docker`, `Attu` for Milvus deployment.
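The dense half of retrieval compares embedding vectors by cosine similarity. A dependency-free sketch of that ranking step is below; the toy 3-dimensional vectors stand in for the much higher-dimensional embeddings `BAAI/bge-m3` actually produces, and the chunk names are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three document chunks
chunks = {
    "intro":   [0.9, 0.1, 0.0],
    "methods": [0.2, 0.8, 0.1],
    "results": [0.1, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # embedding of the user's question

# Rank chunks by similarity to the query; the closest chunk wins
best = max(chunks, key=lambda name: cosine_similarity(query_vec, chunks[name]))
# → "intro"
```

In the real pipeline this comparison happens inside Milvus over the indexed `bge-m3` vectors rather than in Python.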
- Agentic RAG Architecture (`src/agent.py`)
  - Graph-based reasoning using `LangGraph` with state management
  - Intelligent document grading and query rewriting
  - Multi-step reasoning workflow with conversation history tracking
  - Support for multiple vector store retrievers as tools
- Advanced PDF Processing (`src/pdf_loader.py`)
  - Comprehensive document conversion using `Docling`
  - Intelligent chunking with `HybridChunker`
  - Image extraction and automatic description generation
  - Metadata preservation for enhanced retrieval
- Multimodal Vision Capabilities (`src/vision_model.py`)
  - OpenAI integration for image analysis
  - Multiple analysis types (comprehensive, technical, contextual)
  - Image preprocessing and encoding for optimal analysis
  - Structured output for embedding and retrieval
- Hybrid Vector Search (`src/vector_store.py`)
  - `Milvus` vector database with built-in BM25 functions
  - Namespace-based document organization
  - Async operation support with proper event loop management
  - Scalable database and collection management
- Web Interface (`templates/index.html`, `app.py`)
  - Modern responsive UI with a real-time chat interface built with `Flask` and HTML
  - Drag-and-drop file upload for PDFs and images
  - Live agent workflow visualization
  - Support for both text and multimodal queries
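The grade-and-rewrite control flow that the `LangGraph` workflow in `src/agent.py` implements can be illustrated with a plain-Python sketch. Everything below (`retrieve`, `grade`, `rewrite`, `generate`, and the toy corpus) is a stand-in, not this project's actual code:

```python
def agentic_answer(question, retrieve, grade, rewrite, generate, max_rewrites=2):
    """Retrieve docs; if the grader rejects them, rewrite the query and retry."""
    query = question
    for _ in range(max_rewrites + 1):
        docs = retrieve(query)
        if grade(question, docs):          # are these docs relevant enough?
            return generate(question, docs)
        query = rewrite(query)             # reformulate the query and try again
    return generate(question, [])          # give up on retrieval, answer without context

# Toy stand-ins to exercise the loop (an LLM plays these roles in the real system)
corpus = {"llm agents": ["Agents plan and call tools."]}
retrieve = lambda q: corpus.get(q, [])
grade    = lambda q, docs: bool(docs)
rewrite  = lambda q: "llm agents"          # pretend the LLM fixed the phrasing
generate = lambda q, docs: docs[0] if docs else "I don't know."

answer = agentic_answer("agentz", retrieve, grade, rewrite, generate)
# → "Agents plan and call tools."  (first retrieval fails, the rewrite succeeds)
```

In the actual implementation these steps are nodes in a `LangGraph` state graph, which adds conversation-history state and visualizable transitions on top of this basic loop.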
- Python 3.10
- CUDA-compatible GPU (optional, for accelerated inference)
- `Docker` for `Milvus` deployment
- OpenAI API key for vision capabilities
- Clone the repository

  ```bash
  git clone https://github.com/YuITC/Agentic-RAG.git
  cd Agentic-RAG
  ```

- Set up Python environment

  ```bash
  # Create virtual environment (recommended)
  python -m venv venv

  # Windows
  venv\Scripts\activate
  # macOS/Linux
  source venv/bin/activate

  # Install dependencies
  pip install -r requirements.txt
  ```
- Install and start Milvus (pick one option)

  ```bash
  # Option 1: Use the provided Windows batch script
  standalone_embed.bat start

  # Option 2: Download and run the official script (Linux/macOS)
  curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
  bash standalone_embed.sh start

  # Option 3: Start Milvus directly with Docker
  docker run -d --name milvus_standalone -p 19530:19530 -p 9091:9091 milvusdb/milvus:latest
  ```
- Optional: Set up Attu (Milvus GUI)

  ```bash
  # Get your machine's IPv4 address (Windows); use it in MILVUS_URL below if
  # "localhost" inside the container cannot reach Milvus on the host
  ipconfig

  # Start the Attu interface
  docker run -p 8000:3000 -e MILVUS_URL=localhost:19530 zilliz/attu:v2.6
  ```

  Access Attu at http://localhost:8000 (host port 8000 maps to Attu's port 3000 in the container).
- Start the application

  ```bash
  python app.py
  ```
- Access the web interface
  - Open your browser and navigate to http://localhost:5000
  - The interface provides:
    - API key configuration
    - Model selection (OpenAI models)
    - PDF document upload and processing
    - Real-time chat with multimodal support
    - Agent workflow visualization
- Using the system
  - Upload PDFs: drag and drop PDF files to process and index them
  - Text queries: ask questions about your documents
  - Image queries: upload images along with text for multimodal analysis
  - Monitor progress: watch the agent workflow steps in real time
```python
from src.agent import AgenticRAG
from src.vector_store import MilvusVectorStore
from src.pdf_loader import process_dir, get_embeddings_model

# Initialize components
embeddings = get_embeddings_model()
vector_store = MilvusVectorStore(namespace="my_docs", embeddings_model=embeddings)

# Process documents
docs = process_dir("path/to/pdfs", namespace="my_docs")
vector_store.add_documents(docs)

# Create agent
agent = AgenticRAG(
    vector_stores=[{"store": vector_store, "name": "document_retriever"}],
    llm="gpt-4",
    api_key="your_api_key",
)

# Query the system
response = agent.query("What are the main findings in the research papers?")
```

```
Agentic-RAG/
├── src/
│   ├── utils/
│   │   └── custom_logger.py    # Logging configuration and utilities
│   ├── agent.py                # Agentic RAG implementation with LangGraph
│   ├── pdf_loader.py           # PDF processing and document chunking
│   ├── vector_store.py         # Milvus vector database integration
│   └── vision_model.py         # OpenAI vision model for image analysis
├── templates/
│   └── index.html              # Web interface template with chat UI
├── app.py                      # Flask web application entry point
├── requirements.txt            # Python dependencies specification
├── standalone_embed.bat        # Windows script for Milvus deployment
├── .gitignore                  # Git ignore patterns
└── README.md                   # Project documentation
```

If you find this project useful, consider ⭐️ starring the repository or contributing to further improvements!
For any questions, feature requests, or collaboration opportunities, feel free to reach out: tainguyenphu2502@gmail.com


