Agentic RAG System for PDF documents with multimodal capabilities

📌 Overview

Demo

Agentic RAG (Agentic Retrieval-Augmented Generation) is an advanced AI architecture that extends traditional RAG by integrating autonomous AI agents into the pipeline. This project implements an Agentic RAG system specifically designed for PDF documents with multimodal capabilities.

Core Problem Addressed: Traditional RAG systems are limited to text-only inputs and lack intelligent reasoning capabilities. This system overcomes those limitations by:

  • Multimodal support: Processes not only text but also images, tables, and other elements within PDF documents. It can also handle queries that combine both textual and visual inputs.
  • Agentic approach with LangGraph: Enables intelligent document retrieval and reasoning.
  • Comprehensive PDF processing: Includes image extraction and descriptive analysis.
  • User-friendly web interface: Provides seamless interaction for end users.

Target Users: Researchers, data scientists, and developers who require intelligent document analysis and question answering across PDF collections containing both textual and visual content.

Key Differentiators:

  • Graph-based agentic reasoning with LangGraph for sophisticated query processing.
  • Multimodal capabilities that combine text queries with image analysis.
  • Advanced PDF processing powered by Docling for deep document understanding.
  • Hybrid search functionality using the Milvus vector database integrated with BM25 for enhanced retrieval accuracy.

🧑‍💻 Tech stack

LangChain, LangGraph, Docling, Milvus, Attu, Docker, OpenAI, HuggingFace, PIL, OpenCV, PyTorch, Flask

  • Core Framework: LangChain for LLM orchestration, LangGraph for agentic workflows and stateful reasoning
  • Web Framework: Flask as the lightweight backend serving the chat interface
  • Vector Database: Milvus with BM25 hybrid search
  • PDF Processing: Docling for advanced document conversion and chunking
  • ML Models:
    • HuggingFace Transformers (BAAI/bge-m3 embeddings)
    • PyTorch for GPU acceleration
    • OpenAI GPT models for vision and language processing
  • Image Processing: PIL, OpenCV for image handling
  • Infrastructure: Docker, Attu for Milvus deployment

⭐ Key features

  1. Agentic RAG Architecture (src/agent.py)

    • Graph-based reasoning using LangGraph with state management
    • Intelligent document grading and query rewriting
    • Multi-step reasoning workflow with conversation history tracking
    • Support for multiple vector store retrievers as tools

    Agentic RAG System Overview
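Conceptually, the workflow cycles through retrieve → grade → (rewrite or generate) until the retrieved documents pass the relevance check. The following plain-Python sketch illustrates that control flow only; the `retrieve`, `grade`, `rewrite`, and `generate` helpers are toy stand-ins, not the LLM-backed LangGraph nodes in `src/agent.py`:

```python
# Sketch of the agentic retrieve -> grade -> rewrite/generate loop.
# All helpers below are illustrative stand-ins for LLM-backed steps.

MAX_REWRITES = 2

def retrieve(query, corpus):
    # Stand-in retriever: return documents sharing a word with the query.
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def grade(query, docs):
    # Stand-in grader: treat any non-empty retrieval as relevant.
    return len(docs) > 0

def rewrite(query):
    # Stand-in query rewriter: drop a leading filler word.
    return query.split(" ", 1)[-1]

def generate(query, docs):
    return f"Answer to {query!r} based on {len(docs)} document(s)."

def agentic_rag(query, corpus):
    for _ in range(MAX_REWRITES + 1):
        docs = retrieve(query, corpus)
        if grade(query, docs):           # relevant -> generate an answer
            return generate(query, docs)
        query = rewrite(query)           # not relevant -> rewrite and retry
    return "I could not find relevant documents."

corpus = ["milvus stores vectors", "docling parses pdfs"]
print(agentic_rag("please milvus details", corpus))
```

The real implementation replaces each helper with a graph node that calls the LLM and tracks conversation history in the graph state.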

  2. Advanced PDF Processing (src/pdf_loader.py)

    • Comprehensive document conversion using Docling
    • Intelligent chunking with HybridChunker
    • Image extraction and automatic description generation
    • Metadata preservation for enhanced retrieval

    Data Processing Pipeline
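The idea behind hybrid chunking is to respect document structure first and size limits second. This toy sketch splits on markdown-style headings, then caps each section by word count; Docling's HybridChunker is token-aware and far more sophisticated, so this only conveys the concept:

```python
# Toy sketch of hybrid chunking: split on headings first, then cap each
# section at max_words. Not Docling's actual algorithm.

def hybrid_chunk(text, max_words=50):
    sections, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:   # a new heading closes a section
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))

    chunks = []
    for section in sections:
        words = section.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks

doc = "# Intro\nshort intro text\n# Methods\n" + "word " * 120
for chunk in hybrid_chunk(doc, max_words=50):
    print(len(chunk.split()))   # chunk sizes: 5, 50, 50, 22
```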

  3. Multimodal Vision Capabilities (src/vision_model.py)

    • OpenAI integration for image analysis
    • Multiple analysis types (comprehensive, technical, contextual)
    • Image preprocessing and encoding for optimal analysis
    • Structured output for embedding and retrieval
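Before an image reaches the vision model, it has to be base64-encoded and wrapped into a chat message. A minimal stdlib-only sketch of that encoding step, using fake image bytes and the data-URL message shape OpenAI's vision-capable chat models accept:

```python
import base64

# Sketch of preparing an image for an OpenAI-style vision request.
# `image_bytes` would normally come from an uploaded file; this uses fake data.

def build_vision_message(image_bytes, prompt, mime="image/png"):
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = build_vision_message(b"\x89PNG fake bytes", "Describe this figure.")
print(msg["content"][1]["image_url"]["url"][:22])  # "data:image/png;base64,"
```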
  4. Hybrid Vector Search (src/vector_store.py)

    • Milvus vector database with BM25 built-in functions
    • Namespace-based document organization
    • Async operation support with proper event loop management
    • Scalable database and collection management
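Hybrid search fuses a dense-vector ranking with a BM25 lexical ranking into one result list, commonly via Reciprocal Rank Fusion (RRF). Milvus exposes this through its hybrid-search API; the toy sketch below implements RRF directly over two hand-made rankings to show the scoring idea:

```python
# Toy sketch of hybrid retrieval via Reciprocal Rank Fusion (RRF):
# each document scores sum(1 / (k + rank + 1)) over the rankings it appears in.
# The rankings here are fabricated for illustration.

def rrf(rankings, k=60):
    # rankings: list of ordered doc-id lists (best first)
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc1", "doc2", "doc3"]   # by embedding similarity
bm25_ranking  = ["doc2", "doc4", "doc1"]   # by lexical BM25 score
print(rrf([dense_ranking, bm25_ranking]))  # ['doc2', 'doc1', 'doc4', 'doc3']
```

doc2 wins because it ranks highly in both lists, which is exactly the behavior that makes hybrid search more robust than either signal alone.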
  5. Web Interface (templates/index.html, app.py)

    • Modern responsive UI with a real-time chat interface, built with Flask and HTML
    • Drag-and-drop file upload for PDFs and images
    • Live agent workflow visualization
    • Support for both text and multimodal queries

⚙️ Installation & Usage

Prerequisites

  • Python 3.10
  • CUDA-compatible GPU (optional, for accelerated inference)
  • Docker for Milvus deployment
  • OpenAI API key for vision capabilities

Installation

  1. Clone the repository

    git clone https://github.com/YuITC/Agentic-RAG.git
    cd Agentic-RAG
  2. Set up Python environment

    # Create virtual environment (recommended)
    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # macOS/Linux
    source venv/bin/activate
    
    # Install dependencies
    pip install -r requirements.txt
  3. Install and Start Milvus

    # Option 1: Use the provided Windows batch script
    standalone_embed.bat start
    
    # Option 2: Download and run the official script (Linux/macOS)
    curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
    bash standalone_embed.sh start
    
    # Either script pulls the Milvus image and starts a standalone Docker
    # container listening on ports 19530 (gRPC) and 9091 (HTTP)
  4. Optional: Set up Attu (Milvus GUI)

    # Get your machine's IPv4 address (Attu runs inside a container,
    # so "localhost" would resolve to the container itself)
    ipconfig
    
    # Start the Attu interface, pointing it at your Milvus instance
    docker run -p 8000:3000 -e MILVUS_URL={your_IPv4}:19530 zilliz/attu:v2.6
    
    # Access Attu at http://localhost:8000
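Before starting the app, it can be useful to confirm that Milvus is actually reachable. A stdlib-only check against Milvus's default gRPC port (19530 is an assumption that you kept the default mapping):

```python
import socket

# Quick check that something is listening on Milvus's default gRPC port.
def port_open(host="localhost", port=19530, timeout=1.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("Milvus reachable:", port_open())
```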

Usage

  1. Start the application

    python app.py
  2. Access the web interface

    • Open your browser and navigate to http://localhost:5000
    • The interface provides:
      • API key configuration
      • Model selection (from OpenAI)
      • PDF document upload and processing
      • Real-time chat with multimodal support
      • Agent workflow visualization
  3. Using the system

    • Upload PDFs: Drag and drop PDF files to process and index them
    • Text queries: Ask questions about your documents
    • Image queries: Upload images along with text for multimodal analysis
    • Monitor progress: Watch the agent workflow steps in real-time

API Usage Example

from src.agent        import AgenticRAG
from src.vector_store import MilvusVectorStore
from src.pdf_loader   import process_dir, get_embeddings_model

# Initialize components
embeddings   = get_embeddings_model()
vector_store = MilvusVectorStore(namespace="my_docs", embeddings_model=embeddings)

# Process documents
docs = process_dir("path/to/pdfs", namespace="my_docs")
vector_store.add_documents(docs)

# Create agent
agent = AgenticRAG(
    vector_stores = [{"store": vector_store, "name": "document_retriever"}],
    llm           = "gpt-4",
    api_key       = "your_api_key"
)

# Query the system
response = agent.query("What are the main findings in the research papers?")

📂 Project structure

Agentic-RAG/
├── src/
│   ├── utils/
│   │   └── custom_logger.py      # Logging configuration and utilities
│   ├── agent.py                  # Agentic RAG implementation with LangGraph
│   ├── pdf_loader.py             # PDF processing and document chunking
│   ├── vector_store.py           # Milvus vector database integration
│   └── vision_model.py           # OpenAI vision model for image analysis
├── templates/
│   └── index.html                # Web interface template with chat UI
├── app.py                        # Flask web application entry point
├── requirements.txt              # Python dependencies specification
├── standalone_embed.bat          # Windows script for Milvus deployment
├── .gitignore                    # Git ignore patterns
└── README.md                     # Project documentation

📫 Contact

If you find this project useful, consider ⭐️ starring the repository or contributing to further improvements!

For any questions, feature requests, or collaboration opportunities, feel free to reach out: tainguyenphu2502@gmail.com
