A Retrieval-Augmented Generation (RAG) system that lets users upload files in several formats and query them in natural language. The backend is built on FastAPI, ChromaDB, and Google Gemini; the frontend uses React with Tailwind CSS.
- 📄 Multi-Format Support: Upload and process CSV, Excel, PDF, and text files
- 🧠 Intelligent Retrieval: Uses sentence transformers for semantic search
- 💬 Natural Language Chat: Query your data using conversational AI powered by Google Gemini
- 📊 Vector Database: ChromaDB for efficient similarity search and retrieval
- 🔄 Real-time Processing: Files are processed and indexed as soon as they are uploaded
- 📈 Chat History: Persistent conversation history with context awareness
- 🎨 Modern UI: Clean, responsive interface built with React and Tailwind CSS
- ⚡ Fast API: High-performance backend with FastAPI and async processing
┌────────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   React Frontend   │────│     FastAPI     │────│    ChromaDB     │
│ (Vite + Tailwind)  │    │     Backend     │    │  Vector Store   │
└────────────────────┘    └─────────────────┘    └─────────────────┘
                                   │
                          ┌─────────────────┐
                          │  Google Gemini  │
                          │    AI Model     │
                          └─────────────────┘
- Frontend: React 18 with Vite, Tailwind CSS, and Lucide React icons
- Backend: FastAPI with async support, CORS middleware, and structured routing
- AI Model: Google Gemini 2.5 Flash for natural language processing
- Embeddings: Sentence Transformers for semantic understanding
- Vector Database: ChromaDB for efficient similarity search
- File Processing: Support for multiple formats with automatic text extraction
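The retrieval flow these components describe can be sketched in a few lines. This is a minimal, dependency-free illustration and not the project's code: `embed` uses plain word counts as a stand-in for Sentence Transformers embeddings, `retrieve` ranks chunks by brute-force cosine similarity in place of ChromaDB, and `build_prompt` only assembles the text that would be sent to Gemini. All function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k chunks most similar to the query (the vector store's role)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: List[str]) -> str:
    """Assemble the text that would be passed to the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Q1 sales rose 12 percent in the north region.",
    "The office cafeteria menu changes weekly.",
    "Q2 sales fell 3 percent in the south region.",
]
question = "How did sales change by region?"
top = retrieve(question, chunks)
prompt = build_prompt(question, top)
print(prompt)
```

Word-count vectors only match exact tokens, which is the gap that learned sentence embeddings are meant to close; the ranking and prompt-assembly steps keep the same shape either way.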
- FastAPI - Modern, fast web framework for building APIs
- Google Generative AI - Gemini 2.5 Flash model integration
- ChromaDB - Vector database for embeddings and similarity search
- Sentence Transformers - State-of-the-art sentence embeddings
- Pandas - Data manipulation and analysis
- PDFPlumber - PDF text extraction
- OpenPyXL - Excel file processing
- React 18 - Modern React with hooks and functional components
- Vite - Fast build tool and development server
- Tailwind CSS - Utility-first CSS framework
- Lucide React - Beautiful, customizable icons
generic-data-rag-agent/
├── backend/
│   ├── app/
│   │   ├── core/
│   │   │   └── config.py       # Configuration settings
│   │   ├── routers/
│   │   │   ├── chat.py         # Chat endpoints
│   │   │   ├── files.py        # File management endpoints
│   │   │   └── history.py      # History endpoints
│   │   ├── services/
│   │   │   ├── indexer.py      # Document indexing
│   │   │   ├── ingestion.py    # File processing
│   │   │   ├── retriever.py    # Vector search
│   │   │   └── history.py      # Chat history management
│   │   ├── main.py             # FastAPI application
│   │   ├── models.py           # Pydantic models
│   │   └── storage.py          # File storage utilities
│   ├── chroma_db/              # Vector database storage
│   ├── uploads/                # Uploaded files storage
│   ├── requirements.txt        # Python dependencies
│   └── start_server.py         # Server startup script
├── frontend/
│   ├── src/
│   │   ├── App.jsx             # Main React component
│   │   ├── main.jsx            # React entry point
│   │   └── index.css           # Tailwind styles
│   ├── index.html              # HTML template
│   ├── package.json            # Node.js dependencies
│   ├── tailwind.config.js      # Tailwind configuration
│   └── vite.config.js          # Vite configuration
├── start-backend.bat           # Windows backend starter
├── start-frontend.bat          # Windows frontend starter
└── README.md                   # This file
- Python 3.8+
- Node.js 16+
- Google Gemini API key (available from Google AI Studio)
git clone https://github.com/yashdew3/generic-data-rag-agent.git
cd generic-data-rag-agent
# Navigate to backend directory
cd backend
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.example .env
Create a .env file in the backend directory:
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.5-flash
FRONTEND_ORIGIN=http://localhost:5173
# Navigate to frontend directory (new terminal)
cd frontend
# Install dependencies
npm install
# Start backend (from root directory)
start-backend.bat
# Start frontend (from root directory)
start-frontend.bat
# Terminal 1 - Backend
cd backend
python start_server.py
# Terminal 2 - Frontend
cd frontend
npm run dev
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
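If the backend fails to start, the .env values from the setup steps are a likely cause. The sketch below shows one way such settings could be read and validated at startup. It is an illustration under assumptions: the `Settings` class and `load_settings` function are hypothetical and are not taken from the project's config.py; only the three variable names and their example values come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three .env values."""
    gemini_api_key: str
    gemini_model: str
    frontend_origin: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, failing early on a missing API key."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key or api_key == "your_gemini_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing or still set to the placeholder")
    return Settings(
        gemini_api_key=api_key,
        # Defaults mirror the example .env values shown in the setup steps.
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash"),
        frontend_origin=env.get("FRONTEND_ORIGIN", "http://localhost:5173"),
    )


settings = load_settings({"GEMINI_API_KEY": "example-key"})
print(settings.gemini_model, settings.frontend_origin)
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.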
- Click the "Choose Files" button
- Select CSV, Excel, PDF, or text files
- Files are automatically processed and indexed
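To give a rough idea of what "processed and indexed" can mean for tabular data, the sketch below turns each CSV row into a short text record that could then be embedded. It is an assumption-laden illustration: the project lists Pandas for data handling, while this example uses the standard-library `csv` module to stay dependency-free, and `csv_to_records` is a hypothetical name.

```python
import csv
import io
from typing import List


def csv_to_records(raw: str) -> List[str]:
    """Convert CSV text into one 'column: value' string per row."""
    reader = csv.DictReader(io.StringIO(raw))
    records = []
    for row in reader:
        # Keep column names next to values so each record is self-describing.
        records.append("; ".join(f"{key}: {value}" for key, value in row.items()))
    return records


sample = "region,sales\nnorth,120\nsouth,95\n"
records = csv_to_records(sample)
for record in records:
    print(record)
```

Keeping the column name beside each value means a retrieved record still makes sense when it is read out of context.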
- Type natural language questions about your uploaded data
- Examples:
- "What are the main trends in this dataset?"
- "Summarize the key findings from the uploaded report"
- "Show me insights about sales performance"
- POST /files/upload - Upload and process files
- GET /files/list - List uploaded files
- DELETE /files/{file_id} - Delete a file
- POST /chat/message - Send a chat message
- GET /chat/history/{session_id} - Get chat history
- GET /history/sessions - List all chat sessions
- DELETE /history/sessions/{session_id} - Delete a session
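As a sketch of how a client might call the chat endpoint, the example below builds a POST request with the standard library. The URL path and port come from this README; the JSON field names `message` and `session_id` are assumptions made for illustration, so check the interactive documentation at http://localhost:8000/docs for the actual request schema. `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def build_chat_request(message: str, session_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/message."""
    # Field names are assumed; the real schema lives in the backend's Pydantic models.
    payload = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/message",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("What are the main trends in this dataset?", "demo-session")
print(req.get_method(), req.full_url)

# To send it, the backend must be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```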
cd backend
python test_system.py
cd frontend
npm run lint # ESLint checking
npm run build # Production build
npm run preview # Preview production build
- CORS Protection: Configurable origin restrictions
- File Validation: Secure file type checking
- API Key Management: Environment-based configuration
- Input Sanitization: Secure data processing
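File type checking of the kind listed above can be as simple as an extension allowlist combined with stripping directory components from the uploaded name. The sketch below shows that idea only; the allowlist contents and both function names are assumptions for illustration, not the project's implementation, and an extension check alone does not verify file contents.

```python
from pathlib import Path

# Assumed allowlist covering the formats this README mentions (CSV, Excel, PDF, text).
ALLOWED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".pdf", ".txt"}


def safe_filename(filename: str) -> str:
    """Drop any directory components so the name cannot escape the upload folder."""
    return Path(filename.replace("\\", "/")).name


def is_allowed_upload(filename: str) -> bool:
    """Accept a file only if its final extension is on the allowlist."""
    return Path(safe_filename(filename)).suffix.lower() in ALLOWED_EXTENSIONS


for name in ["report.PDF", "data.csv.exe", "../../etc/passwd", "..\\sales.xlsx"]:
    print(name, "->", safe_filename(name), is_allowed_upload(name))
```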
Contributions, issues, and feature requests are welcome! Feel free to check the issues page or open a new issue to discuss changes. Pull requests are also appreciated.
This project is licensed under the MIT License © Yash Dewangan
Feel free to connect or suggest improvements!
- Built by Yash Dewangan
- 🐙 GitHub: YashDewangan
- 📧 Email: yashdew06@gmail.com
- 🔗 LinkedIn: YashDewangan
Built with ❤️ for intelligent data interaction
This project demonstrates modern RAG architecture with production-ready code quality and comprehensive documentation.
