
Document Chat & Summary Application

A retrieval-augmented generation (RAG) application that lets you upload documents, chat with them in natural language, and generate intelligent summaries.

🚀 Quick Start

Prerequisites

  • Python 3.9+ (recommended: Python 3.11)

Option 1: One-Command Setup (Recommended)

  1. Clone the repository:

    git clone <your-repo-url>
    cd ChatWithDocs
  2. Make the run script executable:

    chmod +x run.sh
  3. Run the application:

    ./run.sh

That's it! The script will:

  • ✅ Create a virtual environment
  • ✅ Install all dependencies
  • ✅ Start the backend API server
  • ✅ Launch the Streamlit frontend
  • ✅ Open your browser automatically

Option 2: Manual Setup

If you prefer to set up manually or encounter issues with the script:

Step 1: Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

Step 2: Install Dependencies

# Install backend dependencies
cd backend
pip install -r requirements.txt

# Install frontend dependencies
cd ../frontend
pip install -r requirements.txt

Step 3: Start Services

Terminal 1 - Backend:

cd backend
python main.py

Terminal 2 - Frontend:

cd frontend
streamlit run app.py

🔧 Configuration

LLM Provider Setup

Choose one of the following options:

Option A: OpenAI (Recommended for best results)

  1. Get an API key from OpenAI
  2. In the Streamlit interface:
    • Select "openai" as provider
    • Choose your model (gpt-4, gpt-3.5-turbo)
    • Enter your API key
    • Click "Configure LLM" (a quick key check is sketched below)
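
If the provider rejects your key, you can sanity-check it outside the app. This snippet assumes the current openai Python client; the project's own requirements may pin a different version:

from openai import OpenAI  # pip install openai

# An invalid key raises an AuthenticationError here
client = OpenAI(api_key="your_api_key_here")  # or set OPENAI_API_KEY in your environment
print([m.id for m in client.models.list().data][:5])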

Option B: Local LLM with Ollama (Free, runs offline)

  1. Install Ollama:

    # macOS
    brew install ollama
    
    # Linux
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Windows: Download from https://ollama.ai
  2. Start Ollama and pull a model:

    # Start Ollama service
    ollama serve
    
    # In another terminal, pull a model
    ollama pull llama3     # or mistral, phi3, codellama
  3. Configure in Streamlit:

    • Select "ollama" as provider
    • Choose your model (llama3, mistral, etc.)
    • Click "Configure LLM" (a connectivity check is sketched below)
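
Before clicking "Configure LLM", you can confirm the Ollama service is reachable and your model was pulled. Ollama serves a local HTTP API on port 11434 (the same OLLAMA_BASE_URL listed under Environment Variables):

import requests

# Lists pulled models; this fails if `ollama serve` isn't running
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
print([m["name"] for m in resp.json()["models"]])  # e.g. ['llama3:latest']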

📖 How to Use

1. Upload Documents

  • Go to the Upload tab
  • Drag & drop or select files (PDF, DOCX, TXT)
  • Click "Upload and Process"
  • Wait for processing to complete (uploads can also be scripted, as shown below)
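
Uploads can also be sent straight to the FastAPI backend. The route below is a hypothetical illustration; check the interactive docs at http://localhost:8000/docs for the actual path and parameters:

import requests

# Hypothetical route -- verify the real one at http://localhost:8000/docs
with open("report.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/upload",
        files={"file": ("report.pdf", f, "application/pdf")},
    )
print(resp.status_code, resp.json())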

2. Chat with Documents

  • Go to the Chat tab
  • Select a document from the sidebar
  • Ask questions like the following (they can also be sent to the API directly, as sketched after this list):
    • "What is this document about?"
    • "What are the main conclusions?"
    • "Explain the methodology used"
    • "Find information about X"
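
Questions can be sent to the backend directly as well. The route and payload here are hypothetical stand-ins; confirm the real ones at http://localhost:8000/docs:

import requests

# Hypothetical route and payload -- confirm against the FastAPI docs
resp = requests.post(
    "http://localhost:8000/chat",
    json={"document_id": 1, "question": "What are the main conclusions?"},
)
print(resp.json())  # expect the answer plus the source chunks used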

3. Generate Summaries

  • Go to the Summary tab
  • Select document and summary type:
    • General: Main points for general audience
    • Executive: Business-focused insights
    • Technical: Detailed technical summary
    • Bullet Points: Easy-to-scan format
  • Choose length (100-1000 words)
  • Click "Generate Summary" (or request one via the API, as sketched below)
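
Summaries can be requested the same way; the route and field names below are hypothetical stand-ins for whatever the backend actually exposes:

import requests

# Hypothetical route and fields -- confirm against http://localhost:8000/docs
resp = requests.post(
    "http://localhost:8000/summary",
    json={"document_id": 1, "summary_type": "executive", "length": 300},
)
print(resp.json())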

4. Monitor Analytics

  • Go to the Analytics tab
  • View document statistics
  • Check system health
  • Monitor chat history

🌐 Access Points

Once running, access the application at:

  • Frontend (Streamlit UI): http://localhost:8501
  • Backend API: http://localhost:8000
  • Health check: http://localhost:8000/health
  • Interactive API docs (FastAPI default): http://localhost:8000/docs

πŸ” Troubleshooting

Common Issues

"ModuleNotFoundError" or Import Errors

# Ensure you're in the virtual environment
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Reinstall dependencies
pip install -r backend/requirements.txt
pip install -r frontend/requirements.txt

"Port already in use"

# Kill existing processes
lsof -ti:8000 | xargs kill -9  # Backend
lsof -ti:8501 | xargs kill -9  # Frontend

# Or use different ports
streamlit run app.py --server.port 8502

"Cannot connect to backend"

  1. Check if the backend is running: http://localhost:8000/health (a scripted version is shown below)
  2. Ensure both services are running
  3. Check firewall settings
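
A scripted version of the health check in step 1:

import requests

try:
    resp = requests.get("http://localhost:8000/health", timeout=3)
    print("Backend responded:", resp.status_code, resp.text)
except requests.ConnectionError:
    print("Backend unreachable -- is `python main.py` running in backend/?")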

ChromaDB/Vector Store Issues

# Clear vector database if corrupted
rm -rf backend/chroma_db/
rm backend/app_database.db

# Restart application
./run.sh

LLM Configuration Issues

  • OpenAI: Verify API key is valid and has credits
  • Ollama: Ensure Ollama service is running (ollama serve)
  • Check the configuration in the Streamlit sidebar

Debug Mode

Enable debug information in the Streamlit sidebar:

  1. Check "Show Debug Info"
  2. Click "Debug Backend Storage" to see document status
  3. Click "Debug Vector Store" to check embeddings

Performance Tips

  • For large documents: Increase chunk size in settings
  • For better results: Use OpenAI models (gpt-4)
  • For privacy: Use local Ollama models
  • For speed: Use smaller models (gpt-3.5-turbo)

πŸ“ Project Structure

ChatWithDocs/
├── run.sh                 # Main startup script
├── backend/
│   ├── main.py           # FastAPI server
│   ├── database.py       # SQLite database
│   ├── requirements.txt  # Python dependencies
│   ├── services/         # Business logic
│   └── models/           # Data models
├── frontend/
│   ├── app.py            # Streamlit interface
│   └── requirements.txt  # Frontend dependencies
├── chroma_db/            # Vector database (auto-created)
├── uploads/              # Temporary file storage
└── .gitignore            # Git ignore rules

Environment Variables

Create a .env file in the backend directory (a sketch of how these variables might be loaded follows the block):

# OpenAI Configuration
OPENAI_API_KEY=your_api_key_here

# Ollama Configuration
OLLAMA_BASE_URL=http://localhost:11434

# Database Configuration
DATABASE_URL=sqlite:///app_database.db

# Vector Store Configuration
CHROMA_PERSIST_DIRECTORY=./chroma_db
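
The README doesn't show how these variables are consumed; a common pattern (assuming python-dotenv, which may or may not be what this backend uses) looks like:

import os

from dotenv import load_dotenv  # pip install python-dotenv (assumption)

load_dotenv()  # reads .env from the working directory, i.e. backend/
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
CHROMA_PERSIST_DIRECTORY = os.getenv("CHROMA_PERSIST_DIRECTORY", "./chroma_db")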

Customizing Settings

Edit configuration in the service files (a chunking illustration follows this list):

  • Chunk size: services/document_processor.py
  • Similarity threshold: services/chat_service.py
  • Model parameters: services/llm_service.py
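
To make the chunk-size knob concrete, here is a simplified illustration of fixed-size chunking with overlap; the actual logic in services/document_processor.py may differ:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Larger chunks keep more context per embedding; smaller chunks
    make retrieval more precise but can drop surrounding context."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]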

🎯 Features

  • ✅ Multi-format support: PDF, Word, TXT files
  • ✅ Intelligent chat: RAG-based document interaction
  • ✅ Smart summaries: Multiple summary styles
  • ✅ Dual LLM support: OpenAI API + Local Ollama
  • ✅ Vector search: Semantic similarity matching
  • ✅ Conversation memory: Multi-turn chat history
  • ✅ Source attribution: See which parts of documents were used
  • ✅ Real-time processing: Instant document analysis
  • ✅ Privacy options: Local-only processing with Ollama
