Mindoc is a privacy-focused, fully offline AI assistant that allows you to search and chat with your PDFs and PPTs locally. No cloud, no API keys — everything runs 100% on your device using efficient Small Language Models (SLMs).
- No OpenAI
- No cloud dependencies
- No data leaves your device
- All models stored locally
- Upload multiple PDFs and PPTs
- Fully private, local processing
- Fast and accurate extraction
Powered by LaMini-Flan-T5 (248M) optimized for CPU inference.
Semantic Vector Search + Cross-Encoder Reranking
- Vector Model: all-MiniLM-L6-v2
- Reranker: ms-marco-MiniLM-L12-v2
- Quick Mode: Fast, short answers (Top-2 docs)
- Deep Research: Multi-doc reasoning using Map-Reduce
- Evidence-based answers
- Click on a citation → open PDF → auto-scroll to exact page
- Loader: PyMuPDFLoader
- Chunking: RecursiveCharacterTextSplitter
- Chunk size: 1000 chars
- Overlap: 200 chars
- Embeddings: SentenceTransformer (384-dim)
- Storage: ChromaDB (Local persistence)
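The chunking settings above can be sketched as a plain character splitter. This is a simplified stand-in for RecursiveCharacterTextSplitter, which additionally tries to break on paragraph and sentence boundaries before falling back to raw characters:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows.

    Simplified stand-in for RecursiveCharacterTextSplitter: the real
    splitter prefers paragraph/sentence separators; this version cuts
    purely by character count.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk shares its last 200 characters with the start of the next one, so sentences that straddle a boundary are still seen whole by at least one chunk.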
- Retrieve top-10 chunks with vector search
- Re-rank with cross-encoder, keep best 3
- Feed context → LaMini LLM → generate answer
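The two-stage retrieval above can be sketched with toy scoring functions. Here cosine similarity over precomputed vectors stands in for all-MiniLM-L6-v2, and simple token overlap stands in for the cross-encoder score — the real pipeline swaps in the actual models:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=10):
    """Stage 1: vector search over (chunk_text, embedding) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def rerank(question, candidates, keep=3):
    """Stage 2: keep only the best chunks. Token overlap stands in
    for the cross-encoder score in this sketch."""
    q = set(question.lower().split())
    return sorted(candidates,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:keep]
```

The surviving chunks are then concatenated into the prompt that is handed to the LaMini LLM.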
```
┌────────────┐
│ Files │
│ PDF / PPTX │
└──────┬─────┘
│
▼
┌────────────────┐
│ Document Loader│
└───────┬────────┘
│Chunks
▼
┌────────────────────────┐
│Embeddings (Local Model)│
└───────────┬────────────┘
│Vectors
▼
┌──────────────────┐
│ Vector Store │
│ FAISS / Chroma │
└─────────┬────────┘
│
▼
┌───────────────────┐
│ RAG Pipeline │
│ (Retrieve + LLM) │
└─────────┬─────────┘
│
▼
┌─────────────┐
│ FastAPI │
│ /query │
└─────────────┘
```
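A call to the /query endpoint might look like this — the field names below are illustrative, so check the FastAPI schema under app/api/ for the actual shape:

```json
{
  "question": "What does chapter 3 conclude?",
  "mode": "deep_research"
}
```

with a response carrying the answer plus the citation metadata the frontend uses for auto-scroll:

```json
{
  "answer": "Chapter 3 concludes that ...",
  "citations": [
    { "file": "report.pdf", "page": 42 }
  ]
}
```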
- Python 3.10+ (3.12 recommended)
- Node.js & npm
```bash
cd backend

# Virtual environment
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download models (run once)
python download_model.py
python download_reranker.py

# Start backend server
uvicorn app.main:app --reload
```

```bash
cd frontend
npm install
npm run dev
```

- Drag & drop PDFs
- Supports batch uploads
- Wait for “✅ Indexed” confirmation
- Fast
- Lightweight
- Best for direct questions
- Reads many chunks
- Map-Reduce summarization
- Great for reports & summaries
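The Map-Reduce step behind Deep Research can be sketched as follows, with `summarize` and `combine` standing in for the two LLM calls:

```python
def map_reduce_answer(chunks, summarize, combine):
    """Deep Research sketch: condense each retrieved chunk independently
    (map), then merge the partial summaries into one final answer
    (reduce). `summarize` and `combine` stand in for LLM calls."""
    partials = [summarize(chunk) for chunk in chunks]
    return combine(partials)
```

Because each chunk is summarized independently, the model never has to fit all retrieved documents into one context window — only the much shorter partial summaries are combined at the end.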
- Each answer includes clickable citations
- Opens full PDF and auto-scrolls to correct page
```
mindoc/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   ├── rag/
│   │   ├── services/
│   │   └── main.py
│   ├── data/
│   │   ├── chroma/
│   │   ├── models/
│   │   └── uploads/
│   ├── download_model.py
│   ├── download_reranker.py
│   └── requirements.txt
│
└── frontend/
    ├── src/
    │   ├── App.jsx
    │   ├── App.css
    │   └── main.jsx
    └── package.json
```
Install SQLite shim:

```bash
pip install pysqlite3-binary
```

Long chunks → crash. Fixed by enabling:
```python
truncation=True
```

Usually caused by a missing reranker model. Run the download again:
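If you would rather trim the prompt yourself before it reaches the model, a crude character-budget cut also works. Note this is a heuristic sketch — the 4-characters-per-token estimate is an assumption, whereas the tokenizer's `truncation=True` cuts at exact token boundaries:

```python
def truncate_prompt(prompt: str, max_tokens: int = 450, chars_per_token: int = 4) -> str:
    """Trim a prompt to roughly max_tokens so it fits the model's input
    window. Rough character heuristic only; the tokenizer's
    truncation=True is the exact fix."""
    budget = max_tokens * chars_per_token
    return prompt[:budget]
```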
```bash
python download_reranker.py
```

- OCR for scanned documents
- Model switching (LaMini ↔ Phi-2 ↔ Qwen 0.5B)
- Persistent conversation history
- Voice mode (offline ASR)