🚀 RAG API

Multi-tenant Multimodal Document Intelligent Retrieval System

Enterprise-grade RAG service built on RAG-Anything and LightRAG


English | 简体中文

Features | Quick Start | Architecture | API Documentation | Deployment


📖 Introduction

RAG API is an enterprise-grade Retrieval-Augmented Generation (RAG) service that combines the powerful document parsing capabilities of RAG-Anything with the efficient knowledge graph retrieval technology of LightRAG, providing intelligent Q&A capabilities for your documents.

🎯 Key Highlights

  • 🏢 Multi-tenant Isolation - Complete tenant data isolation for enterprise multi-tenant scenarios
  • 🎨 Multimodal Parsing - Support for PDF, Word, images, and more, with full coverage of OCR, tables, and formulas
  • ⚡ High-performance Retrieval - Knowledge graph-based hybrid retrieval with 6-15 second query response
  • 🔄 Flexible Deployment - Support for production and development modes with one-click switching
  • 📦 Ready to Use - One-click Docker deployment, service starts in 3 minutes
  • 🎛️ Multiple Parsing Engines - DeepSeek-OCR (Remote API) + MinerU (Local/Remote API) + Docling (Fast)
  • 🎨 RAG-Anything VLM Enhancement - Three modes (off/selective/full) for deep chart understanding
  • 💾 Task Persistence - Redis storage support, tasks recoverable after container restart/instance rebuild

✨ Features

📄 Document Processing

  • Multiple Format Support

    • PDF, Word, Excel, PPT
    • PNG, JPG, WebP images
    • TXT, Markdown text
  • Intelligent Parsing

    • Plain text (.txt, .md) → Direct insertion (ultra-fast ~1s, skip parser)
    • OCR text recognition
    • Structured table extraction
    • Mathematical formula recognition
    • Layout analysis
  • RAG-Anything VLM Enhancement 🆕

    • off - Markdown only (fastest)
    • selective - Selective processing of important charts
    • full - Complete context enhancement processing
    • Smart filtering: charts with titles, large images, first-page content
    • ⚠️ Only supported in remote MinerU mode; local mode uses RAG-Anything's native methods
  • Batch Processing

    • Up to 100 files per batch
    • Async task queue
    • Real-time progress tracking

🔍 Intelligent Retrieval

  • Multi-mode Query

    • naive - Vector retrieval (fastest)
    • local - Local graph
    • global - Global graph
    • hybrid - Hybrid retrieval
    • mix - Full retrieval (most accurate)
  • Knowledge Graph

    • Automatic entity extraction
    • Relationship reasoning
    • Semantic understanding
    • Context enhancement
  • External Storage

    • DragonflyDB (KV storage + task storage)
    • Qdrant (vector storage)
    • Memgraph (graph database)
    • Task persistence (Redis mode)

🏗️ Architecture

System Architecture Diagram

graph TB
    subgraph "Client Layer"
        Client[Client Application]
        WebUI[Web Interface]
    end
    
    subgraph "API Gateway Layer"
        FastAPI[FastAPI Service]
        Auth[Tenant Authentication]
    end
    
    subgraph "Business Logic Layer"
        TenantMgr[Tenant Manager]
        TaskQueue[Task Queue]
        
        subgraph "Document Processing"
            DeepSeekOCR[DeepSeek-OCR<br/>Fast OCR 80% cases]
            MinerU[MinerU Parser<br/>Complex multimodal]
            Docling[Docling Parser<br/>Fast lightweight]
            FileRouter[Smart Router<br/>Complexity scoring]
        end
        
        subgraph "RAG Engine"
            LightRAG[LightRAG Instance Pool<br/>LRU Cache 50]
            KG[Knowledge Graph Engine]
            Vector[Vector Retrieval Engine]
        end
    end
    
    subgraph "Storage Layer"
        DragonflyDB[(DragonflyDB<br/>KV Storage)]
        Qdrant[(Qdrant<br/>Vector Database)]
        Memgraph[(Memgraph<br/>Graph Database)]
        Local[(Local Files<br/>Temp Storage)]
    end
    
    subgraph "External Services"
        LLM[LLM<br/>Entity Extraction/Generation]
        Embedding[Embedding<br/>Vectorization]
        Rerank[Rerank<br/>Reranking]
    end
    
    Client --> FastAPI
    WebUI --> FastAPI
    FastAPI --> Auth
    Auth --> TenantMgr
    TenantMgr --> TaskQueue
    TenantMgr --> LightRAG
    
    TaskQueue --> FileRouter
    FileRouter --> DeepSeekOCR
    FileRouter --> MinerU
    FileRouter --> Docling
    DeepSeekOCR --> LightRAG
    MinerU --> LightRAG
    Docling --> LightRAG
    
    LightRAG --> KG
    LightRAG --> Vector
    
    KG --> DragonflyDB
    KG --> Memgraph
    Vector --> Qdrant
    LightRAG --> Local
    
    LightRAG --> LLM
    LightRAG --> Embedding
    Vector --> Rerank
    
    style FastAPI fill:#00C7B7
    style LightRAG fill:#FF6B6B
    style DeepSeekOCR fill:#5DADE2
    style MinerU fill:#4ECDC4
    style Docling fill:#95E1D3
    style TenantMgr fill:#F38181
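
The Smart Router in the diagram above picks a parser per document based on a complexity score. As a rough illustration only (the scoring features, weights, thresholds, and function names below are hypothetical, not the project's actual implementation):

# Hypothetical sketch of complexity-based parser routing
def score_complexity(page_count: int, image_ratio: float,
                     has_tables: bool, has_formulas: bool) -> float:
    """Return a rough 0-1 complexity score for a document."""
    score = min(page_count / 100, 0.3)      # long documents are harder
    score += image_ratio * 0.3              # image-heavy pages need OCR/VLM
    score += 0.2 if has_tables else 0.0     # structured tables
    score += 0.2 if has_formulas else 0.0   # mathematical formulas
    return min(score, 1.0)

def pick_parser(score: float) -> str:
    if score < 0.3:
        return "docling"        # fast, lightweight documents
    if score < 0.7:
        return "deepseek-ocr"   # fast OCR, covers most cases
    return "mineru"             # complex multimodal documents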

Multi-tenant Architecture

graph TB
    subgraph "Tenant A"
        A_Config[Tenant A Config<br/>Independent API Key]
        A_Instance[LightRAG Instance A<br/>Dedicated LLM/Embedding]
        A_Data[(Tenant A Data<br/>Fully Isolated)]
        A_Config --> A_Instance
        A_Instance --> A_Data
    end

    subgraph "Tenant B"
        B_Config[Tenant B Config<br/>Independent API Key]
        B_Instance[LightRAG Instance B<br/>Dedicated LLM/Embedding]
        B_Data[(Tenant B Data<br/>Fully Isolated)]
        B_Config --> B_Instance
        B_Instance --> B_Data
    end

    subgraph "Tenant C"
        C_Config[Using Global Config]
        C_Instance[LightRAG Instance C<br/>Shared LLM/Embedding]
        C_Data[(Tenant C Data<br/>Fully Isolated)]
        C_Config --> C_Instance
        C_Instance --> C_Data
    end

    Pool[Instance Pool Manager<br/>LRU Cache + Config Isolation]
    Global[Global Config<br/>Default API Key]

    Pool --> A_Instance
    Pool --> B_Instance
    Pool --> C_Instance

    C_Config -.fallback.-> Global

    style Pool fill:#F38181
    style Global fill:#95E1D3
    style A_Config fill:#FFD93D
    style B_Config fill:#FFD93D
    style C_Config fill:#E8E8E8

Core Technology Stack

🔧 Frameworks & Runtime

  • FastAPI 0.115+
  • Python 3.11+
  • Uvicorn
  • Docker & Docker Compose

🧠 AI & RAG

  • LightRAG 1.4.9.4
  • RAG-Anything
  • MinerU (PDF-Extract-Kit)
  • Docling

💾 Storage & Database

  • DragonflyDB (Redis compatible)
  • Qdrant (Vector Database)
  • Memgraph (Graph Database)
  • Local filesystem

🚀 Quick Start

Option 1: One-click Deployment (Recommended)

Suitable for production and testing environments:

# 1. Clone the project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api

# 2. Configure environment variables
cp env.example .env
nano .env  # Fill in your API keys

# 3. Run deployment script
chmod +x deploy.sh
./deploy.sh

# Select deployment mode:
# 1) Production Mode - Standard container deployment
# 2) Development Mode - Code hot-reload

# 4. Verify service
curl http://localhost:8000/

Access Swagger Documentation: http://localhost:8000/docs

Option 2: Docker Compose

Production Mode

# Configure environment variables
cp env.example .env
nano .env

# Start services
docker compose -f docker-compose.yml up -d

# View logs
docker compose -f docker-compose.yml logs -f

Development Mode (Code Hot-reload)

# Start development environment
docker compose -f docker-compose.dev.yml up -d

# Or use quick script
./scripts/dev.sh

# Code changes will auto-reload without restart

Option 3: Local Development

# Install uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Configure environment variables
cp env.example .env
nano .env

# Start services
uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Environment Variable Configuration

Minimum configuration (required):

# LLM Configuration (Function-oriented naming)
LLM_API_KEY=your_llm_api_key
LLM_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
LLM_MODEL=ep-xxx-xxx
# LLM_REQUESTS_PER_MINUTE=800        # Rate limit (optional)
# LLM_TOKENS_PER_MINUTE=40000        # Rate limit (optional)
# LLM_MAX_ASYNC=8                    # [Optional, expert mode] Manual concurrency control
#                                    # Auto-calculated when unset: min(RPM, TPM/3500) = 11

# Embedding Configuration (Function-oriented naming)
EMBEDDING_API_KEY=your_embedding_api_key
EMBEDDING_BASE_URL=https://api.siliconflow.cn/v1
EMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B
EMBEDDING_DIM=1024
# EMBEDDING_MAX_ASYNC=32             # [Optional, expert mode] Auto-calculated when unset: 800

# MinerU Mode (Remote recommended)
MINERU_MODE=remote
MINERU_API_TOKEN=your_token
MINERU_HTTP_TIMEOUT=60              # MinerU download timeout (seconds, default 60)
FILE_SERVICE_BASE_URL=http://your-ip:8000

# VLM Chart Enhancement Configuration 🆕
# ⚠️ Note: Only effective in MINERU_MODE=remote
RAG_VLM_MODE=off                    # off / selective / full
RAG_IMPORTANCE_THRESHOLD=0.5        # Importance threshold (selective mode)
RAG_CONTEXT_WINDOW=2                # Context window (full mode)
RAG_CONTEXT_MODE=page               # page / chunk
RAG_MAX_CONTEXT_TOKENS=3000         # Max context tokens

# Task Storage Configuration 🆕
TASK_STORE_STORAGE=redis            # memory / redis (production recommends redis)

# Document Insert Verification Configuration 🆕
DOC_INSERT_VERIFICATION_TIMEOUT=300        # Verification timeout (seconds, default 5 minutes)
DOC_INSERT_VERIFICATION_POLL_INTERVAL=0.5  # Poll interval (seconds, default 500ms)

# Model Call Timeout Configuration 🆕
MODEL_CALL_TIMEOUT=90               # Model call max timeout (seconds, default 90)

⚡ Auto Concurrency Calculation:

  • LLM: When LLM_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/3500) ≈ 11
  • Embedding: When EMBEDDING_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/500) ≈ 800
  • Rerank: When RERANK_MAX_ASYNC is unset, auto-calculated as min(RPM, TPM/500) ≈ 800

✅ Recommended: do not set *_MAX_ASYNC; let the system auto-calculate concurrency to avoid 429 errors entirely (see the sketch below)
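
For illustration, a minimal sketch of this auto-calculation (the helper name is ours; the formula and the 3500/500 average-token constants are the ones quoted above):

# concurrency = min(RPM, TPM / avg_tokens_per_request), floored to at least 1
def auto_max_async(rpm: int, tpm: int, avg_tokens_per_request: int) -> int:
    return max(1, min(rpm, tpm // avg_tokens_per_request))

# With the example LLM limits above (800 requests/min, 40000 tokens/min):
print(auto_max_async(800, 40_000, 3_500))  # -> 11, the value quoted above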

See env.example for complete configuration.


📚 API Documentation

Core Endpoints

1️⃣ Upload Document

# Single file upload (default mode)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc1" \
  -F "file=@document.pdf" \
  -F "parser=auto"

# VLM chart enhancement mode 🆕
# off: Markdown only (fastest, default)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc2&vlm_mode=off" \
  -F "file=@document.pdf"

# selective: Selective processing of important charts (balance performance and quality)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc3&vlm_mode=selective" \
  -F "file=@document.pdf"

# full: Complete RAG-Anything processing (highest quality, context enhancement enabled)
curl -X POST "http://localhost:8000/insert?tenant_id=your_tenant&doc_id=doc4&vlm_mode=full" \
  -F "file=@document.pdf"

# Response
{
  "task_id": "task-xxx-xxx",
  "doc_id": "doc1",
  "filename": "document.pdf",
  "vlm_mode": "off",
  "status": "pending"
}
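
The same upload can be issued from Python; the endpoint, query parameters, and vlm_mode values mirror the cURL calls above (the file name and doc_id are placeholders):

import requests

# Upload a PDF with selective VLM chart enhancement
with open("document.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/insert",
        params={"tenant_id": "your_tenant", "doc_id": "doc3", "vlm_mode": "selective"},
        files={"file": f},
    )
resp.raise_for_status()
print(resp.json()["task_id"])  # poll GET /task/{task_id} to track progress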

2️⃣ Batch Upload

curl -X POST "http://localhost:8000/batch?tenant_id=your_tenant" \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.docx" \
  -F "files=@image.png"

# Response
{
  "batch_id": "batch-xxx-xxx",
  "total_files": 3,
  "accepted_files": 3,
  "tasks": [...]
}
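
A Python equivalent of the batch call above (file names are placeholders); in requests, repeating the files form field means passing a list of ("files", ...) tuples:

import requests
from pathlib import Path

# Each tuple repeats the "files" field once per document, as in the cURL call above
batch_files = [
    ("files", ("doc1.pdf", Path("doc1.pdf").read_bytes(), "application/pdf")),
    ("files", ("doc2.docx", Path("doc2.docx").read_bytes())),
    ("files", ("image.png", Path("image.png").read_bytes(), "image/png")),
]
resp = requests.post(
    "http://localhost:8000/batch",
    params={"tenant_id": "your_tenant"},
    files=batch_files,
)
print(resp.json()["batch_id"], resp.json()["accepted_files"])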

3️⃣ Intelligent Query (Query API v2.0)

New Advanced Features:

  • Conversation History: Support for multi-turn conversation context
  • Custom Prompts: Customize response style
  • Response Format Control: paragraph/list/json
  • Keyword Precision Retrieval: hl_keywords/ll_keywords
  • Streaming Output: Real-time generation viewing
# Basic query
curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the core viewpoints in the document?",
    "mode": "hybrid"
  }'

# Advanced query (multi-turn dialogue + custom prompt)
curl -X POST "http://localhost:8000/query?tenant_id=your_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Can you elaborate on the second point?",
    "mode": "hybrid",
    "conversation_history": [
      {"role": "user", "content": "What are the key points?"},
      {"role": "assistant", "content": "There are mainly three points..."}
    ],
    "user_prompt": "Please answer in professional academic language",
    "response_type": "list"
  }'

# Streaming query (SSE)
curl -N -X POST "http://localhost:8000/query/stream?tenant_id=your_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the core viewpoints in the document?",
    "mode": "hybrid"
  }'

# Response (real-time streaming output)
data: {"chunk": "Based on", "done": false}
data: {"chunk": "document content", "done": false}
data: {"done": true}
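
To consume the stream from Python, one option is requests with stream=True, parsing each data: line as JSON (a sketch assuming the line-delimited events shown above):

import json
import requests

with requests.post(
    "http://localhost:8000/query/stream",
    params={"tenant_id": "your_tenant"},
    json={"query": "What are the core viewpoints in the document?", "mode": "hybrid"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue                                   # skip empty / keep-alive lines
        event = json.loads(line[len("data:"):].strip())
        if event.get("done"):
            break                                      # final event carries no chunk
        print(event.get("chunk", ""), end="", flush=True)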

4️⃣ Task Status Query

curl "http://localhost:8000/task/task-xxx-xxx?tenant_id=your_tenant"

# Response
{
  "task_id": "task-xxx-xxx",
  "status": "completed",
  "progress": 100,
  "result": {...}
}

5️⃣ Tenant Management

# Get tenant statistics
curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant"

# Clear tenant cache
curl -X DELETE "http://localhost:8000/tenants/cache?tenant_id=your_tenant"

# View instance pool status (admin)
curl "http://localhost:8000/tenants/pool/stats"

VLM Mode Comparison 🆕

| Mode | Speed | Quality | Resource Usage | Use Case |
|------|-------|---------|----------------|----------|
| off | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Very Low | Plain text documents, fast batch processing |
| selective | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Low | Documents with key charts (recommended) |
| full | ⚡⚡ | ⭐⭐⭐⭐⭐ | High | Chart-intensive research reports, papers |

Processing Time Estimate (20-page PDF example):

  • off: ~10 seconds (Markdown only)
  • selective: ~30 seconds (5-10 important charts)
  • full: ~120 seconds (complete context processing)

Query Mode Comparison

| Mode | Speed | Accuracy | Use Case |
|------|-------|----------|----------|
| naive | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Simple Q&A, fast retrieval |
| local | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | Local entity relationship queries |
| global | ⚡⚡⚡ | ⭐⭐⭐⭐ | Global knowledge graph reasoning |
| hybrid | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Hybrid retrieval (recommended) |
| mix | ⚡⚡ | ⭐⭐⭐⭐⭐ | Complex questions, deep analysis |

Query API v2.0 Advanced Parameters

| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| conversation_history | List[Dict] | Multi-turn conversation context | [{"role": "user", "content": "..."}] |
| user_prompt | str | Custom prompt | "Please answer in professional academic language" |
| response_type | str | Response format | "paragraph", "list", "json" |
| hl_keywords | List[str] | High-priority keywords | ["artificial intelligence", "machine learning"] |
| ll_keywords | List[str] | Low-priority keywords | ["application", "case study"] |
| only_need_context | bool | Return context only (debug) | true |
| max_entity_tokens | int | Entity token limit | 6000 |

Complete API documentation: http://localhost:8000/docs
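
As an example of combining several of these parameters in a single request (the parameter names come from the table above; the query text and keyword values are placeholders):

import requests

payload = {
    "query": "Summarize how machine learning is applied in the case studies",
    "mode": "hybrid",
    "conversation_history": [
        {"role": "user", "content": "What are the key points?"},
        {"role": "assistant", "content": "There are mainly three points..."},
    ],
    "user_prompt": "Please answer in professional academic language",
    "response_type": "list",
    "hl_keywords": ["artificial intelligence", "machine learning"],
    "ll_keywords": ["application", "case study"],
}
resp = requests.post(
    "http://localhost:8000/query",
    params={"tenant_id": "your_tenant"},
    json=payload,
)
print(resp.json())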


🎯 Usage Examples

Python SDK

import requests

# Configuration
BASE_URL = "http://localhost:8000"
TENANT_ID = "your_tenant"

# Upload document
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/insert",
        params={"tenant_id": TENANT_ID, "doc_id": "doc1"},
        files={"file": f}
    )
    task_id = response.json()["task_id"]
    print(f"Task ID: {task_id}")

# Query
response = requests.post(
    f"{BASE_URL}/query",
    params={"tenant_id": TENANT_ID},
    json={
        "query": "What is the main content of the document?",
        "mode": "hybrid",
        "top_k": 10
    }
)
result = response.json()
print(f"Answer: {result['answer']}")

Complete cURL Example

# 1. Upload PDF document
TASK_ID=$(curl -X POST "http://localhost:8000/insert?tenant_id=demo&doc_id=report" \
  -F "file=@report.pdf" | jq -r '.task_id')

echo "Task ID: $TASK_ID"

# 2. Wait for processing completion
while true; do
  STATUS=$(curl -s "http://localhost:8000/task/$TASK_ID?tenant_id=demo" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ]; then
    break
  fi
  sleep 2
done

# 3. Query document content
curl -X POST "http://localhost:8000/query?tenant_id=demo" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main conclusions of this report?",
    "mode": "hybrid"
  }' | jq '.answer'

🛠️ Deployment

System Requirements

Minimum Configuration:

  • CPU: 2 cores
  • RAM: 4GB
  • Disk: 40GB SSD
  • OS: Ubuntu 20.04+ / Debian 11+ / CentOS 8+

Recommended Configuration (Production):

  • CPU: 4 cores
  • RAM: 8GB
  • Disk: 100GB SSD
  • OS: Ubuntu 22.04 LTS

Server Deployment

Quick Deployment on Aliyun/Tencent Cloud

# SSH login to server
ssh root@your-server-ip

# Clone project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api

# Run one-click deployment script
chmod +x deploy.sh
./deploy.sh

# The script will automatically:
# 1. Install Docker and Docker Compose
# 2. Configure environment variables
# 3. Optimize system parameters
# 4. Start services
# 5. Verify health status

External Storage Configuration

Supports DragonflyDB + Qdrant + Memgraph external storage (enabled by default):

# Configure in .env
USE_EXTERNAL_STORAGE=true

# DragonflyDB configuration (KV Storage)
KV_STORAGE=RedisKVStorage
REDIS_URI=redis://dragonflydb:6379/0

# Qdrant configuration (vector storage)
VECTOR_STORAGE=QdrantVectorDBStorage
QDRANT_URL=http://qdrant:6333

# Memgraph configuration (graph storage)
GRAPH_STORAGE=MemgraphStorage
MEMGRAPH_URI=bolt://memgraph:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=

See External Storage Deployment Documentation

Docker Compose Configuration

The project provides two configuration files:

| File | Purpose | Features |
|------|---------|----------|
| docker-compose.yml | Production mode | Code packaged into the image, optimal performance |
| docker-compose.dev.yml | Development mode | Code mounted from the host, supports hot-reload |

Select configuration file:

# Production mode
docker compose -f docker-compose.yml up -d

# Development mode
docker compose -f docker-compose.dev.yml up -d

Performance Optimization

Tuning Parameters

Configure in .env:

# ⚡ Concurrency Control (Recommended: use auto-calculation)
# LLM_MAX_ASYNC=8                    # [Expert mode] Manually specify LLM concurrency
#                                    # Auto-calculated when unset: min(RPM, TPM/3500) ≈ 11
# EMBEDDING_MAX_ASYNC=32             # [Expert mode] Manually specify Embedding concurrency
#                                    # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800
# RERANK_MAX_ASYNC=16                # [Expert mode] Manually specify Rerank concurrency
#                                    # Auto-calculated when unset: min(RPM, TPM/500) ≈ 800

# Retrieval count (affects query quality and speed)
TOP_K=20  # Entity/relationship retrieval count
CHUNK_TOP_K=10  # Text chunk retrieval count

# Document processing concurrency
DOCUMENT_PROCESSING_CONCURRENCY=10  # Remote mode can be set high, local mode set to 1

🎯 Concurrency Configuration Recommendations:

  • Recommended: do not set *_MAX_ASYNC; let the system auto-calculate concurrency from your TPM/RPM limits
  • Expert mode: if manual control is needed, set LLM_MAX_ASYNC and the related parameters
  • Advantage: auto-calculation avoids 429 errors (TPM limit reached) entirely

Mode Selection

  • MinerU Remote Mode (Recommended): High concurrency, resource-efficient
  • MinerU Local Mode: Requires GPU, high memory usage
  • Docling Mode: Fast and lightweight, suitable for simple documents

🏢 Multi-tenant Usage

Tenant Isolation

Each tenant has:

  • ✅ Independent LightRAG instance
  • ✅ Isolated data storage space
  • ✅ Independent vector index
  • ✅ Dedicated knowledge graph
  • ✅ Independent service configuration (LLM, Embedding, Rerank, DeepSeek-OCR, MinerU) 🆕

Tenant Configuration Management 🆕

Each tenant can independently configure 5 services with hot-reload support:

# 1️⃣ Configure independent DeepSeek-OCR API key for Tenant A
curl -X PUT "http://localhost:8000/tenants/tenant_a/config" \
  -H "Content-Type: application/json" \
  -d '{
    "ds_ocr_config": {
      "api_key": "sk-tenant-a-ds-ocr-key",
      "base_url": "https://api.siliconflow.cn/v1",
      "model": "deepseek-ai/DeepSeek-OCR",
      "timeout": 90
    }
  }'

# 2️⃣ Configure independent MinerU API token for Tenant B
curl -X PUT "http://localhost:8000/tenants/tenant_b/config" \
  -H "Content-Type: application/json" \
  -d '{
    "mineru_config": {
      "api_token": "tenant-b-mineru-token",
      "base_url": "https://mineru.net",
      "model_version": "vlm"
    }
  }'

# 3️⃣ Configure multiple services simultaneously (LLM + Embedding + DeepSeek-OCR)
curl -X PUT "http://localhost:8000/tenants/tenant_c/config" \
  -H "Content-Type: application/json" \
  -d '{
    "llm_config": {
      "api_key": "sk-tenant-c-llm-key",
      "model": "gpt-4"
    },
    "embedding_config": {
      "api_key": "sk-tenant-c-embedding-key",
      "model": "Qwen/Qwen3-Embedding-0.6B",
      "dim": 1024
    },
    "ds_ocr_config": {
      "api_key": "sk-tenant-c-ds-ocr-key"
    }
  }'

# 4️⃣ Query tenant configuration (API key auto-masked)
curl "http://localhost:8000/tenants/tenant_a/config"

# Response example
{
  "tenant_id": "tenant_a",
  "ds_ocr_config": {
    "api_key": "sk-***-key",  // Auto-masked
    "timeout": 90
  },
  "merged_config": {
    "llm": {...},        // Using Global Config
    "embedding": {...},  // Using Global Config
    "rerank": {...},     // Using Global Config
    "ds_ocr": {...},     // Using tenant config
    "mineru": {...}      // Using Global Config
  }
}

# 5️⃣ Refresh config cache (config hot-reload)
curl -X POST "http://localhost:8000/tenants/tenant_a/config/refresh"

# 6️⃣ Delete tenant config (restore to global config)
curl -X DELETE "http://localhost:8000/tenants/tenant_a/config"
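
The same configuration endpoints can also be driven from Python; a brief sketch mirroring steps 1️⃣ and 5️⃣ above (the API key is a placeholder):

import requests

BASE = "http://localhost:8000"

# Give tenant_a its own DeepSeek-OCR key (same payload as step 1 above)
requests.put(
    f"{BASE}/tenants/tenant_a/config",
    json={"ds_ocr_config": {"api_key": "sk-tenant-a-ds-ocr-key", "timeout": 90}},
).raise_for_status()

# Hot-reload the cached config (step 5 above), then read it back (keys come back masked)
requests.post(f"{BASE}/tenants/tenant_a/config/refresh").raise_for_status()
print(requests.get(f"{BASE}/tenants/tenant_a/config").json())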

Supported Configuration Items:

| Service | Config Field | Description |
|---------|--------------|-------------|
| LLM | llm_config | Model, API key, base_url, etc. |
| Embedding | embedding_config | Model, API key, dimension, etc. |
| Rerank | rerank_config | Model, API key, etc. |
| DeepSeek-OCR | ds_ocr_config | API key, timeout, mode, etc. |
| MinerU | mineru_config | API token, version, timeout, etc. |

Configuration Priority: Tenant config > Global config

Use Cases:

  • 🔐 Multi-tenant SaaS: Each tenant uses their own API key
  • 💰 Pay-per-use: Track tenant usage through independent API keys
  • 🎯 Differentiated Services: Different tenants use different models (GPT-4 vs GPT-3.5)
  • 🧪 A/B Testing: Compare different models/parameters

Usage

All APIs require the tenant_id parameter:

# Tenant A upload document
curl -X POST "http://localhost:8000/insert?tenant_id=tenant_a&doc_id=doc1" \
  -F "file=@doc.pdf"

# Tenant B upload document (fully isolated)
curl -X POST "http://localhost:8000/insert?tenant_id=tenant_b&doc_id=doc1" \
  -F "file=@doc.pdf"

# Tenant A query (can only query own documents)
curl -X POST "http://localhost:8000/query?tenant_id=tenant_a" \
  -H "Content-Type: application/json" \
  -d '{"query": "document content", "mode": "hybrid"}'

Instance Pool Management

  • Capacity: Cache up to 50 tenant instances
  • Strategy: LRU (Least Recently Used) automatic cleanup
  • Config Isolation: Each tenant can use independent LLM, Embedding, parser configuration
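
The pool behaviour described above (bounded capacity, least-recently-used eviction, per-tenant configuration) can be pictured with a small sketch; this is a hypothetical illustration, not the project's actual pool manager:

from collections import OrderedDict

class InstancePool:
    """Keep at most `capacity` tenant instances, evicting the least recently used one."""

    def __init__(self, capacity: int = 50):
        self.capacity = capacity
        self._instances = OrderedDict()   # tenant_id -> instance, ordered by recency

    def get(self, tenant_id: str, factory):
        if tenant_id in self._instances:
            self._instances.move_to_end(tenant_id)              # mark as most recently used
        else:
            if len(self._instances) >= self.capacity:
                self._instances.popitem(last=False)             # evict least recently used
            self._instances[tenant_id] = factory(tenant_id)     # build with tenant-specific config
        return self._instances[tenant_id]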

📊 Monitoring & Maintenance

Common Commands

# View service status
docker compose ps

# View real-time logs
docker compose logs -f

# Restart services
docker compose restart

# Stop services
docker compose down

# View resource usage
docker stats

# Clean Docker resources
docker system prune -f

Maintenance Scripts

# Monitor service health
./scripts/monitor.sh

# Backup data
./scripts/backup.sh

# Update services
./scripts/update.sh

# Performance testing
./scripts/test_concurrent_perf.sh

# Performance monitoring
./scripts/monitor_performance.sh

Health Checks

# Complete health check (recommended)
./scripts/health_check.sh
./scripts/health_check.sh --verbose  # verbose output

# API health check
curl http://localhost:8000/

# Tenant statistics
curl "http://localhost:8000/tenants/stats?tenant_id=your_tenant"

# Instance pool status
curl "http://localhost:8000/tenants/pool/stats"

🗂️ Project Structure

rag-api/
├── main.py                 # FastAPI application entry
├── api/                    # API route modules
│   ├── __init__.py         # Route aggregation
│   ├── insert.py           # Document upload (single/batch)
│   ├── query.py            # Intelligent query
│   ├── task.py             # Task status query
│   ├── tenant.py           # Tenant management
│   ├── files.py            # File service
│   ├── models.py           # Pydantic models
│   └── task_store.py       # Task storage
├── src/                    # Core business logic
│   ├── rag.py              # LightRAG lifecycle management
│   ├── multi_tenant.py     # Multi-tenant instance manager
│   ├── tenant_deps.py      # Tenant dependency injection
│   ├── logger.py           # Unified logging
│   ├── metrics.py          # Performance metrics
│   ├── file_url_service.py # Temporary file service
│   ├── mineru_client.py    # MinerU client
│   └── mineru_result_processor.py  # Result processing
├── docs/                   # Documentation
│   ├── ARCHITECTURE.md     # Architecture design documentation
│   ├── USAGE.md            # Detailed usage guide
│   ├── DEPLOY_MODES.md     # Deployment mode description
│   ├── PR_WORKFLOW.md      # PR workflow
│   └── ...
├── scripts/                # Maintenance scripts
│   ├── dev.sh              # Development mode quick start
│   ├── monitor.sh          # Service monitoring
│   ├── backup.sh           # Data backup
│   ├── update.sh           # Service update
│   └── ...
├── deploy.sh               # One-click deployment script
├── docker-compose.yml      # Production mode configuration
├── docker-compose.dev.yml  # Development mode configuration
├── Dockerfile              # Production image
├── Dockerfile.dev          # Development image
├── pyproject.toml          # Project dependencies
├── uv.lock                 # Dependency lock
├── env.example             # Environment variable template
├── CLAUDE.md               # Claude AI guide
└── README.md               # This documentation

🐛 Troubleshooting

Common Issues

Q1: What to do if the service fails to start?
# View detailed logs
docker compose logs

# Check port usage
netstat -tulpn | grep 8000

# Check Docker status
docker ps -a
Q2: multimodal_processed error?

Note: This issue has been fixed in LightRAG 1.4.9.4+. If you encounter this error, your version is outdated.

Solution:

# Option 1: Upgrade to latest version (recommended)
# Modify LightRAG version in pyproject.toml
# lightrag = "^1.4.9.4"

# Rebuild image
docker compose down
docker compose up -d --build

# Option 2: Clean old data (temporary solution)
rm -rf ./rag_local_storage
docker compose restart
Q3: File upload returns 400 error?

Check that:

  • the file format is supported (PDF, DOCX, PNG, JPG, etc.)
  • the file size does not exceed 100MB
  • the file is not empty
# View supported formats
curl http://localhost:8000/docs
Q3.5: Embedding dimension error?

If you encounter dimension-related errors, you need to clear the stored data and rebuild:

# Stop services
docker compose down

# Delete all volumes (clear database)
docker volume rm rag-api_dragonflydb_data rag-api_qdrant_data rag-api_memgraph_data

# Modify EMBEDDING_DIM in .env
EMBEDDING_DIM=1024  # or 4096, must match the model

# Restart
docker compose up -d
Q4: Query is very slow (>30 seconds)?

Optimization suggestions:

  1. Use naive or hybrid mode instead of mix
  2. Increase MAX_ASYNC parameter (in .env)
  3. Reduce TOP_K and CHUNK_TOP_K
  4. Enable Reranker
# Modify .env
MAX_ASYNC=8
TOP_K=20
CHUNK_TOP_K=10
Q5: Out of memory (OOM)?

If using local MinerU:

# Switch to remote mode
# Modify in .env
MINERU_MODE=remote
MINERU_API_TOKEN=your_token

# Or limit concurrency
DOCUMENT_PROCESSING_CONCURRENCY=1
Q6: Tasks lost after container restart?

Problem Symptoms:

  • Previous task status cannot be queried after a container restart
  • Tasks disappear after a tenant instance is evicted by the LRU cache

Solution: Enable Redis task storage

# Modify .env
TASK_STORE_STORAGE=redis

# Restart services
docker compose restart

# Verify
docker compose logs api | grep TaskStore
# Should see: ✅ TaskStore: Redis connection successful

Configuration Description:

  • memory mode: In-memory storage, data lost after restart (default, suitable for development)
  • redis mode: Persistent storage, supports container restart and instance rebuild (production recommended)

TTL Strategy (Redis mode auto-cleanup):

  • completed tasks: 24 hours
  • failed tasks: 24 hours
  • pending/processing tasks: 6 hours
Q7: VLM mode processing failed?

Check Items:

  1. vision_model_func not configured

    • Check logs: vision_model_func not found, fallback to off mode
    • Ensure LLM API is configured in .env
  2. Image file does not exist

    • Check logs: Image file not found: xxx
    • The MinerU ZIP may be corrupted, or extraction may have failed
  3. Timeout error

    • full mode may time out on large files
    • Suggestion: use selective mode first, or increase VLM_TIMEOUT
# Modify .env
VLM_TIMEOUT=300  # Increase to 5 minutes
RAG_VLM_MODE=selective  # downgrade to selective

Debugging Tips:

# View detailed logs
docker compose logs -f | grep VLM

# Test single file
curl -X POST 'http://localhost:8000/insert?tenant_id=test&doc_id=test&vlm_mode=off' \
  -F 'file=@test.pdf'

Performance Tuning Recommendations

| Scenario | MAX_ASYNC | TOP_K | CHUNK_TOP_K | MINERU_MODE |
|----------|-----------|-------|-------------|-------------|
| Fast response | 8 | 10 | 5 | remote |
| Balanced mode | 8 | 20 | 10 | remote |
| High accuracy | 4 | 60 | 20 | remote |
| Resource limited | 4 | 20 | 10 | remote |

📖 Documentation


🤝 Contributing

We welcome all forms of contribution!

How to Contribute

  1. Fork the project
git clone https://github.com/BukeLy/rag-api.git
cd rag-api
  2. Create a feature branch
git checkout -b feature/your-feature-name
  3. Development and testing
# Install dependencies
uv sync

# Run tests
uv run pytest

# Code formatting
uv run black .
uv run isort .
  4. Submit your code
git add .
git commit -m "feat: Add new feature"
git push origin feature/your-feature-name
  5. Create a Pull Request

Create a PR on GitHub with detailed description of your changes.

Commit Conventions

Use semantic commit messages:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation update
  • style: Code formatting
  • refactor: Code refactoring
  • perf: Performance optimization
  • test: Testing
  • chore: Build/tools

See PR Workflow Documentation


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


🙏 Acknowledgments

This project is built on the following excellent open source projects:

  • LightRAG - Efficient knowledge graph RAG framework
  • RAG-Anything - Multimodal document parsing
  • MinerU - Powerful PDF parsing tool
  • Docling - Lightweight document parsing
  • FastAPI - Modern Python web framework

Special thanks to all contributors and users for their support! 🎉


📬 Contact Us


⭐ If this project helps you, please give it a Star!

Made with ❤️ by BukeLy

© 2025 RAG API. All rights reserved.
