HetaRAG: Hybrid Deep Retrieval-Augmented Generation across Heterogeneous Data Stores

HetaRAG is a hybrid, deep-retrieval RAG framework that unifies multiple heterogeneous data stores: vector indices, knowledge graphs, full-text search engines, and relational databases. A knowledge base built on these stores enables deep-search question answering within RAG and supports the generation of in-depth research reports. The currently open-sourced code comprises early-stage integrations of exploratory RAG components from our preliminary research; we will continue to refine the system design and release further code.

🌟 Highlights

2025-09-29 Our paper is available on arXiv📄!
2025-09-03 The code is now released!
2025-09-03 The project quick guide is now live here🔗!

✨ Features

- Document Parsing: Supports multiple document parsing backends, including MinerU and Docling, for handling complex layouts and multimodal content.
- Knowledge Graph Integration: Automatically extracts entities and relations to build a knowledge graph (HiRAG or LeanRAG).
- Flexible Database Support: Integrates with several databases for different needs:
  - Vector Stores: Milvus
  - Search Engines: Elasticsearch
  - Graph Databases: Neo4j
  - Relational Databases: MySQL
- DeepRetrieval: Supports multiple retrieval paradigms, including Hybrid Retrieval (combining vector search and keyword search), Query Rewrite, Rerank, and DeepSearch modules.
- DeepWriter: A multimodal report generation module that generates fact-grounded, query-driven reports with fine-grained citations from unstructured documents.
- Heads-up: Deep-fusion retrieval across heterogeneous stores is pending merge.
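
To illustrate the hybrid retrieval idea above (combining vector search and keyword search), here is a minimal sketch of one common way to merge two ranked result lists: reciprocal rank fusion (RRF). This is not taken from the HetaRAG codebase. The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` are assumptions for illustration only, and HetaRAG may fuse results differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper, not a HetaRAG API.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending); break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up IDs standing in for results from a vector index and a keyword engine.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and the fusion needs only ranks, so the vector and keyword scores do not have to be on a comparable scale.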

🚀 Getting Started

Database Service Configuration

Configure the following four database services using Docker:

- Elasticsearch: For full-text search and document indexing.
- Milvus: For vector similarity search.
- Neo4j: For graph storage.
- MySQL: For relational storage.

These databases can be installed with a single command via Docker. For detailed installation instructions, please refer to the README.
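
Once the containers are running, a quick connectivity check can confirm that each service is listening. The sketch below is an assumption-laden example rather than part of HetaRAG: it uses each product's usual default port (Elasticsearch 9200, Milvus 19530, Neo4j Bolt 7687, MySQL 3306) on `localhost`, and the names `DEFAULT_SERVICES`, `is_port_open`, and `check_services` are hypothetical. Adjust hosts and ports to match your own Docker configuration.

```python
import socket

# Assumed host/port pairs (upstream defaults); change them to match your setup.
DEFAULT_SERVICES = {
    "Elasticsearch": ("localhost", 9200),
    "Milvus": ("localhost", 19530),
    "Neo4j": ("localhost", 7687),  # Bolt protocol port
    "MySQL": ("localhost", 3306),
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=DEFAULT_SERVICES):
    """Map each service name to whether its TCP port accepts connections."""
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

An open port only shows that something is listening; it does not verify credentials or that the service has finished initializing.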

Prerequisites

Python 3.10+
Conda for environment management
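
To confirm that the active interpreter satisfies the Python 3.10+ requirement before installing, a small check like the following can be run. The names `MIN_PYTHON` and `meets_minimum` are hypothetical and are not part of the project.

```python
import sys

# Minimum version taken from the prerequisites above.
MIN_PYTHON = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


current = ".".join(str(part) for part in sys.version_info[:3])
if meets_minimum():
    print(f"Python {current} satisfies the 3.10+ requirement.")
else:
    print(f"Python {current} is too old; please use 3.10 or newer.")
```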

Installation

Clone the repository:

git clone https://github.com/your-github-username/hrag.git
cd hrag

Create a virtual environment:

# Upgrade pip and install uv
pip install --upgrade pip
pip install uv

# Create and activate a virtual environment using uv
uv venv h-rag --python=3.10
source h-rag/bin/activate        # For Unix/macOS
# h-rag\Scripts\activate         # For Windows (run this instead of the line above)

# Alternatively, you can use conda to create and activate the environment
conda create -n h-rag python=3.10
conda activate h-rag

Install the required dependencies:
```
uv pip install -e .
```

💻 Usage

Please refer to the documentation on Read the Docs.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

We utilized the following repos during development:

Citation

If you find our paper and code useful, please cite us:

@misc{yan2025hetaraghybriddeepretrievalaugmented,
      title={HetaRAG: Hybrid Deep Retrieval-Augmented Generation across Heterogeneous Data Stores}, 
      author={Guohang Yan and Yue Zhang and Pinlong Cai and Ding Wang and Song Mao and Hongwei Zhang and Yaoze Zhang and Hairong Zhang and Xinyu Cai and Botian Shi},
      year={2025},
      eprint={2509.21336},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.21336}, 
}
