
HetaRAG: Hybrid Deep Retrieval-Augmented Generation across Heterogeneous Data Stores


HetaRAG is a hybrid, deep-retrieval RAG framework that unifies multiple heterogeneous data stores: vector indices, knowledge graphs, full-text search engines, and relational databases. The knowledge base built on these heterogeneous stores enables deep-search question answering within RAG and supports the generation of in-depth research reports. The currently open-sourced code comprises early-stage integrations of exploratory RAG components from our preliminary research; we will continue to refine the system design and release further code.

🌟 Highlights

  • 2025-09-29 Our paper is available on arXiv📄!
  • 2025-09-03 The code is now released!
  • 2025-09-03 The project quick guide is now live here🔗!

✨ Features

  • Document Parsing: Supports multiple document parsing backends, including MinerU and Docling, for handling complex layouts and multi-modal content.

  • Knowledge Graph Integration: Automatically extracts entities and relations to build a knowledge graph (HiRAG or LeanRAG).

  • Flexible Database Support: Integrates with various databases for different needs:

    • Vector Stores: Milvus
    • Search Engines: Elasticsearch
    • Graph Databases: Neo4j
    • Relational Databases: MySQL
  • DeepRetrieval: Supports multiple retrieval paradigms, including Hybrid Retrieval (combining vector search and keyword search), Query Rewrite, Rerank, and DeepSearch modules.

  • DeepWriter: A multimodal report generation module. Generates fact-grounded, query-driven reports with fine-grained citations from unstructured documents.

  • Heads-up: Deep-fusion retrieval across heterogeneous stores is pending merge.
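
To make the idea of Hybrid Retrieval more concrete, here is a minimal sketch of one common way to combine a vector-search ranking with a keyword-search ranking: reciprocal rank fusion (RRF). This is an illustrative assumption, not necessarily the fusion strategy HetaRAG implements. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    the contributions are summed. `k` is a smoothing constant (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector store and a full-text search engine.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # documents found by both retrievers rise to the top
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that vector similarities and keyword-relevance scores live on different scales.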

🚀 Getting Started

Database Service Configuration

Configure the following four database services using Docker:

  • Elasticsearch: For full-text search and document indexing.
  • Milvus: For vector similarity search.
  • Neo4j: For graph storage and queries.
  • MySQL: For relational data storage.

These databases can be installed with a single command via Docker. For detailed installation instructions, please refer to the README.
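
Once the containers are running, a quick reachability check can confirm that the four services listed above are accepting connections. The sketch below uses only the Python standard library and assumes each service listens on its usual default port on `localhost` (Elasticsearch 9200, Milvus 19530, Neo4j Bolt 7687, MySQL 3306). Your Docker setup may map different ports, and the helper names `is_port_open` and `check_services` are hypothetical.

```python
import socket

# Usual default ports (an assumption; adjust to match your Docker port mappings).
DEFAULT_PORTS = {
    "Elasticsearch": 9200,
    "Milvus": 19530,
    "Neo4j (Bolt)": 7687,
    "MySQL": 3306,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", ports=DEFAULT_PORTS):
    """Map each service name to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:15s} {'reachable' if ok else 'NOT reachable'}")
```

An open port only shows that something is listening; it does not verify credentials or that the service has finished initializing.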

Prerequisites

  • Python 3.10+
  • Conda for environment management

Installation

  1. Clone the repository:

    git clone https://github.com/KnowledgeXLab/HetaRAG.git
    cd HetaRAG
  2. Create a virtual environment:

    # Upgrade pip and install uv
    pip install --upgrade pip
    pip install uv
    
    # Create and activate a virtual environment using uv
    uv venv h-rag --python=3.10
    source h-rag/bin/activate      # Unix/macOS
    # h-rag\Scripts\activate       # Windows: run this instead of the line above
    
    # Alternatively, use conda instead of uv to create and activate the environment
    # conda create -n h-rag python=3.10
    # conda activate h-rag
  3. Install the required dependencies:

    uv pip install -e .

💻 Usage

Please refer to the documentation on Read the Docs.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

We utilized the following repos during development:

Citation

If you find our paper and code useful, please cite us:

@misc{yan2025hetaraghybriddeepretrievalaugmented,
      title={HetaRAG: Hybrid Deep Retrieval-Augmented Generation across Heterogeneous Data Stores}, 
      author={Guohang Yan and Yue Zhang and Pinlong Cai and Ding Wang and Song Mao and Hongwei Zhang and Yaoze Zhang and Hairong Zhang and Xinyu Cai and Botian Shi},
      year={2025},
      eprint={2509.21336},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.21336}, 
}
