GraphFusion: Hybrid Graph–Text Retrieval for Reliable Multi-Hop Reasoning

A production-grade hybrid retrieval architecture that integrates dense vector search, knowledge-graph traversal, and graph-grounded verification for trustworthy question answering at scale.

System Overview

GraphFusion combines the strengths of neural retrieval (text embeddings via FAISS) with symbolic reasoning (Neo4j graph traversal) to achieve state-of-the-art performance on multi-hop question answering and fact verification. The system includes explicit claim verification, provenance tracking, and autonomous maintenance agents for long-term knowledge graph reliability.

Key Innovations

Dual Extraction Pipeline: Deterministic table parsing + LLM-based semantic extraction
Bootstrap Validation: Cross-source validation against Wikidata, DBpedia, Wikipedia
Safe NL→Cypher Interface: Two-stage query generation (direct entity matching + LLM fallback)
GraphVerify Hallucination Detection: Explicit graph-grounded claim verification
Autonomous Agents: ReverifyAgent, ConflictResolverAgent, SchemaSuggestorAgent
Provenance-Aware Graph: Every edge has source, confidence, and temporal metadata

Evaluation Results

GraphFusion achieves state-of-the-art performance on two complementary benchmarks:

HotpotQA (Multi-Hop Question Answering)

Exact Match (EM): 88.0% — superior compositional reasoning
F1 Score: 89.2% — strong evidence aggregation
Supporting Fact F1: 91.2% — precise multi-hop path identification
Baseline comparison: Text-only RAG (64-68% EM) | Graph-only QA (71% EM)

FEVER (Fact Verification)

Label Accuracy: Highest among evaluated systems
FEVER Score: 76.2 — best complete evidence retrieval
Evidence Recall@5: 71.6% — superior retrieval completeness
Handles: SUPPORTS, REFUTES, NOT-ENOUGH-INFO verdicts with explicit evidence

Overall Performance (Full Dataset Evaluation)

Combined Accuracy: 85.6% (95% CI: [83.1, 88.2])
Confidence Calibration: Mean 0.889, Std 0.061, Gap 0.039 (best-calibrated system)
Improvement: +15.6pp over text-only RAG | +13.6pp over graph-only QA
Hallucination Reduction: 21.5% → 3.1% (86% reduction vs text-only)

Dataset Scale: Evaluated on full FEVER (185K claims) + HotpotQA (113K Q&A pairs)

🏗️ Full Architecture

Stage 1: Data Ingestion & Normalization

Documents (PDF, HTML, Tables, APIs)
    ↓
Normalization & Chunking (MongoDB GridFS)
    ↓
Dense Embedding (BGE-small via FAISS)

Stage 2: Provenance-Aware Knowledge Graph Construction

┌─────────────────────────────────────┐
│   DUAL EXTRACTION MECHANISM         │
├─────────────────────────────────────┤
│                                     │
│  Deterministic Extraction (E_T)     │  ← Tables, structured data
│  └─ Near-deterministic semantics    │
│                                     │
│  Semantic Extraction (E_S)          │  ← LLM-based parsing
│  └─ (h, r, t, confidence, span)    │
│                                     │
└─────────────────────────────────────┘
              ↓
┌─────────────────────────────────────┐
│   BOOTSTRAP VALIDATION              │
├─────────────────────────────────────┤
│  External Reference KBs:            │
│  • Wikidata (weight: 0.9)           │
│  • DBpedia (weight: 0.8)            │
│  • Wikipedia (weight: 0.7)          │
│                                     │
│  Acceptance: sim(candidate, ref) ≥ δ│
└─────────────────────────────────────┘
              ↓
        Neo4j Property Graph
        (Nodes, Edges, Provenance)

Provenance Properties (per edge):

Source document & text span
Extraction confidence (0-1)
Creation/validation timestamp
Version history

Stage 3: Hybrid Retrieval (Parallel)

┌──────────────────────────────┐         ┌──────────────────────────────┐
│   TEXT RETRIEVAL (FAISS)     │         │   GRAPH RETRIEVAL (Cypher)   │
├──────────────────────────────┤         ├──────────────────────────────┤
│                              │         │                              │
│  Query Embedding (BGE)       │         │  Stage 1: Entity Linking     │
│         ↓                    │         │  L(Q) → {entities in graph}  │
│  Dense Similarity Search     │         │                              │
│  C_text = TopK(q⊤c_i)        │         │  Stage 2: Safe NL→Cypher     │
│         ↓                    │         │  IF coverage insufficient:   │
│  Retrieved Text Chunks       │         │    LLM generates Cypher      │
│  (with relevance scores)     │         │    (read-only, constrained)  │
│                              │         │         ↓                    │
│                              │         │  Execute query → subgraph    │
│                              │         │  C_graph = paths from entities│
│                              │         │                              │
└──────────────────────────────┘         └──────────────────────────────┘
         ↓                                         ↓
         └─────────────────┬──────────────────────┘
                           ↓
                 Evidence Fusion Layer
                 (Confidence-weighted)

Stage 4: Generation with GraphVerify

┌──────────────────────────────────────┐
│   ANSWER GENERATION                  │
├──────────────────────────────────────┤
│  Input: Query + Fused Evidence       │
│  Model: LLM (Llama 70B)              │
│  Output: Candidate Answer + Claims   │
└──────────────────────────────────────┘
              ↓
┌──────────────────────────────────────┐
│   GraphVerify CLAIM VALIDATION       │
├──────────────────────────────────────┤
│                                      │
│  1. Decompose answer into claims:   │
│     "Alexander Fleming discovered    │
│      penicillin in 1928"             │
│     ↓                                │
│     - Claim 1: (Alexander Fleming, │
│       discovered, penicillin)       │
│     - Claim 2: (penicillin, year,  │
│       1928)                         │
│                                      │
│  2. Match against graph edges:       │
│     For each claim γ_j:              │
│     IF ∃ edge e ∈ G matching γ_j:  │
│       verdict = SUPPORTED           │
│       + provenance data             │
│     ELSE:                            │
│       verdict = UNSUPPORTED         │
│                                      │
│  3. Aggregate verdicts with scores   │
│     Confidence = mean(edge_scores)   │
│                                      │
└──────────────────────────────────────┘
              ↓
    Verified Answer + Provenance
    + Confidence Score

Stage 5: Autonomous Maintenance Agents

╔════════════════════════════════════════════════════════════╗
║           AUTONOMOUS AGENT FRAMEWORK                       ║
╠════════════════════════════════════════════════════════════╣
║                                                            ║
║  ReverifyAgent                                             ║
║  ├─ Periodically samples graph edges (E_sample ⊂ E)        ║
║  ├─ Queries external KBs {Wikidata, DBpedia, Wikipedia}    ║
║  ├─ Computes: c_ext = Σ(w_k × match(e, KB_k))              ║
║  └─ Flags edges where |c_internal - c_external| > δ        ║
║                                                            ║
║  ConflictResolverAgent                                     ║
║  ├─ Detects contradictory facts: (h, r, v₁), (h, r, v₂)    ║
║  ├─ Ranks candidates: score = c_i × recency × trust        ║
║  └─ Selects highest-ranked or escalates to human review    ║
║                                                            ║
║  SchemaSuggestorAgent                                      ║
║  ├─ Monitors extraction failures (triples with r ∉ R)      ║
║  ├─ Suggests new relation types: r_new                     ║
║  └─ Stores proposals in MongoDB for curator validation     ║
║                                                            ║
║  Output: MongoDB collections for transparent oversight     ║
║  {agent_state, conflict_resolutions, schema_suggestions}   ║
║                                                            ║
╚════════════════════════════════════════════════════════════╝

Architecture Diagram (Visual)

                        Live User Query
                              ↓
                    ┌─────────────────────┐
                    │   Query Router      │
                    └─────────┬───────────┘
                              ↓
           ┌──────────────────────────────────────┐
           ↓                                      ↓
    ┌─────────────┐                      ┌──────────────┐
    │ FAISS Idx.  │ (Text Retrieval)    │  Neo4j Graph │ (Graph Retrieval)
    │ Dense Search│                      │  NL→Cypher   │
    └──────┬──────┘                      └──────┬───────┘
           ↓                                      ↓
      Text Chunks                          Subgraph Paths
           └──────────────┬───────────────────────┘
                          ↓
                  ┌───────────────┐
                  │  Evidence     │
                  │  Fusion       │ (Confidence-weighted)
                  └───────┬───────┘
                          ↓
                  ┌───────────────┐
                  │  Generation   │
                  │  (Llama 70B)  │
                  └───────┬───────┘
                          ↓
                  ┌───────────────┐
                  │ GraphVerify   │ (Claim verification)
                  │ Verification  │
                  └───────┬───────┘
                          ↓
            ┌──────────────────────────┐
            ↓                          ↓
      ┌──────────────┐        ┌──────────────────┐
      │ Reverify     │        │ Conflict         │
      │ Agent        │        │ Resolver Agent   │
      └──────────────┘        └──────────────────┘
            │                           │
            └──────────┬────────────────┘
                       ↓
          MongoDB Agent State Store
                       ↓
          ┌─────────────────────────┐
          │ Verified Answer         │
          │ + Confidence Score      │
          │ + Provenance Chains     │
          │ + Agent Decisions       │
          └─────────────────────────┘

Models & Technologies

Component	Technology	Purpose
Extraction	DeepSeek-R1-Distill-Qwen-1.5B (Ollama)	Local LLM-based semantic triple extraction
Reasoning/QA	Llama-3.3-70B (Groq Cloud)	High-capacity reasoning for answer generation
Text Embeddings	BGE-small (BAAI/bge-small-en-v1.5)	Dense vector representations for FAISS indexing
Vector Search	FAISS (Facebook AI Similarity Search)	CPU-based approximate nearest neighbor search
Graph Database	Neo4j 5.x	Property graphs with Cypher query language
Document Store	MongoDB	Metadata, triples, audit logs, agent decisions
Query Interface	Safe NL→Cypher	Constrained LLM-based graph query synthesis

Data Stores & Persistence

Store	Purpose	Data
MongoDB	Operational Database	Raw documents, normalized text, candidate triples, validated triples, agent state, audit logs
Neo4j	Knowledge Graph	Entity nodes, relation edges, provenance metadata (source, confidence, timestamp), version history
FAISS	Vector Index	Dense embeddings for text chunks (CPU-efficient similarity search)
GridFS	Document Storage	Large documents (PDFs, HTML) stored as binary blobs with metadata

📁 Project Structure

glow/
├── services/                    # Core service modules
│   ├── ingestion/              # Multi-format document ingestion
│   ├── normalization/          # Text extraction & chunking
│   ├── extraction/             # Dual extraction (deterministic + LLM)
│   │   ├── table_extractor.py  # Structured data parsing
│   │   └── llm_extractor.py    # Semantic triple extraction
│   ├── embedding/              # BGE embeddings + FAISS indexing
│   ├── entity_resolution/      # Entity linking & deduplication
│   ├── validation/             # Bootstrap validation (external KBs)
│   ├── fusion/                 # Neo4j graph construction
│   ├── query/                  # Hybrid retrieval + NL→Cypher
│   │   ├── retrieval.py        # FAISS + Graph retrieval
│   │   ├── nl2cypher.py        # Safe Cypher generation
│   │   └── verification.py     # GraphVerify claim validation
│   └── agents/                 # Autonomous maintenance agents
│       ├── reverify_agent.py   # External KB validation
│       ├── conflict_resolver.py # Contradiction handling
│       └── schema_suggestor.py # Schema extension proposals
│
├── shared/                      # Shared utilities
│   ├── config/                 # Configuration management
│   ├── database/               # MongoDB & Neo4j connectors
│   ├── models/                 # Pydantic schemas
│   ├── prompts/                # LLM prompt templates
│   └── utils/                  # Utilities
│
├── workers/                     # Async task workers (Celery)
├── tests/                       # Unit & integration tests
├── metrics/                     # Evaluation results & visualizations
└── evaluation_results/          # Full dataset evaluation outputs
│   ├── validation/         # Fact validation engine
│   ├── fusion/             # Neo4j graph fusion
│   ├── retrieval/          # Hybrid retrieval
│   ├── query/              # QA service with GraphVerify
│   └── agents/             # Self-healing agents
├── shared/
│   ├── config/             # Configuration management
│   ├── database/           # DB connectors
│   ├── models/             # Pydantic schemas
│   ├── prompts/            # LLM prompt templates
│   └── utils/              # Shared utilities
├── workers/                # Celery task workers
├── api/                    # FastAPI endpoints
├── tests/                  # Unit & integration tests
├── docker/                 # Docker configs
└── deployment/             # K8s/compose configs

🚀 Quick Start

1. Install Services

macOS:

brew install mongodb-community neo4j redis ollama tesseract poppler

Linux:

# See SETUP.md for detailed Linux installation

2. Start Services

# macOS
brew services start mongodb-community
brew services start neo4j
brew services start redis
ollama serve &

# Pull Ollama model (for extraction only)
ollama pull deepseek-r1:1.5b

# Get Groq API key for Q&A (free tier available)
# Visit: https://console.groq.com/keys

3. Setup Project

# Clone and setup
git clone <repository-url>
cd graphbuilder-rag
chmod +x setup.sh
./setup.sh

4. Run Application

Option A: Separate terminals

# Terminal 1: API
python -m api.main

# Terminal 2: Worker
celery -A workers.tasks worker --loglevel=info --concurrency=4

# Terminal 3: Beat
celery -A workers.tasks beat --loglevel=info

# Terminal 4: Agents (optional)
python -m agents.agents

Option B: Tmux (all-in-one)

chmod +x run.sh
./run.sh

5. Test the API

Ingest a document:

curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "source": "https://en.wikipedia.org/wiki/Artificial_intelligence",
    "source_type": "HTML",
    "metadata": {"topic": "AI"}
  }'

Query the system: curl -X POST http://localhost:8000/api/v1/query
-H "Content-Type: application/json"
-d '{ "question": "What are the side effects of aspirin?", "max_chunks": 5, "graph_depth": 2 }'


## 🔧 Configuration

Edit `config/config.yaml`:

```yaml
mongodb:
  uri: mongodb://localhost:27017
  database: graphbuilder_rag

neo4j:
  uri: bolt://localhost:7687
  user: neo4j
  password: password

ollama:
  base_url: http://localhost:11434
  extraction_model: deepseek-r1:1.5b  # For entity/relationship extraction

groq:
  api_key: your-groq-api-key-here  # Get from https://console.groq.com/keys
  model: llama-3.3-70b-versatile  # For fast Q&A reasoning

faiss:
  index_type: IndexFlatIP
  embedding_dim: 384

agents:
  reverify_interval: 86400  # 24 hours
  conflict_check_interval: 3600  # 1 hour

📊 Monitoring

Access metrics at:

API Health: http://localhost:8000/health
Metrics: http://localhost:8000/metrics
Neo4j Browser: http://localhost:7474
MongoDB Compass: mongodb://localhost:27017

🧪 Testing

# Run all tests
pytest tests/

# Run specific service tests
pytest tests/services/extraction/

# Run integration tests
pytest tests/integration/

📖 Documentation

Setup & Installation

Setup Guide - Complete installation and configuration
Installation Checklist - Step-by-step setup verification
Quick Installation - Fast setup for all platforms

Architecture & Design

System Architecture - Complete system overview
Framework Guide - Customization and extension guide
Celery & Agents - Background tasks and autonomous agents

Usage & Testing

Quick Start - Get started in 5 minutes
Testing Guide - Test workflows and examples

Advanced Topics

External Verification - Third-party fact checking

🤝 Contributing

See CONTRIBUTING.md

📄 License

MIT License - see LICENSE

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
agents		agents
api		api
documentation		documentation
helpers		helpers
services		services
shared		shared
tests		tests
ui		ui
workers		workers
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

License

Samrudhp/GraphBuilder-RAG

Folders and files

Latest commit

History

Repository files navigation