An intelligent multi-agent system combining RAG, SQL data analysis, and semantic memory for context-aware interactions with complete observability
Lumiere is an open-source, agentic RAG knowledge workspace that uses multi-agent reasoning, long- and short-term memory, Qdrant Cloud for vector storage, and complete observability via LangSmith.
Lumiere transforms traditional Q&A systems into an intelligent assistant that learns and adapts through semantic memory, supporting multiple interaction modes:
- RAG Mode: Document-grounded responses with semantic search + reranking
- Data Analyst Mode: SQL queries with automated visualizations
- General Chat: Conversational AI with context awareness
- Semantic Memory: Long-term learning from past interactions
- User Isolation: Complete data separation per user
- Intent Node: Classifies queries, retrieves memories, and routes intelligently
- Retrieve Node: Vector search with CrossEncoder reranking
- Reason Node: Generates grounded RAG answers
- General Reason Node: Fallback for general knowledge
- SQL Execute Node: Generates and runs database queries
- SQL Reason Node: Interprets SQL results
- Visualize Node: Creates data visualizations (data_analyst mode)
- Critic Node: Validates answer quality before storage
- Memory Write Node: Stores conversations in semantic memory
- Long-term memory stored in Qdrant Cloud vector database
- Automatic learning from successful interactions
- Context-aware responses using past conversations
- Quality filtering via critic node (only ACCEPT decisions stored)
- Cross-session continuity for personalized experiences
- User-specific collections for complete data isolation
- Natural language to SQL query generation
- Automated chart creation (bar, line, pie, scatter, table)
- Interactive visualizations with Plotly
- Multi-table support with user-specific SQLite databases
- User isolation - each user has separate database file
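To make the flow concrete, here is a minimal sketch of a natural-language-to-SQL round trip. The helper names (`nl_to_sql`, `run_query`), the prompt, and the per-user database path are illustrative assumptions, not Lumiere's actual `sql_agent` code:

```python
# Illustrative NL->SQL round trip (not Lumiere's actual sql_agent; helper
# names, prompt, and database path are assumptions).
import sqlite3
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def nl_to_sql(question: str, schema: str) -> str:
    """Ask the model for a single SQLite query grounded in the given schema."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Write one SQLite query answering the user's question.\n"
                f"Schema:\n{schema}\nReturn only SQL, no explanation."
            )},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip().strip("`")

def run_query(db_path: str, sql: str) -> list[tuple]:
    """Execute the generated query against a per-user SQLite file."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

# sql = nl_to_sql("How many hybrid cars are there?",
#                 "CREATE TABLE cars (id INTEGER, type TEXT, price REAL)")
# rows = run_query("lumiere_user_123.db", sql)  # hypothetical user database
```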
- Hybrid chunking with semantic overlap
- Vector similarity search with OpenAI text-embedding-3-small
- CrossEncoder reranking (ms-marco-MiniLM-L-6-v2)
- Metadata filtering for precise retrieval
- Source attribution for transparency
- Pronoun resolution for conversational context
- User-specific document collections in Qdrant Cloud
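The retrieve-then-rerank step above can be sketched roughly as follows. This is not Lumiere's exact `retriever.py`; the `text` payload field and the over-fetch factor are assumptions, though the model names match the ones listed:

```python
# Sketch of retrieve-then-rerank (not the exact retriever.py; the "text"
# payload field and over-fetch factor are assumptions).
from openai import OpenAI
from qdrant_client import QdrantClient
from sentence_transformers import CrossEncoder

oai = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def search(query: str, user_id: str, top_k: int = 3) -> list[str]:
    # Embed the query with the same model used at ingestion time.
    vector = oai.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    # Over-fetch candidates from Qdrant, then let the CrossEncoder re-score.
    hits = qdrant.search(
        collection_name=f"user_{user_id}_documents",
        query_vector=vector,
        limit=top_k * 4,
    )
    texts = [h.payload["text"] for h in hits]  # assumes chunks stored as "text"
    scores = reranker.predict([(query, t) for t in texts])
    ranked = sorted(zip(scores, texts), key=lambda p: p[0], reverse=True)
    return [t for _, t in ranked[:top_k]]
```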
- Automatic tracing for all LangChain/LangGraph operations
- Zero manual instrumentation required
- Full trace replay for debugging
- Performance metrics (latency, tokens, costs)
- Session tracking via user_id/session_id
- Error monitoring and alerting
- Token usage tracking per operation
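"Zero manual instrumentation" works because LangSmith tracing is driven entirely by environment variables; the values below are placeholders:

```python
# LangSmith tracing is enabled purely via environment variables; no tracing
# calls are needed in application code (values below are placeholders).
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_langsmith_api_key"
os.environ["LANGCHAIN_PROJECT"] = "Lumiere"

# Any LangChain/LangGraph call made after this point is traced automatically.
```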
- Separate Qdrant collections per user: user_{user_id}_documents, user_{user_id}_memories
- Separate SQLite databases per user: lumiere_user_{user_id}.db
- Session-based user IDs (UUID per session)
- Zero cross-user data leakage
- Multi-tenant architecture ready for production
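A lazy per-user collection setup might look like the sketch below. The `ensure_user_collections` helper is hypothetical (Lumiere's `collections.py` may differ), though the qdrant-client calls themselves are real:

```python
# Hypothetical helper for lazy per-user collection creation (Lumiere's
# collections.py may differ); the qdrant-client calls themselves are real.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="https://your-cluster.qdrant.io", api_key="...")

def ensure_user_collections(user_id: str) -> None:
    """Create the user's document and memory collections on first use."""
    for suffix in ("documents", "memories"):
        name = f"user_{user_id}_{suffix}"
        if not client.collection_exists(name):
            client.create_collection(
                collection_name=name,
                vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
            )
```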
```
┌─────────────┐
│    User     │
│ (Streamlit) │
└──────┬──────┘
       │
       ▼
┌──────────────────────────────────────────────┐
│         LangGraph Workflow (9 Nodes)         │
│  ┌────────────────────────────────────────┐  │
│  │ intent → [retrieve | sql_execute |     │  │
│  │          general_reason]               │  │
│  │   ↓          ↓             ↓           │  │
│  │ reason    sql_reason   general_reason  │  │
│  │   ↓          ↓             ↓           │  │
│  │ [visualize] → critic → memory_write    │  │
│  └────────────────────────────────────────┘  │
└───────────┬──────────────┬───────────────────┘
            │              │
    ┌───────▼──────┐  ┌────▼──────────┐
    │ Qdrant Cloud │  │ SQLite (per   │
    │  (per user)  │  │  user)        │
    │  - docs      │  │  - tables     │
    │  - memories  │  │  - sessions   │
    └───────┬──────┘  └───────────────┘
            │
     ┌──────▼─────────┐
     │   LangSmith    │
     │  (Automatic    │
     │   Tracing)     │
     └────────────────┘
```
RAG Query Path:
intent (needs_rag) → retrieve → reason → critic → memory_write → END

SQL/Data Analysis Path:
intent (needs_sql) → sql_execute → sql_reason → [visualize] → critic → memory_write → END

General Chat Path:
intent → general_reason → critic → memory_write → END
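These three paths map naturally onto LangGraph conditional edges. The sketch below shows the routing shape only: node bodies are placeholders, the `route` state field is an assumption rather than Lumiere's `state.py`, and the optional visualize node is omitted for brevity:

```python
# Routing sketch only: node bodies are placeholders and the "route" state
# field is an assumption, not Lumiere's state.py (visualize omitted for brevity).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    query: str
    route: str
    answer: str

def route_from_intent(state: State) -> str:
    return state["route"]  # "retrieve", "sql_execute", or "general_reason"

g = StateGraph(State)
for name in ("intent", "retrieve", "reason", "sql_execute",
             "sql_reason", "general_reason", "critic", "memory_write"):
    g.add_node(name, lambda state: state)  # placeholder node implementations

g.set_entry_point("intent")
g.add_conditional_edges("intent", route_from_intent, {
    "retrieve": "retrieve",
    "sql_execute": "sql_execute",
    "general_reason": "general_reason",
})
g.add_edge("retrieve", "reason")
g.add_edge("sql_execute", "sql_reason")
for node in ("reason", "sql_reason", "general_reason"):
    g.add_edge(node, "critic")
g.add_edge("critic", "memory_write")
g.add_edge("memory_write", END)
app = g.compile()
```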
See GRAPH_ARCHITECTURE.md for detailed workflow documentation, or view graph_visualization.png for a visual representation.
- Python 3.11+
- Qdrant (running locally or cloud)
- OpenAI API key
- LangSmith account (optional, for observability)
1. Clone the repository

```bash
git clone https://github.com/kikomatchi/lumiere.git
cd lumiere
```

2. Create a virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies

```bash
pip install -r requirements.txt
```

4. Set up environment variables

Create a `.env` file in the project root:

```bash
# OpenAI API
OPENAI_API_KEY=your_openai_api_key_here

# Qdrant Configuration (Cloud or Local)
QDRANT_URL=https://your-cluster.qdrant.io  # Or http://localhost:6333
QDRANT_API_KEY=your_qdrant_api_key  # Required for Qdrant Cloud

# LangSmith Observability (Optional)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_api_key
LANGCHAIN_PROJECT=Lumiere
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
```

5. Start Qdrant (if running locally; skip if using Qdrant Cloud)

```bash
docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage:z \
  qdrant/qdrant
```

User collections are auto-created:
- No manual initialization needed!
- Collections are created automatically on each user's first upload or query
- Format: user_{user_id}_documents, user_{user_id}_memories

6. Initialize semantic memory (optional)

```bash
python scripts/init_semantic_memory.py
```

7. Run the application

```bash
streamlit run app.py
```

8. Open your browser

Navigate to http://localhost:8501
Via Streamlit UI:
- Click "๐ Document Ingestion" in sidebar
- Upload PDF, TXT, or MD files
- Click "Ingest Documents"
- Wait for confirmation
Via Script:
python -c "from rag.ingest import ingest_directory; ingest_directory('path/to/docs')""What is FFXIV?"
"Explain vector databases"
"How does semantic search work?"
"Show me the top 5 products by sales"
"How many hybrid cars are in the database?"
"What is the average price by manufacturer?"
"Hello, how are you?"
"Can you help me with my project?"
"What can you do?"
In Streamlit:
- Expand "๐ง Semantic Memory" in sidebar
- View total memories and types
- Search memories by keyword
- See relevance scores and timestamps
Via Python:

```python
from memory.semantic_memory import get_memory_stats, retrieve_memories

# Get statistics
stats = get_memory_stats()
print(stats)

# Search memories
memories = retrieve_memories(
    query="database queries",
    top_k=5,
    user_id="user123",
    min_score=0.7,
)
```

Use the sidebar to select:
- All In: All features enabled (default)
- Chat + RAG: Document Q&A only
- Data Analyst: SQL queries + visualizations
```
Lumiere/
├── agents/                      # Agent implementations
│   ├── intent_agent.py          # Intent classification + memory retrieval
│   ├── reasoning_agent.py       # RAG reasoning
│   ├── sql_agent.py             # SQL generation & execution
│   ├── critic_agent.py          # Quality validation
│   └── viz_agent.py             # Visualization generation
│
├── graph/                       # LangGraph workflow
│   ├── rag_graph.py             # Main graph definition
│   └── state.py                 # State management
│
├── memory/                      # Semantic memory system
│   └── semantic_memory.py       # Vector-based memory storage/retrieval
│
├── rag/                         # RAG components
│   ├── chunking.py              # Document chunking strategies
│   ├── collections.py           # Qdrant collection management
│   ├── embeddings.py            # OpenAI embeddings wrapper
│   ├── ingest.py                # Document ingestion pipeline
│   ├── qdrant_client.py         # Qdrant client singleton
│   └── retriever.py             # Semantic search & filtering
│
├── database/                    # Data storage
│   └── sqlite_client.py         # SQLite connection & queries
│
├── config/                      # Configuration
│   └── settings.py              # Environment & settings
│
├── scripts/                     # Utility scripts
│   ├── init_semantic_memory.py  # Initialize memory system
│   ├── ingest_test.py           # Test document ingestion
│   └── retrieval_test.py        # Test retrieval
│
├── ui/                          # Streamlit components
│   └── (UI modules)
│
├── app.py                       # Main Streamlit application
├── requirements.txt             # Python dependencies
├── graph_visualization.mmd      # Mermaid diagram
├── graph_visualization.png      # Architecture diagram
├── GRAPH_ARCHITECTURE.md        # Detailed architecture docs
├── SEMANTIC_MEMORY.md           # Memory system documentation
└── README.md                    # This file
```
1. Storage: Every accepted conversation is embedded and stored in Qdrant
   - Uses OpenAI text-embedding-3-small (1536 dimensions)
   - Includes query, response, mode, and metadata
   - Quality-filtered by the critic agent (only ACCEPT decisions)

2. Retrieval: The intent agent retrieves relevant memories before processing
   - Top-k semantic search with cosine similarity
   - Configurable threshold (default: 0.75)
   - Formatted context injected into agent prompts

3. Benefits:
   - Personalization: Remembers user preferences
   - Context: Understands conversation history
   - Learning: Improves responses over time
   - Continuity: Works across sessions
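A minimal sketch of the storage step is shown below; the `store_memory` helper and the payload shape are assumptions, not the actual `semantic_memory.py`:

```python
# Hypothetical store_memory helper (payload shape assumed; not the actual
# semantic_memory.py). Embeds an accepted exchange and upserts it per user.
import uuid
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

oai = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")

def store_memory(user_id: str, query: str, response: str, mode: str) -> None:
    text = f"Q: {query}\nA: {response}"
    vector = oai.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding
    qdrant.upsert(
        collection_name=f"user_{user_id}_memories",
        points=[PointStruct(
            id=str(uuid.uuid4()),
            vector=vector,
            payload={"query": query, "response": response,
                     "mode": mode, "type": "conversation"},
        )],
    )
```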
Memory types:
- conversation: General Q&A interactions
- preference: User preferences (e.g., "I prefer bar charts")
- fact: User-declared facts (e.g., "I'm working on X project")
- pattern: Common query patterns
- error_resolution: Problem-solving history
First interaction:
User: "Show me sales data as a bar chart"
Assistant: [Generates bar chart]
💾 Stores: User prefers bar charts for sales data
Later interaction:
User: "Show me revenue trends"
Assistant: [Retrieves memory about chart preference]
[Automatically generates bar chart]
See SEMANTIC_MEMORY.md for detailed documentation.
- Natural language to SQL: Generate queries from plain English
- Automated visualizations: Smart chart type selection
- Interactive charts: Plotly-based visualizations
- Result interpretation: Natural language summaries
- Bar Chart: Comparisons, rankings
- Line Chart: Trends over time
- Pie Chart: Proportions, distributions
- Scatter Plot: Correlations, relationships
"Show me sales by region"
โ SQL: SELECT region, SUM(sales) FROM sales GROUP BY region
โ Chart: Bar chart with regions on x-axis
"How have prices changed over time?"
โ SQL: SELECT date, AVG(price) FROM products GROUP BY date
โ Chart: Line chart showing price trends
"What's the distribution of car types?"
โ SQL: SELECT type, COUNT(*) FROM cars GROUP BY type
โ Chart: Pie chart showing proportions
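A simple heuristic version of this result-to-chart mapping is sketched below; `pick_chart` is purely illustrative, and the real viz agent is richer than these three rules:

```python
# Illustrative chart-type heuristic over a two-column SQL result; pick_chart
# is hypothetical and the real viz agent is richer than these three rules.
import pandas as pd
import plotly.express as px

def pick_chart(df: pd.DataFrame):
    x, y = df.columns[0], df.columns[1]
    if pd.api.types.is_datetime64_any_dtype(df[x]):
        return px.line(df, x=x, y=y)          # trends over time
    if len(df) <= 6:
        return px.pie(df, names=x, values=y)  # proportions of a few categories
    return px.bar(df, x=x, y=y)               # default: comparisons/rankings

# fig = pick_chart(pd.DataFrame({"region": list("ABCDEFGH"), "sales": range(8)}))
# fig.show()  # renders an interactive Plotly bar chart
```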
- Semantic chunking: Split by meaning, not just length
- Overlap: Maintains context between chunks
- Metadata preservation: Source, page numbers, timestamps
- Hybrid search: Combines semantic + keyword search
- Metadata filtering: Filter by source, date, type
- Reranking: Re-scores results for relevance
- Source attribution: Shows where answers come from
- PDF: Automatic text extraction
- TXT: Plain text files
- Markdown: Preserves formatting
- Batch ingestion: Process entire directories
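The chunking idea reduces to a sliding window. Below is a minimal sketch of fixed-size chunking with overlap; Lumiere's `chunking.py` layers semantic splitting on top of this:

```python
# Minimal fixed-size chunking with overlap; Lumiere's chunking.py layers
# semantic splitting on top of this basic sliding-window idea.
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    assert 0 <= overlap < size, "overlap must be smaller than chunk size"
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step back by `overlap` to preserve context
    return chunks

# chunk_text("a" * 2500) -> chunks of 1000 chars starting at 0, 800, 1600, 2400
```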
```python
# Model Configuration
OPENAI_MODEL = "gpt-4o-mini"
EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1536

# Retrieval Settings
TOP_K_RETRIEVAL = 3
MIN_SIMILARITY_SCORE = 0.7

# Memory Settings
MEMORY_TOP_K = 3
MEMORY_MIN_SCORE = 0.75

# Chunking
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
```

See .env.example for all available configuration options.
1. Qdrant Connection Error

Error: Cannot connect to Qdrant

Solution: Ensure Qdrant is running on localhost:6333

```bash
docker ps | grep qdrant  # Check if running
```

2. OpenAI API Error

Error: Invalid API key

Solution: Check that your .env file has the correct OPENAI_API_KEY

3. No Memories Stored

Memory count remains at 3

Solution:
- Check that the critic is accepting answers (look for ✅ in the terminal)
- Ensure the Qdrant collection exists
- Verify semantic memory is enabled

4. Import Errors

ModuleNotFoundError: No module named 'X'

Solution: Reinstall dependencies

```bash
pip install -r requirements.txt
```

Enable detailed logging:

```python
# In config/settings.py
DEBUG_MODE = True
```

Look for these debug indicators in the terminal:
- 💾 Memory Write Node
- ✅ Stored semantic memory
- ⏭️ Skipping memory storage
- 📦 Retrieval node
- 🔍 Query analysis
Lumiere integrates with LangSmith for comprehensive observability:
- Traces: Full request lifecycle tracking
- Token usage: Cost monitoring per operation
- Latency: Performance metrics
- Agent behavior: Decision tracking
Setup:
- Create an account at smith.langchain.com
- Add your keys to .env
- View traces in the LangSmith dashboard
View memory stats in the terminal:

```bash
python -c "from memory.semantic_memory import get_memory_stats; import json; print(json.dumps(get_memory_stats(), indent=2))"
```

Example output:

```json
{
  "total_memories": 15,
  "vector_size": 1536,
  "memory_types": {
    "conversation": 10,
    "preference": 3,
    "fact": 1,
    "pattern": 1
  }
}
```

We welcome contributions! Please see our contributing guidelines.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Follow PEP 8
- Use type hints
- Add docstrings to functions
- Keep functions focused and small
Full documentation is available in the docs/ folder:
- Quick Start Guide: Get up and running in 5 minutes
- Architecture Guide: Detailed workflow documentation
- Semantic Memory Guide: Memory system documentation
- Contributing Guide: How to contribute
- Changelog: Version history and updates
- Documentation Index: Complete documentation overview
Comprehensive test suite with 35 tests covering core functionality:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

# Run specific test file
pytest tests/test_semantic_memory.py
```

Test Coverage:
- ✅ Semantic Memory (9 tests)
- ✅ Intent Agent (6 tests)
- ✅ Graph Workflow (10 tests)
- ✅ RAG Components (10 tests)
See tests/README.md for the complete testing guide and TEST_SETUP_SUMMARY.md for current status.
- Multi-agent RAG system
- Semantic memory integration
- SQL data analysis
- Automated visualizations
- Critic-based quality control
- LangSmith observability
- Multi-user support with user isolation
- Memory pruning and consolidation
- Advanced query routing
- Custom embedding models
- API endpoints (REST/GraphQL)
- Memory analytics dashboard
- Feedback loop for memory refinement
- Multi-modal support (images, audio)
- Agent collaboration framework
- Distributed memory architecture
- Real-time streaming responses
- Plugin system for extensibility
This project is licensed under the MIT License - see the LICENSE file for details.
Built with:
- LangChain - LLM framework
- LangGraph - Agent orchestration
- Qdrant - Vector database
- Streamlit - UI framework
- OpenAI - LLM & embeddings
- LangSmith - Observability
For questions, issues, or feedback:
- Open an issue on GitHub
- Check existing documentation
- Review troubleshooting section
If you find Lumiere useful, please consider giving it a star! ⭐
Made with ❤️ for the AI community