A comprehensive second-brain system that combines intelligent multi-agent orchestration, privacy-first document archiving with PII sanitization, and seamless web research, built with PydanticAI, ChromaDB, and Brave Search.
## Features

- Multi-Agent System: an Orchestrator coordinates between the Archivist (local memory) and the Researcher (web search)
- PII Sanitization: automatic detection and redaction of sensitive information using Presidio
- Document Ingestion: parse and index markdown files with header-based chunking
- Intelligent Routing: queries are routed by content - personal notes vs. web search
- Vector Database: ChromaDB for efficient similarity search
- Web Search Integration: Brave Search via an MCP server for external queries
- Privacy-First: all PII is removed before storage
- Observability: built-in telemetry with Logfire/OpenTelemetry
## Prerequisites

- Python 3.13+
- uv package manager
- Node.js (for the Brave Search MCP server)
- Anthropic API key (for Claude)
- Brave Search API key (for web search functionality)
## Installation

```bash
# Install dependencies and download the spaCy model
make install
```

Or manually:

```bash
uv sync
uv run spacy download en_core_web_lg
```

## Usage

Run the orchestrator (recommended - it routes to both agents):

```bash
make run-orchestrator
```

Or run individual agents:

```bash
make run-archivist # Local memory search only
make run-researcher # Web search only
```

## Makefile Commands

```bash
make help # Show all available commands
make run-orchestrator # Run the orchestrator agent (routes queries)
make run-archivist # Run the archivist agent (memory only)
make run-researcher # Run the researcher agent (web search only)
make evals # Run evaluation tests for all agents
make install # Install dependencies
make clean # Clean up generated files
make clean-chroma # Clean ChromaDB persistent data
make format # Format code with black and isort
make lint # Lint code with ruff
make fix-lint # Auto-fix linting issues
make check # Run all checks (lint + type check)
```

## Python API

```python
from memory import MemoryTool
# Initialize memory tool
memory = MemoryTool()
# Create sample documents
memory.create_sample_docs()
# Ingest documents with PII sanitization
memory.ingest_folder()
# Search for information
results = memory.collection.query(
query_texts=["deployment guide"],
n_results=1
)
```
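Under the hood, `ingest_folder()` parses each markdown file in `.secondbrain/` and splits it at headers before storing the chunks in ChromaDB. The sketch below illustrates that flow under stated assumptions: `chunk_by_headers` and the `secondbrain` collection name are invented for illustration, not the project's actual internals.

```python
# Hypothetical sketch of header-based chunking plus ChromaDB storage.
import re

import chromadb


def chunk_by_headers(markdown: str) -> list[str]:
    """Split a markdown document into chunks at each # or ## header."""
    chunks = re.split(r"(?m)^(?=#{1,2} )", markdown)
    return [c.strip() for c in chunks if c.strip()]


client = chromadb.PersistentClient(path="./chroma")  # persisted on disk
collection = client.get_or_create_collection("secondbrain")

doc = "# Deployment Guide\nUse make deploy.\n\n## Rollback\nRun make rollback."
for i, chunk in enumerate(chunk_by_headers(doc)):
    collection.add(ids=[f"doc-{i}"], documents=[chunk])

# Nearest-neighbor search over the stored chunks
print(collection.query(query_texts=["how do I roll back?"], n_results=1))
```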
subgraph "Second Brain System"
subgraph "Agent Layer"
ORC[Orchestrator Agent<br/>Decision Router]
ARC[Archivist Agent<br/>Memory Manager]
RES[Researcher Agent<br/>Web Search]
end
subgraph "Core Components"
MEM[Memory Tool<br/>Vector DB Interface]
GRD[PII Guardrail<br/>Presidio]
TEL[Telemetry<br/>Logfire/OTEL]
end
subgraph "Data Layer"
CHR[(ChromaDB<br/>Vector Store)]
DOC[(.secondbrain/<br/>Markdown Docs)]
end
subgraph "External Services"
LLM[Claude API<br/>Anthropic]
BRV[Brave Search<br/>MCP Server]
end
end
USER[User Query] --> ORC
ORC -->|Route to Memory| ARC
ORC -->|Route to Web| RES
ARC -->|search/save| MEM
ARC -->|LLM calls| LLM
RES -->|web search| BRV
RES -->|LLM calls| LLM
MEM -->|sanitize| GRD
MEM -->|query/store| CHR
MEM -->|ingest| DOC
GRD -->|analyze PII| MEM
ORC -.->|trace| TEL
ARC -.->|trace| TEL
RES -.->|trace| TEL
style USER fill:#e1f5ff
style ORC fill:#fff4e6
style ARC fill:#fff4e6
style RES fill:#fff4e6
style MEM fill:#e8f5e9
style GRD fill:#e8f5e9
style TEL fill:#e8f5e9
style CHR fill:#f3e5f5
style DOC fill:#f3e5f5
style LLM fill:#fce4ec
style BRV fill:#fce4ec
```

Key Components:
- Orchestrator Agent: Routes queries to either the Archivist (local memory) or the Researcher (web search); a minimal sketch of this pattern follows this list
- Archivist Agent: Manages personal knowledge base with PII-sanitized storage and retrieval
- Researcher Agent: Searches the web using Brave Search MCP server
- Memory Tool: Handles markdown parsing, chunking, and vector database operations
- PII Guardrail: Uses Presidio to detect and redact sensitive information before storage
- Telemetry: Distributed tracing with Logfire/OpenTelemetry for observability
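The routing pattern above can be sketched in a few lines of PydanticAI. This is illustrative only: the prompt and tool bodies are stubs, and the real logic lives in `agents/orchestrator.py`.

```python
# Hypothetical routing sketch; not the project's actual orchestrator.
from pydantic_ai import Agent

orchestrator = Agent(
    "anthropic:claude-haiku-4-5",
    system_prompt=(
        "Answer questions about the user's own notes with search_memory; "
        "use web_search when fresh external information is needed."
    ),
)


@orchestrator.tool_plain
def search_memory(query: str) -> str:
    """Search the local, PII-sanitized knowledge base (the Archivist's job)."""
    return "stub: matching chunks from ChromaDB"


@orchestrator.tool_plain
def web_search(query: str) -> str:
    """Search the web (the Researcher's job, via the Brave Search MCP server)."""
    return "stub: results from Brave Search"


result = orchestrator.run_sync("What did I write about deployment?")
print(result.output)  # older pydantic-ai releases expose this as result.data
```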
## Project Structure

```text
.
├── agents/
│   ├── orchestrator.py      # Orchestrator agent (router)
│   ├── archivist.py         # Archivist agent (memory)
│   ├── researcher.py        # Researcher agent (web search)
│   ├── *_evals.py           # Agent evaluation tests
│   └── __init__.py
├── memory.py                # MemoryTool class with PII sanitization
├── guardrails.py            # PIIGuardrail using Presidio
├── otel.py                  # Telemetry configuration
├── Makefile                 # Build and run commands
├── pyproject.toml           # Project dependencies
└── README.md
```
## PII Sanitization

The system automatically detects and redacts:

- Email addresses → `<EMAIL>`
- Phone numbers → `<PHONE_NUM>`
- Person names → `<PERSON>`
- Other sensitive data → `<REDACTED>`

Example:

```text
Input:  "Call John Doe at 555-0123 or email john.doe@example.com"
Output: "Call <PERSON> at <PHONE_NUM> or email <EMAIL>"
```
## Configuration

Create a `.env` file in the project root:

```bash
# Required
ANTHROPIC_API_KEY=your-anthropic-api-key-here
BRAVE_API_KEY=your-brave-api-key-here
# Optional (defaults are set in agents/__init__.py)
DEFAULT_LLM_MODEL=anthropic:claude-haiku-4-5
TOKENIZERS_PARALLELISM=true
# For Opik tracing (optional)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:5173/api/v1/private/otel
```

Or set them manually:

```bash
export ANTHROPIC_API_KEY="your-anthropic-api-key-here"
export BRAVE_API_KEY="your-brave-api-key-here"
```
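For scripts that bypass the Makefile, the same variables can be loaded from Python. The sketch below assumes python-dotenv, which is not a stated dependency of this project:

```python
# Hypothetical .env loading; python-dotenv is an assumption here.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

assert os.environ.get("ANTHROPIC_API_KEY"), "ANTHROPIC_API_KEY is required"
model = os.environ.get("DEFAULT_LLM_MODEL", "anthropic:claude-haiku-4-5")
print(f"Using model: {model}")
```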
## Evaluations

Run agent evaluations (tests all three agents):

```bash
make evals
```

This will run:
- `agents/archivist_evals.py` - Archivist agent tests
- `agents/researcher_evals.py` - Researcher agent tests
- `agents/orchestrator_evals.py` - Orchestrator agent tests
## Code Quality

```bash
make format # Format code with black and isort
make lint # Check code with ruff
make fix-lint # Auto-fix linting issues
make check # Run all checks (lint + mypy)
```

## Cleaning

```bash
# Clean generated files only
make clean

# Clean ChromaDB data
make clean-chroma

# Clean everything
make clean && make clean-chroma
```

## Tech Stack

- Pydantic AI: Agent framework
- Pydantic Evals: Evaluation framework
- Brave Search MCP: MCP server for Brave Search
- ChromaDB: Vector database for RAG and memory
- Presidio: PII detection and anonymization for Guardrails
- spaCy: NLP for entity recognition
- Claude (Anthropic): Language model
- uv: Fast Python package manager
- Opik: Experiment tracking and evaluation platform (optional)
## Running Opik Locally (Optional)

This project supports Opik for experiment tracking and evaluation. You can run Opik locally using Docker Compose.

Prerequisites:

- Docker and Docker Compose installed

Clone the Opik repository (outside this project), then start the platform:

```bash
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the opik folder
cd opik

# Start the Opik platform
./opik.sh

# Stop the Opik platform
./opik.sh --stop
```

Opik will be available at http://localhost:5173.
Once Opik is running, your evaluation traces will automatically be logged to the local Opik instance at http://localhost:5173.
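`otel.py` is not reproduced here, but a standard OpenTelemetry setup that exports spans to Opik's private OTLP endpoint would look roughly like the sketch below. Whether the project routes through Logfire instead, and the exact traces path, are assumptions.

```python
# Hedged sketch of OTLP-over-HTTP span export to a local Opik instance.
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

base = os.environ.get(
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "http://localhost:5173/api/v1/private/otel",
)
provider = TracerProvider()
# The HTTP exporter takes a full URL; appending the conventional "/v1/traces"
# path to the base endpoint is an assumption about Opik's ingest route.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint=base + "/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("secondbrain")
with tracer.start_as_current_span("demo-span"):
    pass  # spans created inside this block are batched and exported to Opik
```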
## License

GPLv3
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.