TraceLens addresses the "Silent Failure" crisis in agentic AI systems. Unlike traditional software, where errors manifest as explicit exceptions, AI agents can fail silently through:
- Tool Thrashing: Infinite loops of repetitive tool invocations without progress
- Context Drift: The agent's internal world model diverging from the actual system state
- Non-Deterministic Failures: Bugs that only appear in production due to LLM token sampling
TraceLens provides a "Diagnostic Command Center" that offers:
- Real-Time Visualization: Interactive graph showing agent execution flow
- Time-Travel Navigation: Rewind to any checkpoint and inspect state
- Active Intervention: Edit state and prompts, then resume execution from that point
- Real-Time Agent Monitoring: Watch your LangGraph agents as they execute
- Interactive Graph Visualization: Beautiful React Flow graphs showing execution paths
- OpenTelemetry Integration: Standardized telemetry collection and export
- SQLite Persistence: Local storage with WAL mode for efficient checkpointing
- Modern UI: Clean, minimalistic interface built with Next.js and Tailwind CSS
- Easy Integration: Sidecar pattern - no modifications to your agent code needed
- Checkpoint Browser: Navigate through checkpoint history with ease
- State Diff Viewer: Compare state between any two checkpoints
- Timeline View: Chronological view of all events (checkpoints, spans, transitions)
- Execution Replay: Step-by-step replay with play/pause/step controls
- State Editor: Edit checkpoint state with JSON editor and validation
- Prompt Editor: Modify agent prompts and instructions
- Resume Execution: Continue agent execution from modified checkpoints
- Execution Branching: Create named branches for A/B testing and exploration
- State Validation: Validate state edits with errors and warnings before saving
- API Key Authentication: Optional auth for write endpoints
- Rate Limiting: Configurable limits (read/write)
- Configurable CORS: Restrict origins via environment
- JSON-only State Input: No pickle from API (prevents RCE)
- Audit Logging: State edits, resume, and branch operations
- Centralized Error Handling: Sanitized responses, structured logging
- Enhanced Health Checks: Database connectivity included
TraceLens follows a "Sidecar" pattern where instrumentation wraps the agent without modifying core logic:
```
┌─────────────────┐
│  Agent Runtime  │  (LangGraph agent with tools)
└────────┬────────┘
         │
┌────────▼────────────────────────┐
│ Interceptor Layer               │
│  - OpenTelemetry Spans          │
│  - SQLite Checkpointer          │
└────────┬────────────────────────┘
         │
┌────────▼────────────────────────┐
│ Telemetry Server (FastAPI)      │
│  - REST API for trace data      │
│  - Graph transformation         │
└────────┬────────────────────────┘
         │
┌────────▼────────────────────────┐
│ Data Store (SQLite WAL)         │
│  - Checkpoints (state history)  │
│  - Traces (OTel spans)          │
└────────┬────────────────────────┘
         │
┌────────▼────────────────────────┐
│ Diagnostic UI (Next.js)         │
│  - React Flow visualization     │
│  - Real-time updates            │
└─────────────────────────────────┘
```
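For intuition about the interceptor layer: each node invocation is recorded as an OpenTelemetry span before the resulting state is checkpointed. The sketch below uses the standard OpenTelemetry Python SDK with a console exporter; the `instrumented` wrapper and its names are illustrative assumptions, not TraceLens's actual instrumentation code.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Minimal tracer setup: TraceLens exports spans to its SQLite store,
# but a console exporter is enough to see the idea.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("tracelens.interceptor")

def instrumented(node_name, node_fn):
    """Wrap a LangGraph node so every call is recorded as an OTel span (illustrative)."""
    def wrapper(state):
        with tracer.start_as_current_span(f"node:{node_name}") as span:
            span.set_attribute("tracelens.node", node_name)
            return node_fn(state)
    return wrapper
```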
- Python 3.11+ (for modern typing and asyncio features)
- Node.js 20+ and npm/yarn
- Google Gemini API key (from Google AI Studio) for the sample agent
- Docker & Docker Compose (optional, for containerized deployment)
Backend:

```bash
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Frontend:

```bash
cd frontend
npm install
# or
yarn install
```

Create a `.env` file in the project root:
```env
# Required: Gemini API Key
GOOGLE_API_KEY=your_api_key_here
# or
GEMINI_API_KEY=your_api_key_here

# Optional: Database path (default: ./tracelens.db)
DATABASE_PATH=./tracelens.db

# Optional: OpenTelemetry exporter endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Optional: FastAPI server settings
FASTAPI_HOST=localhost
FASTAPI_PORT=8000

# Optional: LLM model selection
LLM_MODEL=gemini-1.5-pro  # or gemini-1.5-flash for faster responses

# Optional: Security
TRACELENS_REQUIRE_AUTH=false
TRACELENS_API_KEY=
TRACELENS_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
TRACELENS_RATE_LIMIT=100/minute
TRACELENS_RATE_LIMIT_WRITE=20/minute
TRACELENS_MAX_STATE_SIZE=10485760

# Frontend: Set when auth enabled (same as TRACELENS_API_KEY)
NEXT_PUBLIC_TRACELENS_API_KEY=
NEXT_PUBLIC_API_URL=http://localhost:8000
```
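If you script against the same configuration, the documented defaults above can be read from Python. A minimal sketch, assuming `python-dotenv` is available (the backend itself may load its settings differently):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

db_path = os.getenv("DATABASE_PATH", "./tracelens.db")  # documented default
host = os.getenv("FASTAPI_HOST", "localhost")
port = int(os.getenv("FASTAPI_PORT", "8000"))
model = os.getenv("LLM_MODEL", "gemini-1.5-pro")
print(f"DB: {db_path} | API: {host}:{port} | model: {model}")
```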
- Start the backend server (a quick sanity check is shown after these steps):

  ```bash
  cd backend
  uvicorn src.api.main:app --reload
  ```

- Start the frontend:

  ```bash
  cd frontend
  npm run dev
  ```

- Run the sample agent:

  ```bash
  python backend/scripts/verify_telemetry.py
  ```

- Access the UI: open http://localhost:3000 in your browser
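Before opening the UI, you can confirm the backend is responding via the documented `/api/runs` endpoint. A minimal sketch, assuming the `requests` package is installed (it is not a project requirement):

```python
import requests

# Expect HTTP 200 and a JSON payload of runs once the sample agent has executed.
resp = requests.get("http://localhost:8000/api/runs", timeout=5)
print(resp.status_code, resp.json())
```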
To use TraceLens with your own agents (an end-to-end sketch follows these steps):

- Import the SQLite checkpointer:

  ```python
  from src.storage.sqlite_checkpointer import SqliteCheckpointer
  ```

- Initialize it with your graph:

  ```python
  checkpointer = SqliteCheckpointer(db_path="./tracelens.db")
  graph = graph.compile(checkpointer=checkpointer)
  ```

- The instrumentation will automatically capture:
  - Node transitions
  - Tool invocations
  - LLM calls
  - State changes
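Putting it together, a minimal integration might look like the sketch below. The state schema and the single placeholder node are assumptions standing in for your own agent; only the `SqliteCheckpointer` wiring is TraceLens-specific, and passing a `thread_id` is standard LangGraph checkpointer usage.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from src.storage.sqlite_checkpointer import SqliteCheckpointer

class AgentState(TypedDict):
    question: str
    answer: str

def answer_node(state: AgentState) -> dict:
    # Placeholder node; a real agent would call tools and the LLM here.
    return {"answer": f"echo: {state['question']}"}

builder = StateGraph(AgentState)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")
builder.add_edge("answer", END)

graph = builder.compile(checkpointer=SqliteCheckpointer(db_path="./tracelens.db"))

# The thread_id groups checkpoints for one run; it is what GET /api/runs lists.
result = graph.invoke({"question": "hi"}, config={"configurable": {"thread_id": "demo-run"}})
```

After running this once, the run should appear under GET /api/runs and in the UI at http://localhost:3000.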
Once the backend is running, access the interactive API documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- `GET /api/runs` - List all execution runs
- `GET /api/runs/{thread_id}/graph` - Get graph structure with nodes and edges
- `GET /api/runs/{thread_id}/checkpoints` - Get checkpoint history
- `GET /api/runs/{thread_id}/checkpoints/{checkpoint_id}` - Get a specific checkpoint's state
- `GET /api/runs/{thread_id}/spans` - Get OpenTelemetry spans for a run
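For example, a small client script can walk the same endpoints the UI uses. This is a sketch assuming the `requests` package; the thread ID is a placeholder to be replaced with one returned by `/api/runs`:

```python
import requests

BASE = "http://localhost:8000"

runs = requests.get(f"{BASE}/api/runs", timeout=5).json()
print("runs:", runs)

thread_id = "demo-run"  # placeholder: substitute a real thread ID from the response above
for path in ("graph", "checkpoints", "spans"):
    data = requests.get(f"{BASE}/api/runs/{thread_id}/{path}", timeout=5).json()
    print(path, "->", data)
```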
```
tracelens/
├── backend/
│   ├── src/
│   │   ├── agent/             # Sample LangGraph agent
│   │   ├── instrumentation/   # OTel hooks & checkpointer
│   │   ├── storage/           # SQLite persistence
│   │   └── api/               # FastAPI endpoints
│   ├── tests/                 # Unit tests & benchmarks
│   ├── benchmarks/            # Benchmark runner (run_all.py)
│   ├── scripts/               # Utility scripts
│   ├── requirements.txt
│   ├── requirements-dev.txt
│   └── main.py
├── frontend/                  # Next.js 15 app
│   ├── src/components/        # React components
│   ├── src/hooks/             # Custom React hooks
│   ├── pages/                 # Next.js pages
│   └── package.json
├── docker-compose.yml
├── CHANGELOG.md
├── README.md
└── .gitignore
```
- Agent Orchestration: LangGraph for stateful, cyclic workflows
- Observability: OpenTelemetry (OTel) for standardized telemetry
- Backend: FastAPI for async, high-performance API server
- Database: SQLite with WAL mode for local persistence
- LLM Gateway: LiteLLM for multi-provider model access (a short usage sketch follows this list)
- Frontend: Next.js + React Flow for graph visualization
- Styling: Tailwind CSS for modern UI
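For context on the LiteLLM choice above: it exposes one `completion()` call across providers, so switching the `LLM_MODEL` setting between Gemini variants (or to another provider) does not require code changes. A minimal sketch, assuming `GEMINI_API_KEY` is set as in the configuration section and that the provider prefix is added here rather than stored in the env var:

```python
import os
from litellm import completion

# Same env var documented in the configuration section; default matches the README.
model = os.getenv("LLM_MODEL", "gemini-1.5-pro")

response = completion(
    model=f"gemini/{model}",  # LiteLLM's provider-prefixed name for Google AI Studio models
    messages=[{"role": "user", "content": "In one sentence, what does a checkpointer do?"}],
)
print(response.choices[0].message.content)
```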
```bash
cd backend
pip install -r requirements-dev.txt
pytest tests -k "not bench" -v                      # unit tests only
pytest tests/bench_metrics.py -v --benchmark-only   # benchmarks
python -m benchmarks.run_all                        # both
```

This project is licensed under the MIT License - see the LICENSE file for details.
- LangGraph for the agent orchestration framework
- OpenTelemetry for standardized observability
- React Flow for graph visualization
- FastAPI for the high-performance API framework
Made while eating 🍕 for the AI agent development community
⭐ Star this repo if you find it helpful!
