TraceLens

A Visual Debugger and Replay Engine for LangGraph Agentic Workflows

License: MIT · Status: Under Development · Python 3.11+ · Node.js 20+ · FastAPI · Next.js


Project Overview

TraceLens addresses the "Silent Failure" crisis in agentic AI systems. Unlike traditional software where errors manifest as explicit exceptions, AI agents can fail silently through:

  • Tool Thrashing: Infinite loops of repetitive tool invocations without progress
  • Context Drift: Agent's internal world model diverging from actual system state
  • Non-Deterministic Failures: Bugs that only appear in production due to LLM token sampling
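Tool thrashing in particular becomes easy to spot once a trace exists. As an illustration only (not TraceLens's actual detection logic), a naive detector might flag any tool invoked repeatedly with identical arguments:

```python
# Illustrative only -- not TraceLens's actual detector. Flags "tool thrashing":
# the same tool called repeatedly with identical arguments, suggesting a loop.
from collections import Counter

def detect_thrashing(tool_calls, threshold=3):
    """tool_calls: (tool_name, serialized_args) pairs pulled from a trace."""
    counts = Counter(tool_calls)
    return [call for call, n in counts.items() if n >= threshold]

calls = [("search", "q=weather")] * 3 + [("fetch", "url=https://example.com")]
print(detect_thrashing(calls))  # → [('search', 'q=weather')]
```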

TraceLens provides a "Diagnostic Command Center" that offers:

  • Real-Time Visualization: Interactive graph showing agent execution flow
  • Time-Travel Navigation: Rewind to any checkpoint and inspect state
  • Active Intervention: Edit state and prompts, then resume execution from that point

Features

Telemetry & Visualization

  • Real-time Agent Monitoring: Watch your LangGraph agents execute in real-time
  • Interactive Graph Visualization: Beautiful React Flow graphs showing execution paths
  • OpenTelemetry Integration: Standardized telemetry collection and export
  • SQLite Persistence: Local storage with WAL mode for efficient checkpointing
  • Modern UI: Clean, minimalistic interface built with Next.js and Tailwind CSS
  • Easy Integration: Sidecar pattern - no modifications to your agent code needed

Time-Travel Navigation

  • Checkpoint Browser: Navigate through checkpoint history with ease
  • State Diff Viewer: Compare state between any two checkpoints
  • Timeline View: Chronological view of all events (checkpoints, spans, transitions)
  • Execution Replay: Step-by-step replay with play/pause/step controls

Active Intervention

  • State Editor: Edit checkpoint state with JSON editor and validation
  • Prompt Editor: Modify agent prompts and instructions
  • Resume Execution: Continue agent execution from modified checkpoints
  • Execution Branching: Create named branches for A/B testing and exploration
  • State Validation: Validate state edits with errors and warnings before saving
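The validation step can be pictured as follows. This is a hedged sketch, not the backend's actual validator: it only checks that an edit is valid JSON, object-shaped, and under the size cap (see TRACELENS_MAX_STATE_SIZE under Configuration):

```python
import json

# Hedged sketch of state-edit validation, not the real TraceLens validator.
# Edits arrive as JSON text only (pickle is rejected by design; see Security).
def validate_state_edit(raw: str, max_bytes: int = 10_485_760):
    errors, warnings = [], []
    if len(raw.encode("utf-8")) > max_bytes:
        errors.append("state exceeds configured size limit")
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc}"], warnings
    if not isinstance(state, dict):
        warnings.append("state is not a JSON object")
    return state, errors, warnings
```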

Security & Production Readiness

  • API Key Authentication: Optional auth for write endpoints
  • Rate Limiting: Configurable limits (read/write)
  • Configurable CORS: Restrict origins via environment
  • JSON-only State Input: No pickle from API (prevents RCE)
  • Audit Logging: State edits, resume, and branch operations
  • Centralized Error Handling: Sanitized responses, structured logging
  • Enhanced Health Checks: Database connectivity included

Architecture

TraceLens follows a "Sidecar" pattern where instrumentation wraps the agent without modifying core logic:

┌─────────────────┐
│  Agent Runtime  │  (LangGraph agent with tools)
└────────┬────────┘
         │
┌────────▼───────────────────────┐
│  Interceptor Layer             │
│  - OpenTelemetry Spans         │
│  - SQLite Checkpointer         │
└────────┬───────────────────────┘
         │
┌────────▼───────────────────────┐
│  Telemetry Server (FastAPI)    │
│  - REST API for trace data     │
│  - Graph transformation        │
└────────┬───────────────────────┘
         │
┌────────▼───────────────────────┐
│  Data Store (SQLite WAL)       │
│  - Checkpoints (state history) │
│  - Traces (OTel spans)         │
└────────┬───────────────────────┘
         │
┌────────▼───────────────────────┐
│  Diagnostic UI (Next.js)       │
│  - React Flow visualization    │
│  - Real-time updates           │
└────────────────────────────────┘

Prerequisites

  • Python 3.11+ (for async/await support and modern typing)
  • Node.js 20+ and npm/yarn
  • Google Gemini API key (get from Google AI Studio) for sample agent
  • Docker & Docker Compose (optional, for containerized deployment)

Installation

Backend Setup

cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Frontend Setup

cd frontend
npm install
# or
yarn install

Configuration

Create a .env file in the project root:

# Required: Gemini API Key
GOOGLE_API_KEY=your_api_key_here
# or
GEMINI_API_KEY=your_api_key_here

# Optional: Database path (default: ./tracelens.db)
DATABASE_PATH=./tracelens.db

# Optional: OpenTelemetry exporter endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Optional: FastAPI server settings
FASTAPI_HOST=localhost
FASTAPI_PORT=8000

# Optional: LLM model selection
LLM_MODEL=gemini-1.5-pro  # or gemini-1.5-flash for faster responses

# Optional: Security
TRACELENS_REQUIRE_AUTH=false
TRACELENS_API_KEY=
TRACELENS_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
TRACELENS_RATE_LIMIT=100/minute
TRACELENS_RATE_LIMIT_WRITE=20/minute
TRACELENS_MAX_STATE_SIZE=10485760

# Frontend: Set when auth enabled (same as TRACELENS_API_KEY)
NEXT_PUBLIC_TRACELENS_API_KEY=
NEXT_PUBLIC_API_URL=http://localhost:8000
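For reference, the variables above could be read in Python roughly like this. The defaults mirror the comments in the .env listing; the backend's actual settings module may differ:

```python
import os

# Illustrative settings reader; defaults mirror the .env comments above.
def load_settings() -> dict:
    return {
        "db_path": os.getenv("DATABASE_PATH", "./tracelens.db"),
        "host": os.getenv("FASTAPI_HOST", "localhost"),
        "port": int(os.getenv("FASTAPI_PORT", "8000")),
        "require_auth": os.getenv("TRACELENS_REQUIRE_AUTH", "false").lower() == "true",
        "rate_limit": os.getenv("TRACELENS_RATE_LIMIT", "100/minute"),
        "max_state_size": int(os.getenv("TRACELENS_MAX_STATE_SIZE", "10485760")),
    }
```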

Quick Start

  1. Start the backend server:

    cd backend
    uvicorn src.api.main:app --reload
  2. Start the frontend:

    cd frontend
    npm run dev
  3. Run the sample agent:

    python backend/scripts/verify_telemetry.py
  4. Access the UI: Open http://localhost:3000 in your browser

Usage Guide

Instrumenting Your Own LangGraph Agents

To use TraceLens with your own agents:

  1. Import the SQLite checkpointer:

    from src.storage.sqlite_checkpointer import SqliteCheckpointer
  2. Initialize with your graph:

    checkpointer = SqliteCheckpointer(db_path="./tracelens.db")
    graph = graph.compile(checkpointer=checkpointer)
  3. The instrumentation will automatically capture:

    • Node transitions
    • Tool invocations
    • LLM calls
    • State changes

API Documentation

Once the backend is running, the interactive API documentation (Swagger UI) is available at http://localhost:8000/docs; FastAPI also serves the raw OpenAPI schema at /openapi.json.

Key Endpoints

  • GET /api/runs - List all execution runs
  • GET /api/runs/{thread_id}/graph - Get graph structure with nodes and edges
  • GET /api/runs/{thread_id}/checkpoints - Get checkpoint history
  • GET /api/runs/{thread_id}/checkpoints/{checkpoint_id} - Get specific checkpoint state
  • GET /api/runs/{thread_id}/spans - Get OpenTelemetry spans for a run
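The read endpoints above can be exercised with a few lines of Python using only the standard library. The X-API-Key header name is an assumption here; adapt it to however your auth is configured:

```python
import json
import urllib.request

# Minimal sketch of a TraceLens read-only API client; assumes the backend
# runs at http://localhost:8000 and auth is disabled unless a key is given.
class TraceLensClient:
    def __init__(self, base_url="http://localhost:8000", api_key=None):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def _url(self, path):
        return f"{self.base_url}{path}"

    def _get(self, path):
        req = urllib.request.Request(self._url(path))
        if self.api_key:
            # Header name is an assumption -- match your TRACELENS_API_KEY setup.
            req.add_header("X-API-Key", self.api_key)
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def list_runs(self):
        return self._get("/api/runs")

    def get_graph(self, thread_id):
        return self._get(f"/api/runs/{thread_id}/graph")

    def get_checkpoints(self, thread_id):
        return self._get(f"/api/runs/{thread_id}/checkpoints")
```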

Development

Project Structure

tracelens/
├── backend/
│   ├── src/
│   │   ├── agent/              # Sample LangGraph agent
│   │   ├── instrumentation/   # OTel hooks & checkpointer
│   │   ├── storage/            # SQLite persistence
│   │   └── api/                # FastAPI endpoints
│   ├── tests/                  # Unit tests & benchmarks
│   ├── benchmarks/             # Benchmark runner (run_all.py)
│   ├── scripts/                # Utility scripts
│   ├── requirements.txt
│   ├── requirements-dev.txt
│   └── main.py
├── frontend/                   # Next.js 15 app
│   ├── src/components/         # React components
│   ├── src/hooks/              # Custom React hooks
│   ├── pages/                  # Next.js pages
│   └── package.json
├── docker-compose.yml
├── CHANGELOG.md
├── README.md
└── .gitignore

Key Dependencies

  • Agent Orchestration: LangGraph for stateful, cyclic workflows
  • Observability: OpenTelemetry (OTel) for standardized telemetry
  • Backend: FastAPI for async, high-performance API server
  • Database: SQLite with WAL mode for local persistence
  • LLM Gateway: LiteLLM for multi-provider model access
  • Frontend: Next.js + React Flow for graph visualization
  • Styling: Tailwind CSS for modern UI

Testing & Benchmarks

cd backend
pip install -r requirements-dev.txt
pytest tests -k "not bench" -v          # unit tests only
pytest tests/bench_metrics.py -v --benchmark-only   # benchmarks
python -m benchmarks.run_all            # both


License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


Made while eating 🍕 for the AI agent development community

⭐ Star this repo if you find it helpful!
