A Streamlit-based dashboard for visualizing and analyzing evaluation data from LangSmith. This application provides comprehensive insights into AI model evaluation results, including quality metrics, ticket type analysis, and experiment tracking.
- 📊 Interactive Dashboard: Beautiful, responsive Streamlit interface with real-time data visualization
- 📈 Key Metrics: Total evaluations, average scores, quality distribution, and more
- 📅 Time Series Analysis: Daily trends and patterns in evaluation quality
- 🔬 Experiment Tracking: Monitor and analyze different experiment types
- 🔍 Advanced Filtering: Filter by date range, ticket type, and quality
- 📋 Detailed Data Tables: Comprehensive data views with export capabilities
- 🔄 Live Data Sync: Fetch fresh data directly from LangSmith API
The dashboard includes:
- Quality distribution pie charts
- Ticket type distribution bar charts
- Daily quality trends line charts
- Experiment analysis tables
- Interactive filters and date pickers
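As an illustration, the data series behind the daily quality trends chart can be computed with the standard library alone. The row shape here is a made-up example based on the `evaluations` table described further below; the real rows come from `merged_evaluation.db`:

```python
from collections import Counter, defaultdict

# Hypothetical (date, quality) pairs, mimicking rows from the
# `evaluations` table in merged_evaluation.db.
rows = [
    ("2024-05-01", "good"),
    ("2024-05-01", "bad"),
    ("2024-05-01", "good"),
    ("2024-05-02", "ugly"),
    ("2024-05-02", "good"),
]

# Group quality counts per day -- the series the line chart plots.
daily = defaultdict(Counter)
for date, quality in rows:
    daily[date][quality] += 1

# Fraction of "good" evaluations per day.
trend = {d: c["good"] / sum(c.values()) for d, c in sorted(daily.items())}
```

The dashboard renders this kind of aggregate with Plotly charts; the sketch above only shows the aggregation step.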
- Python 3.8 or higher
- LangSmith API key
- Access to LangSmith project "evaluators"
1. Clone or download this repository

   ```bash
   cd streamlit-app-new
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up your LangSmith API key

   Option A: Environment Variable

   ```bash
   export LANGSMITH_API_KEY="your_api_key_here"
   ```

   Option B: Streamlit Secrets (Recommended for production)

   Create a `.streamlit/secrets.toml` file:

   ```toml
   [langsmith]
   api_key = "your_api_key_here"
   ```

4. Run the Streamlit app

   ```bash
   streamlit run app.py
   ```

5. Open your browser and navigate to the URL shown in the terminal (usually `http://localhost:8501`)
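The two key-lookup options above can be combined into one helper. This is an illustrative sketch, not the app's actual code: the function name is made up, and the `[langsmith]` table name matches the `secrets.toml` layout shown above:

```python
import os


def get_langsmith_api_key():
    """Resolve the LangSmith API key.

    Prefers the LANGSMITH_API_KEY environment variable (Option A),
    then falls back to Streamlit secrets (Option B). Returns None
    if neither source is configured.
    """
    key = os.environ.get("LANGSMITH_API_KEY")
    if key:
        return key
    try:
        import streamlit as st  # only usable inside a running Streamlit app
        return st.secrets["langsmith"]["api_key"]
    except Exception:
        return None
```

Checking the environment variable first lets a locally exported key override whatever is committed to `secrets.toml`.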
- View Overview: The main page shows key metrics and summary charts
- Apply Filters: Use the sidebar to filter by date range and ticket type
- Explore Data: Click through different sections to analyze specific aspects
- Refresh Data: Use the sidebar buttons to refresh cached data or fetch new data from LangSmith
  - Refresh Cache: Clears cached data and reloads from the database
  - Fetch New Data: Retrieves fresh data from the LangSmith API (requires API key)
- Export: Data tables can be copied or exported for further analysis
The dashboard works with a SQLite database (`merged_evaluation.db`) containing two tables:
- `evaluations`
  - `id`: Primary key
  - `date`: Evaluation date
  - `ticket_id`: Associated ticket identifier
  - `ticket_type`: Type of ticket (homeowner, implementation, etc.)
  - `quality`: Evaluation quality (good, bad, ugly)
  - `comment`: Evaluation comments
  - `score`: Numerical score (if available)
  - `experiment_name`: Associated experiment name
  - `run_id`: LangSmith run identifier
  - `start_time`: Evaluation start time
  - `evaluation_key`: Type of evaluation performed
- `latest_experiments`
  - `id`: Primary key
  - `date`: Experiment date
  - `experiment_type`: Category of experiment
  - `experiment_name`: Full experiment name
  - `start_time`: Experiment start time
  - `run_count`: Number of runs in experiment
  - `updated_at`: Last update timestamp
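For reference, the two tables can be sketched with Python's built-in `sqlite3` module. The column types below are assumptions inferred from the field descriptions above; the authoritative schema lives in `EvaluationDatabase.init_database()`:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS evaluations (
    id INTEGER PRIMARY KEY,
    date TEXT,
    ticket_id TEXT,
    ticket_type TEXT,
    quality TEXT,            -- good, bad, or ugly
    comment TEXT,
    score REAL,              -- numerical score, may be NULL
    experiment_name TEXT,
    run_id TEXT,             -- LangSmith run identifier
    start_time TEXT,
    evaluation_key TEXT
);
CREATE TABLE IF NOT EXISTS latest_experiments (
    id INTEGER PRIMARY KEY,
    date TEXT,
    experiment_type TEXT,
    experiment_name TEXT,
    start_time TEXT,
    run_count INTEGER,
    updated_at TEXT
);
"""

# The app opens merged_evaluation.db; :memory: keeps this sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```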
The dashboard integrates with the LangSmith API to:
- Fetch evaluation runs from the "evaluators" project
- Extract quality metrics and comments
- Track experiment metadata
- Handle rate limiting and timeouts gracefully
The application includes built-in rate limiting:
- 0.1 second delay between API calls
- Automatic retry logic for failed requests
- Configurable timeout settings
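The rate-limiting behaviour described above can be sketched as a small wrapper. This is an illustrative shape, not the app's exact implementation: `fetch` stands in for a LangSmith API call, and the function name is made up:

```python
import time


def call_with_retry(fetch, max_retries=3, delay=0.1, timeout=30):
    """Call `fetch` with a 0.1 s pause before each attempt (rate
    limiting) and simple retries on failure, re-raising the last
    error once retries are exhausted.
    """
    last_error = None
    for attempt in range(max_retries):
        time.sleep(delay)  # rate limit: pause between API calls
        try:
            return fetch(timeout=timeout)
        except Exception as exc:  # timeouts and transient errors
            last_error = exc
    raise last_error
```

A production version would typically add exponential backoff and only retry on specific exception types, but the structure is the same.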
To add new metrics or visualizations:

- Add new methods to the `EvaluationDatabase` class in `evaluation_database.py`
- Update the `load_data()` function in `app.py`
- Add new visualizations to the dashboard
Custom CSS is included in the app for consistent styling. Modify the `<style>` section in `app.py` to customize colors, fonts, and layout.
To modify the database schema:
- Update the `init_database()` method in `EvaluationDatabase`
- Add new indexes for performance
- Update data extraction methods accordingly
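A common way to evolve a SQLite schema without recreating tables is an additive migration: check the existing columns, then `ALTER TABLE` and add indexes as needed. The helper and the `reviewer` column below are made-up examples, not part of the actual app:

```python
import sqlite3


def add_column_if_missing(conn, table, column, decl):
    """Add `column` to `table` only if it is not already present."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


# The app would open merged_evaluation.db; :memory: keeps the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evaluations (id INTEGER PRIMARY KEY, quality TEXT)")

add_column_if_missing(conn, "evaluations", "reviewer", "TEXT")

# New index for faster quality filtering:
conn.execute("CREATE INDEX IF NOT EXISTS idx_eval_quality "
             "ON evaluations(quality)")
```

`IF NOT EXISTS` on the index and the `PRAGMA table_info` check make the migration safe to run repeatedly.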
- "No API Key Found"
  - Ensure the `LANGSMITH_API_KEY` environment variable is set
  - Check that the `.streamlit/secrets.toml` file exists and is properly formatted
- Database Connection Errors
  - Verify that `merged_evaluation.db` exists in the project directory
  - Check file permissions
- Missing Data
  - Use the "Fetch New Data" button to retrieve the latest data from LangSmith
  - Check API key permissions and project access
- Performance Issues
  - Data is cached by default; use "Refresh Data Cache" if needed
  - Large datasets may take time to load initially
Enable debug output by running:

```bash
streamlit run app.py --logger.level debug
```

```
streamlit-app-new/
├── app.py                   # Main Streamlit application
├── evaluation_database.py   # Database and API integration
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── merged_evaluation.db     # SQLite database (auto-created)
└── .streamlit/              # Streamlit configuration
    └── secrets.toml         # API keys (create manually)
```
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is provided as-is for internal use. Please ensure compliance with your organization's data handling policies.
For issues or questions:
- Check the troubleshooting section above
- Review LangSmith API documentation
- Check Streamlit documentation for UI-related issues
Note: This dashboard is designed to work with the existing `merged_evaluation.db` database. If you need to create a new database or modify the schema, update the `EvaluationDatabase` class accordingly.