A Streamlit-based dashboard for visualizing and analyzing evaluation data from LangSmith. This application provides comprehensive insights into AI model evaluation results, including quality metrics, ticket type analysis, and experiment tracking.
- 📊 Interactive Dashboard: Beautiful, responsive Streamlit interface with real-time data visualization
- 📈 Key Metrics: Total evaluations, average scores, quality distribution, and more
- 📅 Time Series Analysis: Daily trends and patterns in evaluation quality
- 🔬 Experiment Tracking: Monitor and analyze different experiment types
- 🔍 Advanced Filtering: Filter by date range, ticket type, and quality
- 📋 Detailed Data Tables: Comprehensive data views with export capabilities
- 🔄 Live Data Sync: Fetch fresh data directly from LangSmith API
The dashboard includes:
- Quality distribution pie charts
- Ticket type distribution bar charts
- Daily quality trends line charts
- Experiment analysis tables
- Interactive filters and date pickers
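As an illustration, the data series behind the daily quality trends chart can be computed with the standard library alone. The row shape here is a made-up example based on the `evaluations` table described further below; the real rows come from `merged_evaluation.db`:

```python
from collections import Counter, defaultdict

# Hypothetical (date, quality) pairs, mimicking rows from the
# `evaluations` table in merged_evaluation.db.
rows = [
    ("2024-05-01", "good"),
    ("2024-05-01", "bad"),
    ("2024-05-01", "good"),
    ("2024-05-02", "ugly"),
    ("2024-05-02", "good"),
]

# Group quality counts per day -- the series the line chart plots.
daily = defaultdict(Counter)
for date, quality in rows:
    daily[date][quality] += 1

# Fraction of "good" evaluations per day.
trend = {d: c["good"] / sum(c.values()) for d, c in sorted(daily.items())}
```

The dashboard renders this kind of aggregate with Plotly charts; the sketch above only shows the aggregation step.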
- Python 3.8 or higher
- LangSmith API key
- Access to LangSmith project "evaluators"
1. Clone or download this repository

   ```bash
   cd streamlit-app-new
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Set up your LangSmith API key

   Option A: Environment Variable

   ```bash
   export LANGSMITH_API_KEY="your_api_key_here"
   ```

   Option B: Streamlit Secrets (Recommended for production)

   Create a `.streamlit/secrets.toml` file:

   ```toml
   [langsmith]
   api_key = "your_api_key_here"
   ```

4. Run the Streamlit app

   ```bash
   streamlit run app.py
   ```

5. Open your browser and navigate to the URL shown in the terminal (usually `http://localhost:8501`)
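The two key-lookup options above can be combined into one helper. This is an illustrative sketch, not the app's actual code: the function name is made up, and the `[langsmith]` table name matches the `secrets.toml` layout shown above:

```python
import os


def get_langsmith_api_key():
    """Resolve the LangSmith API key.

    Prefers the LANGSMITH_API_KEY environment variable (Option A),
    then falls back to Streamlit secrets (Option B). Returns None
    if neither source is configured.
    """
    key = os.environ.get("LANGSMITH_API_KEY")
    if key:
        return key
    try:
        import streamlit as st  # only usable inside a running Streamlit app
        return st.secrets["langsmith"]["api_key"]
    except Exception:
        return None
```

Checking the environment variable first lets a locally exported key override whatever is committed to `secrets.toml`.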
- View Overview: The main page shows key metrics and summary charts
- Apply Filters: Use the sidebar to filter by date range and ticket type
- Explore Data: Click through different sections to analyze specific aspects
- Refresh Data: Use the sidebar buttons to refresh cached data or fetch new data from LangSmith
  - Refresh Cache: Clears cached data and reloads from the database
  - Fetch New Data: Retrieves fresh data from the LangSmith API (requires API key)
- Export: Data tables can be copied or exported for further analysis
The dashboard works with a SQLite database (`merged_evaluation.db`) containing two tables:
- `evaluations`
  - `id`: Primary key
  - `date`: Evaluation date
  - `ticket_id`: Associated ticket identifier
  - `ticket_type`: Type of ticket (homeowner, implementation, etc.)
  - `quality`: Evaluation quality (good, bad, ugly)
  - `comment`: Evaluation comments
  - `score`: Numerical score (if available)
  - `experiment_name`: Associated experiment name
  - `run_id`: LangSmith run identifier
  - `start_time`: Evaluation start time
  - `evaluation_key`: Type of evaluation performed
- `latest_experiments`
  - `id`: Primary key
  - `date`: Experiment date
  - `experiment_type`: Category of experiment
  - `experiment_name`: Full experiment name
  - `start_time`: Experiment start time
  - `run_count`: Number of runs in experiment
  - `updated_at`: Last update timestamp
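For reference, the two tables can be sketched with Python's built-in `sqlite3` module. The column types below are assumptions inferred from the field descriptions above; the authoritative schema lives in `EvaluationDatabase.init_database()`:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS evaluations (
    id INTEGER PRIMARY KEY,
    date TEXT,
    ticket_id TEXT,
    ticket_type TEXT,
    quality TEXT,            -- good, bad, or ugly
    comment TEXT,
    score REAL,              -- numerical score, may be NULL
    experiment_name TEXT,
    run_id TEXT,             -- LangSmith run identifier
    start_time TEXT,
    evaluation_key TEXT
);
CREATE TABLE IF NOT EXISTS latest_experiments (
    id INTEGER PRIMARY KEY,
    date TEXT,
    experiment_type TEXT,
    experiment_name TEXT,
    start_time TEXT,
    run_count INTEGER,
    updated_at TEXT
);
"""

# The app opens merged_evaluation.db; :memory: keeps this sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```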
The dashboard integrates with the LangSmith API to:
- Fetch evaluation runs from the "evaluators" project
- Extract quality metrics and comments
- Track experiment metadata
- Handle rate limiting and timeouts gracefully
The application includes built-in rate limiting:
- 0.1 second delay between API calls
- Automatic retry logic for failed requests
- Configurable timeout settings
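The rate-limiting behaviour described above can be sketched as a small wrapper. This is an illustrative shape, not the app's exact implementation: `fetch` stands in for a LangSmith API call, and the function name is made up:

```python
import time


def call_with_retry(fetch, max_retries=3, delay=0.1, timeout=30):
    """Call `fetch` with a 0.1 s pause before each attempt (rate
    limiting) and simple retries on failure, re-raising the last
    error once retries are exhausted.
    """
    last_error = None
    for attempt in range(max_retries):
        time.sleep(delay)  # rate limit: pause between API calls
        try:
            return fetch(timeout=timeout)
        except Exception as exc:  # timeouts and transient errors
            last_error = exc
    raise last_error
```

A production version would typically add exponential backoff and only retry on specific exception types, but the structure is the same.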
To add new metrics or visualizations:

- Add new methods to the `EvaluationDatabase` class in `evaluation_database.py`
- Update the `load_data()` function in `app.py`
- Add new visualizations to the dashboard
Custom CSS is included in the app for consistent styling. Modify the `<style>` section in `app.py` to customize colors, fonts, and layout.
To modify the database schema:
- Update the `init_database()` method in `EvaluationDatabase`
- Add new indexes for performance
- Update data extraction methods accordingly
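A common way to evolve a SQLite schema without recreating tables is an additive migration: check the existing columns, then `ALTER TABLE` and add indexes as needed. The helper and the `reviewer` column below are made-up examples, not part of the actual app:

```python
import sqlite3


def add_column_if_missing(conn, table, column, decl):
    """Add `column` to `table` only if it is not already present."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


# The app would open merged_evaluation.db; :memory: keeps the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evaluations (id INTEGER PRIMARY KEY, quality TEXT)")

add_column_if_missing(conn, "evaluations", "reviewer", "TEXT")

# New index for faster quality filtering:
conn.execute("CREATE INDEX IF NOT EXISTS idx_eval_quality "
             "ON evaluations(quality)")
```

`IF NOT EXISTS` on the index and the `PRAGMA table_info` check make the migration safe to run repeatedly.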
- "No API Key Found"
  - Ensure the `LANGSMITH_API_KEY` environment variable is set
  - Check that the `.streamlit/secrets.toml` file exists and is properly formatted
- Database Connection Errors
  - Verify that `merged_evaluation.db` exists in the project directory
  - Check file permissions
- Missing Data
  - Use the "Fetch New Data" button to retrieve the latest data from LangSmith
  - Check API key permissions and project access
- Performance Issues
  - Data is cached by default; use "Refresh Data Cache" if needed
  - Large datasets may take time to load initially
Enable debug output by running:

```bash
streamlit run app.py --logger.level debug
```

```
streamlit-app-new/
├── app.py                   # Main Streamlit application
├── evaluation_database.py   # Database and API integration
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── merged_evaluation.db     # SQLite database (auto-created)
└── .streamlit/              # Streamlit configuration
    └── secrets.toml         # API keys (create manually)
```
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is provided as-is for internal use. Please ensure compliance with your organization's data handling policies.
For issues or questions:
- Check the troubleshooting section above
- Review LangSmith API documentation
- Check Streamlit documentation for UI-related issues
Note: This dashboard is designed to work with the existing `merged_evaluation.db` database. If you need to create a new database or modify the schema, update the `EvaluationDatabase` class accordingly.