A multi-agent system for automated incident triage. The pipeline classifies incoming incidents, assesses impact, assigns responders, and drafts stakeholder communications.
This project demonstrates Domino's GenAI tracing and evaluation capabilities through a production-style pipeline applicable across financial services, healthcare, energy, and public sector.
Incidents flow through four specialized agents:
- Classifier Agent - Categorizes the incident and assigns urgency (1-5)
- Impact Assessor Agent - Evaluates blast radius, affected users, and financial exposure
- Resource Matcher Agent - Identifies responders based on skills and SLA requirements
- Response Drafter Agent - Generates communications for each stakeholder audience
Each agent uses dedicated tools to query historical data, check resource availability, and apply organizational policies.
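The four-stage flow above can be sketched as a simple sequential pipeline in which each stage reads a shared context and adds its own fields. This is a minimal illustration only: the function names, the context keys, and the keyword-based logic are hypothetical placeholders, not the project's actual agents (which call an LLM and dedicated tools).

```python
from typing import Any, Callable, Dict

# Hypothetical stage signature: each stage reads the shared context
# and returns the fields it contributes.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def classify(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder rule; a real classifier agent would call an LLM.
    is_outage = "outage" in ctx["incident"].lower()
    return {
        "category": "availability" if is_outage else "general",
        "urgency": 5 if is_outage else 2,  # urgency on the 1-5 scale
    }


def assess_impact(ctx: Dict[str, Any]) -> Dict[str, Any]:
    return {"blast_radius": "wide" if ctx["urgency"] >= 4 else "narrow"}


def match_resources(ctx: Dict[str, Any]) -> Dict[str, Any]:
    responders = ["on-call-sre"] if ctx["urgency"] >= 4 else ["service-desk"]
    return {"responders": responders}


def draft_response(ctx: Dict[str, Any]) -> Dict[str, Any]:
    draft = f"[{ctx['category']}] urgency {ctx['urgency']}: {ctx['incident']}"
    return {"draft": draft}


PIPELINE = [classify, assess_impact, match_resources, draft_response]


def run_pipeline(incident: str) -> Dict[str, Any]:
    """Run every stage in order, merging each stage's output into the context."""
    ctx: Dict[str, Any] = {"incident": incident}
    for stage in PIPELINE:
        ctx.update(stage(ctx))
    return ctx
```

Because later stages only read keys written by earlier ones, the order of `PIPELINE` encodes the dependency between agents.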
- Unified Trace Tree - All agents, tools, and LLM calls appear in one hierarchical trace
- Automatic Instrumentation - Uses the @add_tracing decorator with mlflow.openai.autolog() or mlflow.anthropic.autolog() for framework autologging
- Aggregated Metrics - DominoRun captures statistical summaries (mean, median) across traces
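To make the aggregated-metrics idea concrete, the sketch below computes the same kind of summary (mean and median) over per-trace scores using only the standard library. It does not show how DominoRun computes or stores its summaries; the `summarize_scores` helper and the input layout are assumptions made for illustration.

```python
from statistics import mean, median
from typing import Dict, List


def summarize_scores(scores_by_metric: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Return mean, median, and count for each metric that has at least one score."""
    summary: Dict[str, Dict[str, float]] = {}
    for name, values in scores_by_metric.items():
        if not values:
            continue  # skip metrics with no recorded scores
        summary[name] = {
            "mean": mean(values),
            "median": median(values),
            "count": len(values),
        }
    return summary
```

Reporting the median alongside the mean helps when a few traces receive outlier scores.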
Three automated judges evaluate output quality:
- Classification Judge - Evaluates category and urgency appropriateness
- Response Judge - Assesses communication clarity and tone
- Triage Judge - Holistic assessment of the complete triage decision
- Users can score triage outputs across four dimensions (classification, impact, assignment, response)
- Feedback is logged to production traces via log_evaluation()
- Local backup stored in /mnt/data/{project}/feedback/user_feedback.jsonl
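The local JSONL backup described above could be written along these lines. This is a sketch under assumptions: the record fields (`trace_id`, `timestamp`, `scores`) and the `append_feedback` helper are hypothetical, and the project's actual file schema may differ. Only the four dimension names and the file path pattern come from this README.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict

# The four feedback dimensions listed above.
DIMENSIONS = ("classification", "impact", "assignment", "response")


def append_feedback(feedback_file: str, trace_id: str, scores: Dict[str, float]) -> dict:
    """Append one feedback record as a single JSON line (hypothetical schema)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")

    record = {
        "trace_id": trace_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "scores": {d: scores[d] for d in DIMENSIONS},
    }
    path = Path(feedback_file)
    path.parent.mkdir(parents=True, exist_ok=True)  # create feedback/ if needed
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Appending one JSON object per line keeps the backup readable even if a write is interrupted, since earlier lines stay valid.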
Sample incidents are provided for the Financial Services, Healthcare, Energy, and Public Sector verticals (see example-data/).
- Add your API key as a Domino user environment variable:
  - Go to Account Settings > User Environment Variables
  - Add OPENAI_API_KEY or ANTHROPIC_API_KEY
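Before running the pipeline, it can be useful to check which provider key is actually visible to the process. The helper below is a hypothetical convenience, not part of the project; it only relies on the two environment variable names given above.

```python
import os
from typing import List, Mapping, Optional

# Environment variable expected for each provider (names from the setup steps).
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]
```

Calling `available_providers()` with no argument inspects the current process environment; passing a mapping makes the check easy to test.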
Open tracing-tutorial.ipynb for a step-by-step walkthrough:
- Select provider and vertical
- Execute the triage pipeline
- View traces in Domino Experiment Manager
Deploy TriageFlow as an agentic system with a Streamlit interface for interaction.
To deploy:
- Go to Deployments > Agents in Domino
- Configure the agent with app/app.sh as the launch script
- Click Deploy
See Deploy Agentic Systems for detailed instructions.
Features:
- Select provider (OpenAI/Anthropic) and industry vertical
- Choose from sample tickets or enter custom incidents
- View agent responses with reasoning
- See LLM judge scores and rationales
- Submit human evaluations (logged to traces)
The run_scheduled_evaluation.py script analyzes traces from a deployed agent and generates an HTML report.
Setup: Edit the top of the script with your deployed agent ID:
AGENT_ID = "<REPLACE_WITH_YOUR_AGENT_ID>"

Run daily analysis:

python run_scheduled_evaluation.py
# Analyzes traces from the last 24 hours
# Saves report to /mnt/artifacts/daily_report_<timestamp>.html

Batch processing with tracing:

python run_scheduled_evaluation.py batch --vertical financial_services -n 10

Add evaluations to existing traces:

python run_scheduled_evaluation.py evaluate --run-id <mlflow_run_id>

TriageFlow/
├── tracing-tutorial.ipynb          # Tutorial notebook
├── run_triage.py                   # Standalone triage script with tracing
├── run_scheduled_evaluation.py     # Scheduled evaluation (analyzes last 24h)
├── config.yaml                     # Agent prompts, tools, model configs
├── src/
│   ├── models.py                   # Pydantic data models
│   ├── agents.py                   # Four triage agents
│   ├── tools.py                    # Agent tool functions
│   └── judges.py                   # LLM judge evaluators
├── app/
│   ├── app.sh                      # Agent launch script
│   ├── main.py                     # Streamlit interface
│   └── utils/                      # Config management utilities
└── example-data/
    ├── financial_services.csv
    ├── healthcare.csv
    ├── energy.csv
    └── public_sector.csv
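The sample files under example-data/ can be read with the standard library alone. The sketch below makes no assumption about the CSV column names and simply returns each row as a dictionary keyed by the header; the `load_sample_incidents` helper and its `limit` parameter are hypothetical and not part of the project.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional

# Vertical names taken from the file names in the project tree.
VERTICALS = ("financial_services", "healthcare", "energy", "public_sector")


def load_sample_incidents(
    vertical: str,
    data_dir: str = "example-data",
    limit: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Load sample incidents for one vertical as a list of row dictionaries."""
    if vertical not in VERTICALS:
        raise ValueError(f"unknown vertical: {vertical!r}")
    path = Path(data_dir) / f"{vertical}.csv"
    with path.open(newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    return rows if limit is None else rows[:limit]
```

For example, `load_sample_incidents("healthcare", limit=10)` would return at most ten rows from example-data/healthcare.csv.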
from domino.agents.tracing import add_tracing, init_tracing
from domino.agents.logging import DominoRun, log_evaluation
import mlflow

# Initialize tracing
init_tracing()

# Enable framework autologging
mlflow.openai.autolog()

@add_tracing(name="triage_incident", autolog_frameworks=["openai"], evaluator=my_evaluator)
def triage_incident(incident):
    # Agent calls are automatically traced
    classification = classify_incident(...)
    impact = assess_impact(...)
    # ...
    return result

with DominoRun(agent_config_path="config.yaml") as run:
    result = triage_incident(incident)
    # Traces are automatically captured and stored

from domino.agents.tracing import search_traces

# Retrieve traces from a completed run
traces = search_traces(run_id=run_id)

# Add evaluations
for trace in traces.data:
    log_evaluation(trace_id=trace.id, name="quality_score", value=4.5)