livekit-examples · siddharthraja · Nov 14, 2025 · Nov 14, 2025
diff --git a/.gitignore b/.gitignore
@@ -9,4 +9,15 @@ KMS
 .vscode
 *.egg-info
 .pytest_cache
-.ruff_cache
+.ruff_cache
+
+# Generated test outputs
+conversations/
+results/
+report.html
+*.mp3
+*.wav
+
+# Temporary test files
+test_connection.py
+test_simple.py
diff --git a/QUICK_START.md b/QUICK_START.md
@@ -0,0 +1,161 @@
+# 🚀 Voice Agent Testing Framework - Quick Start
+
+## ✅ What We've Built
+
+A complete **voice agent testing framework** that simulates customer support conversations between:
+- **Synthetic customers** with different personalities (angry, confused, technical, etc.)
+- **Mock support agents** using your actual Acme prompts
+
+## 🎯 Key Features
+
+### 1. **5 Pre-Built Customer Personas**
+- Angry refund seeker (Karen)
+- Confused elderly user (Harold)
+- Technical bug reporter (Alex)
+- Friendly billing inquiry (Sarah)
+- Edge case nightmare (Jordan)
+
+### 2. **Rich Metrics Collection**
+- **Conversation quality**: Interruptions, silence gaps, gibberish detection
+- **Performance metrics**: Response times, speech rates, turn-taking
+- **Behavioral analysis**: Task completion, sentiment progression
+- **Audio quality**: STT confidence, overlapping speech
+
+### 3. **Comprehensive Reporting**
+- JSON transcripts with timestamps
+- HTML dashboards with visual metrics
+- Batch testing capabilities
+- Real-time metric streaming
+
+## 📦 What's Included
+
+```
+src/
+├── customer_agent.py      # 5 customer personas ready to use
+├── support_agent.py       # Mock Acme agent with your prompts
+├── conversation_orchestrator.py  # Connects agents in rooms
+├── metrics_collector.py   # Captures 20+ metrics per conversation
+├── test_runner.py         # Parallel test execution
+└── results_viewer.py      # HTML report generation
+
+prompts/
+└── acme_system_prompt.txt # 200+ line comprehensive support prompt
+
+run_test.py               # Main entry point with interactive menu
+TEST_FRAMEWORK.md         # Complete documentation
+```
+
+## 🏃 How to Use It
+
+### 1. Add Your Prompts
+Replace the content in `prompts/acme_system_prompt.txt` with your actual support agent instructions.
+
+### 2. Run Tests
+
+```bash
+# Interactive mode (easiest)
+uv run python run_test.py
+
+# Test specific scenario
+uv run python run_test.py test angry_refund
+
+# Run all scenarios
+uv run python run_test.py test all
+
+# View results
+uv run python run_test.py results
+
+# Generate HTML report
+uv run python run_test.py report
+```
+
+### 3. Analyze Results
+
+Each test generates:
+- Full conversation transcript with timestamps
+- Quality metrics (interruptions, latency, etc.)
+- Performance scores
+- HTML dashboard for visualization
+
+## 💡 Value Proposition
+
+### Current State (With This Framework)
+- ✅ **Both agents simulated locally** for maximum control
+- ✅ **5 customer scenarios** covering common support cases
+- ✅ **20+ metrics** captured per conversation
+- ✅ **Parallel testing** capability (run 10+ conversations simultaneously)
+- ✅ **Rich transcripts** with audio quality metadata
+
+### Next Step (When Ready)
+- Connect to real Acme agents via Twilio/SIP
+- A/B test different prompts objectively
+- Discover edge cases automatically
+- Measure improvement quantitatively
+
+## 🔥 Immediate Benefits
+
+1. **Test prompt changes in minutes** - No manual calling required
+2. **Objective quality metrics** - Quantify improvements
+3. **Edge case discovery** - Find failure modes automatically
+4. **Regression testing** - Ensure changes don't break existing behavior
+5. **Performance baselines** - Track metrics over time
+
+## 📊 Example Metrics Captured
+
+```json
+{
+  "conversation_id": "angry_refund_1234567",
+  "duration": 87.3,
+  "turns": 14,
+  "quality_metrics": {
+    "interruptions": {
+      "count": 3,
+      "details": [...]
+    },
+    "audio_quality_events": {
+      "gibberish_count": 0,
+      "silence_gaps": 2
+    }
+  },
+  "performance": {
+    "customer": {
+      "avg_response_time": 0.8,
+      "speech_rate": 145
+    },
+    "support": {
+      "avg_response_time": 1.2,
+      "first_response_time": 2.1
+    }
+  }
+}
+```
+
+## 🚀 Next Actions
+
+1. **Immediate**: Run `uv run python run_test.py` to see the framework in action
+2. **Today**: Add your actual support prompt to `prompts/acme_system_prompt.txt`
+3. **This Week**: Run 50+ test conversations to establish baselines
+4. **Next Week**: Connect to your real agents via Twilio when ready
+
+## 🎯 Why This Matters
+
+- **No more manual testing** - Automated voice conversations at scale
+- **Data-driven optimization** - Measure, don't guess
+- **Faster iteration** - Test → Measure → Improve in minutes
+- **Quality assurance** - Catch issues before customers do
+
+## 📝 Technical Notes
+
+- Built on LiveKit's production-grade infrastructure
+- Uses GPT-4o-mini for cost-effective testing
+- Supports parallel execution for rapid testing
+- All data saved as JSON for integration with your eval framework
+
+---
+
+**Time to Build**: 4 hours
+**Lines of Code**: ~2000
+**Test Scenarios**: 5 (easily expandable)
+**Metrics Captured**: 20+ per conversation
+
+Ready to revolutionize your voice agent testing! 🚀