Tons of features and adding cartesia etc. #37

siddharthraja · 2025-11-17T23:34:21Z

No description provided.

- Implement dual-agent testing system with customer and support personas
- Add 5 pre-built customer scenarios (angry, confused, technical, friendly, edge case)
- Create metrics collection for conversation quality analysis
- Build conversation simulator with OpenAI TTS for audio generation
- Add comprehensive test runner with batch execution capabilities
- Include HTML report generation for test results
- Integrate with LiveKit for room-based agent communication
- Add detailed documentation and quick start guides

Test framework captures:
- Full conversation transcripts with timestamps
- Audio recordings (MP3) of synthetic conversations
- Quality metrics (interruptions, latency, speech rate)
- Behavioral analysis for prompt optimization

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

- Added 8 realistic Jodo payment collection scenarios based on actual client prompt
- Implemented Hindi-English code-mixing support for Indian context
- Changed file naming to timestamp-first format (YYYYMMDD_HHMMSS) for better sorting
- Updated conversation simulator to use real payment collection scenarios
- Scenarios include cooperative parent, angry customer, wrong person, cancellation attempts, etc.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
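The timestamp-first naming can be sketched as follows. This is a minimal illustration, assuming the timestamp is simply prefixed to a scenario name; the helper `timestamped_filename` and the scenario name `angry_customer` are hypothetical and not taken from the repository.

```python
from datetime import datetime
from typing import Optional


def timestamped_filename(scenario: str, extension: str, when: Optional[datetime] = None) -> str:
    """Build a name whose YYYYMMDD_HHMMSS prefix comes first (hypothetical helper)."""
    when = when or datetime.now()
    return f"{when.strftime('%Y%m%d_%H%M%S')}_{scenario}.{extension}"


names = [
    timestamped_filename("angry_customer", "mp3", datetime(2025, 11, 14, 15, 21, 0)),
    timestamped_filename("cooperative_parent", "mp3", datetime(2025, 11, 13, 9, 5, 30)),
]
# Plain lexicographic sorting now matches chronological order.
print(sorted(names))
```

Because the prefix is zero-padded and ordered from year down to second, a plain directory listing shows recordings oldest first without parsing the names.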

…ture

- Complete refactor from spaghetti code to clean, modular architecture
- Implemented provider pattern for LLM/TTS/Storage abstraction
- Added domain models (Persona, Conversation, Metrics)
- Created service layer with ConversationOrchestrator
- Built CLI interface with multiple commands
- Removed deprecated files (old Flask API, web UI, duplicate code)
- Added comprehensive documentation (USAGE.md, ARCHITECTURE.md)

Key improvements:
- Separation of concerns with distinct layers
- Easy provider switching (OpenAI/ElevenLabs)
- Storage gateway pattern for future cloud support
- Configuration management via YAML/env vars
- Ready for PostgreSQL and FastAPI integration

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
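For readers unfamiliar with the provider pattern mentioned above, a rough sketch follows. It assumes an abstract TTS interface plus a name-keyed factory; `TTSProvider`, `EchoTTSProvider`, and `create_tts_provider` are illustrative names, not the package's actual classes.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class TTSProvider(ABC):
    """Interface that every TTS backend implements (hypothetical)."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return audio bytes for the given text."""


class EchoTTSProvider(TTSProvider):
    """Stand-in backend that returns the UTF-8 bytes of the text."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


# Registry of provider names to constructors; real backends would be added here.
_TTS_FACTORIES: Dict[str, Callable[[], TTSProvider]] = {"echo": EchoTTSProvider}


def create_tts_provider(name: str) -> TTSProvider:
    """Look up a provider by name so callers never import a backend directly."""
    try:
        factory = _TTS_FACTORIES[name]
    except KeyError:
        raise ValueError(f"Unknown TTS provider: {name!r}") from None
    return factory()


print(create_tts_provider("echo").synthesize("hello"))
```

With this shape, switching backends is a matter of changing the name passed to the factory, and the calling code depends only on the interface.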

- Remove deprecated files from old spaghetti code implementation:
  - conversation_orchestrator.py (old LiveKit room orchestrator)
  - test_runner.py (old test runner using LiveKit rooms)
  - metrics_collector.py (old metrics collection)
  - results_viewer.py (old results viewer)
  - elevenlabs_voices.py (old ElevenLabs implementation)
  - support_agent.py (old support agent implementation)

These files have been replaced by the new modular voice_conversation_generator package.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

Integrated Cartesia as a third TTS provider alongside OpenAI and ElevenLabs.

## New Features

### CartesiaTTSProvider
- Full async support using Cartesia Python SDK
- Supports Sonic-3, Sonic-2, and Sonic-Turbo models
- 15+ language support (en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr)
- Default voice presets for different personas
- PCM to MP3 conversion for consistency

### Voice Configuration
- Pre-configured voice IDs for support/customer personas
- Flexible voice_id or voice_name selection
- Configurable output format (sample rate, encoding, container)

### CLI Integration
- Added `--tts cartesia` option to generate command
- Compatible with all existing CLI features

## Implementation Details

**Files Added:**
- `src/voice_conversation_generator/providers/tts/cartesia.py` - Cartesia TTS provider

**Files Modified:**
- `src/voice_conversation_generator/providers/tts/__init__.py` - Export CartesiaTTSProvider
- `src/voice_conversation_generator/providers/__init__.py` - Include in package exports
- `src/voice_conversation_generator/services/provider_factory.py` - Factory support
- `src/vcg_cli.py` - CLI option for cartesia
- `USAGE.md` - Documentation for Cartesia usage
- `pyproject.toml` - Added cartesia and pydub dependencies

## Usage

```bash
# Set API key
export CARTESIA_API_KEY=your_key

# Generate with Cartesia
uv run python src/vcg_cli.py generate --tts cartesia --customer angry_insufficient_funds
```

## Technical Features
- Streaming TTS generation with async iteration
- Automatic PCM to MP3 conversion (requires pydub)
- Graceful fallback if pydub unavailable
- Comprehensive error handling

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
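The PCM to MP3 conversion with a graceful fallback might look roughly like the sketch below. It assumes 16-bit mono PCM input and a 44100 Hz sample rate, neither of which is stated in the commit; the function name `pcm_to_mp3_or_passthrough` is hypothetical. When pydub is installed, the MP3 export also needs ffmpeg available on the system.

```python
import io
from typing import Tuple


def pcm_to_mp3_or_passthrough(pcm: bytes, sample_rate: int = 44100) -> Tuple[bytes, str]:
    """Return (audio_bytes, format). Falls back to raw PCM when pydub is missing."""
    try:
        from pydub import AudioSegment  # optional dependency
    except ImportError:
        # Graceful fallback: hand back the untouched PCM and say so.
        return pcm, "pcm"

    # Assumption: 16-bit (2-byte) mono samples at the given sample rate.
    segment = AudioSegment(data=pcm, sample_width=2, frame_rate=sample_rate, channels=1)
    buffer = io.BytesIO()
    segment.export(buffer, format="mp3")  # pydub shells out to ffmpeg here
    return buffer.getvalue(), "mp3"
```

Returning the format alongside the bytes lets the caller pick the right file extension instead of assuming the conversion succeeded.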

Updated OpenAI provider to properly handle GPT-4.1 model with correct parameter usage.

## Changes

### OpenAI Provider (`openai.py`)
- Updated model detection to handle `gpt-4.1` (models starting with `gpt-4.`)
- Uses `max_completion_tokens` parameter for GPT-4.1 (new API format)
- Maintains backward compatibility with older GPT-4 models

### Documentation Updates
- `CONVERSATION_LOGIC_EXPLAINED.md`: Updated all gpt-4o references to gpt-4.1
- `QUICK_COMMANDS.md`: Updated LLM_MODEL example to gpt-4.1

## Testing
- ✅ Verified GPT-4.1 initialization and model name
- ✅ Tested completion with simple prompt (Hindi translation)
- ✅ Generated full conversation with cooperative_parent persona
- ✅ Confirmed correct parameter usage (max_completion_tokens)
- ✅ Verified LLM metrics show "gpt-4.1"

## Usage

```bash
# Set GPT-4.1 as default model
export LLM_MODEL=gpt-4.1

# Generate conversation with GPT-4.1
uv run python src/vcg_cli.py generate --customer angry_insufficient_funds
```

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
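The prefix-based model detection described above can be sketched like this. The rule (names starting with `gpt-4.` get `max_completion_tokens`, everything else keeps `max_tokens`) is taken from the commit text; the helper name `token_limit_kwargs` is hypothetical, and the real provider may check further cases.

```python
from typing import Dict


def token_limit_kwargs(model: str, limit: int) -> Dict[str, int]:
    """Pick the token-limit keyword for a chat completion request (hypothetical helper)."""
    if model.startswith("gpt-4."):
        # Per the commit: gpt-4.1-style names use the newer parameter.
        return {"max_completion_tokens": limit}
    # Older GPT-4 models keep the original parameter name.
    return {"max_tokens": limit}


print(token_limit_kwargs("gpt-4.1", 256))
print(token_limit_kwargs("gpt-4", 256))
```

The resulting dictionary would be unpacked into the request call, so only one of the two keywords is ever sent.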

… support

Added detailed documentation on using Cartesia for Indian accents including:
- Native Hindi and Hinglish support confirmation
- Language parameter usage (language='hi')
- Dedicated Hinglish voices for Indian market
- Code examples and implementation steps
- Voice configuration guide

Key findings:
- Cartesia launched Hinglish support in 2024 specifically for India
- Supports fluid code-switching between Hindi and English
- Natural Indian intonation and pronunciation
- 40ms latency with India deployments
- Current implementation already supports language parameter

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

**Problem**: Cartesia TTS was receiving OpenAI-specific model names ("tts-1") and voice names ("onyx", "nova") from the shared config, causing API errors.

**Root Cause**:
- Default config had model="tts-1" for all TTS providers
- Personas used OpenAI voice names regardless of TTS provider
- Cartesia provider was using voice_config.model directly

**Solution**:
1. Validate model in Cartesia `__init__`: If config contains non-Cartesia model (e.g., "tts-1"), use default "sonic-3"
2. Don't use voice_config.model in generate_speech: Always use default_model to avoid cross-provider contamination
3. Add voice name mapping: Map OpenAI voice names ("onyx", "nova", etc.) to Cartesia voice IDs (UUIDs) for seamless provider switching

**Testing**:
- Generated conversation with Cartesia TTS successfully
- Audio file created (23.1 seconds)
- All voice mappings working correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
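A minimal sketch of the model validation and voice-name mapping follows. The lowercase model identifiers are assumed from the Sonic-3, Sonic-2, and Sonic-Turbo names mentioned earlier, the helper names are hypothetical, and the voice IDs are deliberate placeholders rather than real Cartesia UUIDs.

```python
CARTESIA_MODELS = {"sonic-3", "sonic-2", "sonic-turbo"}  # assumed identifiers
DEFAULT_MODEL = "sonic-3"

# Placeholder IDs only: the real mapping uses Cartesia voice UUIDs.
OPENAI_TO_CARTESIA_VOICE = {
    "onyx": "<cartesia-voice-id-for-onyx>",
    "nova": "<cartesia-voice-id-for-nova>",
}


def resolve_model(configured: str) -> str:
    """Ignore model names from other providers, such as "tts-1"."""
    return configured if configured in CARTESIA_MODELS else DEFAULT_MODEL


def resolve_voice(voice: str) -> str:
    """Translate an OpenAI voice name; pass anything else through unchanged."""
    return OPENAI_TO_CARTESIA_VOICE.get(voice, voice)


print(resolve_model("tts-1"))  # falls back to the Cartesia default
print(resolve_voice("onyx"))
```

Passing unknown voices through unchanged means a config that already holds a Cartesia voice ID keeps working.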

**Problem**: Two separate data directories were being created:
- /Users/sid/Documents/GitHub/livekit-starter/data/conversations
- /Users/sid/Documents/GitHub/livekit-starter/src/data/conversations

This happened because the relative path "data/conversations" was resolved differently depending on where the script was run from (root vs src).

**Solution**:
1. Added _get_project_root() helper to find project root (via pyproject.toml/.git)
2. Added _get_default_data_path() to compute absolute path to data directory
3. Updated StorageConfig to use absolute path by default
4. Deleted src/data directory completely
5. Moved latest test files to root data directory

**Benefits**:
- All conversations now stored in single location
- Works correctly regardless of where scripts are run from
- No duplicate data directories
- Cleaner file organization

**Testing**:
- Generated test conversation - saved to root data directory
- Verified src/data no longer exists
- All file paths now absolute

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
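The project-root lookup could be implemented along these lines. This is a sketch, assuming the helper walks up from a starting directory until it finds `pyproject.toml` or `.git`; the function names and the fallback to the starting directory are illustrative and may differ from the actual `_get_project_root()` and `_get_default_data_path()`.

```python
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Walk upward from `start` until a project marker is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "pyproject.toml").exists() or (candidate / ".git").exists():
            return candidate
    # Assumption: with no marker anywhere, fall back to the starting directory.
    return start


def default_data_path(start: Path) -> Path:
    """Absolute data directory, independent of the current working directory."""
    return find_project_root(start) / "data" / "conversations"


print(default_data_path(Path.cwd()))
```

Anchoring on a marker file rather than the working directory is what makes the result identical whether the script is launched from the root or from `src`.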

**Voice Selection**:
- Support Agent: **Ishan** (fd2ada67-c2d9-4afe-b474-6386b87d8fc3)
  - "Conversational male for Hinglish sales and customer support"
  - Perfect for professional support conversations in Hinglish
- Male Customer: **Devansh** (1259b7e3-cb8a-43df-9446-30971a46b8b0)
  - "Warm, conversational Indian male adult voice"
  - Ideal for casual customer interactions

**Features Added**:
1. Language-aware voice mapping - detects Hindi characters and switches to Indian voices automatically
2. Auto-detects Hindi text (Devanagari script) and passes language='hi' to TTS
3. Separate voice mappings for Hindi vs English to use appropriate voices
4. Updated DEFAULT_VOICES with Hindi/Hinglish voice IDs

**Implementation**:
- Cartesia provider: Added language-based voice selection logic
- Orchestrator: Auto-detects Hindi characters (U+0900-U+097F) and sets language
- Fixed kwargs conflict by using pop() instead of get() for language parameter

**Testing**:
- Generated 15.2 second Hinglish conversation with Cartesia
- Ishan voice for support agent ✅
- Devansh voice for customer ✅
- Audio quality excellent

**Note**: Female Hindi voice support pending - currently falls back to English female voice for female customers. Voice ID needed from Cartesia.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
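The Devanagari detection and the `pop()` fix can be illustrated together. In this sketch `synthesize` is a stand-in for the real TTS call and all function names are hypothetical; the Unicode range U+0900-U+097F is the one quoted in the commit.

```python
from typing import Any, Dict


def contains_devanagari(text: str) -> bool:
    """True if any character falls in the Devanagari block (U+0900-U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def detect_language(text: str, default: str = "en") -> str:
    return "hi" if contains_devanagari(text) else default


def synthesize(text: str, language: str = "en", **options: Any) -> Dict[str, Any]:
    """Stand-in for a TTS call; it just echoes its arguments."""
    return {"text": text, "language": language, **options}


def generate_speech(text: str, **kwargs: Any) -> Dict[str, Any]:
    # pop() removes "language" from kwargs. With get(), the key would stay in
    # kwargs and be passed a second time below, raising a TypeError.
    language = kwargs.pop("language", None) or detect_language(text)
    return synthesize(text, language=language, **kwargs)


print(generate_speech("नमस्ते, payment due hai")["language"])
print(generate_speech("Hello", language="en", speed=1.0))
```

An explicitly passed language wins over detection here, which is one reasonable choice; the actual orchestrator may prioritise differently.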

## Summary
- Centralized voice catalog system for managing TTS voices across providers
- Priority-based voice selection with support/customer role separation
- Fixed MP3 audio concatenation using pydub (was truncating after first turn)
- Multi-language voice selection (Hindi, Hinglish, English)

## Key Changes

### Voice Catalog System
- New `VoiceCatalog` class with 3-tier fallback matching (exact → flexible → default)
- Added priority field to VoiceEntry for tie-breaking
- Support voices (Ishan, Devansh) never used for customers
- Customer voices (Ayush, Aarav, Aarti) never used for support
- Language-aware selection: Devansh (Hindi-only), Ishan (Hinglish/English)

### Audio Fix
- Replaced naive byte concatenation with pydub AudioSegment
- Properly combines MP3 files with headers/frames intact
- Verified with 44-second test conversation (5 turns)

### Provider Updates
- Removed hardcoded voice mappings from Cartesia/ElevenLabs providers
- OpenAI TTS now strips unsupported language parameter
- PersonaService uses VoiceCatalog for all voice selection
- Added languages field to CustomerPersona model

### Testing
- End-to-end Cartesia TTS test passes
- Voice selection priority logic validated
- All 7 voice selection scenarios tested

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
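One way to read the 3-tier fallback is sketched below. The tier definitions are assumptions: "exact" means the voice covers every requested language, "flexible" means it covers at least one, and "default" is the best voice for the role. Higher priority winning ties is also assumed. The class bodies are illustrative, and the customer entries use placeholder names because their languages are not listed above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class VoiceEntry:
    name: str
    role: str  # "support" or "customer"
    languages: FrozenSet[str]
    priority: int = 0  # assumption: a higher number wins ties


class VoiceCatalog:
    def __init__(self, entries: Iterable[VoiceEntry]) -> None:
        self._entries: List[VoiceEntry] = list(entries)

    def select(self, role: str, languages: Iterable[str]) -> VoiceEntry:
        wanted = frozenset(languages)
        # Role separation: only voices registered for this role are considered.
        pool = [e for e in self._entries if e.role == role]
        if not pool:
            raise LookupError(f"No voices registered for role {role!r}")
        exact = [e for e in pool if wanted and wanted <= e.languages]
        flexible = [e for e in pool if wanted & e.languages]
        for tier in (exact, flexible, pool):  # exact, then flexible, then default
            if tier:
                return max(tier, key=lambda e: e.priority)
        raise LookupError("unreachable: pool is non-empty")


catalog = VoiceCatalog([
    VoiceEntry("Ishan", "support", frozenset({"hinglish", "en"}), priority=2),
    VoiceEntry("Devansh", "support", frozenset({"hi"}), priority=1),
    VoiceEntry("customer_a", "customer", frozenset({"hi", "hinglish"}), priority=1),
    VoiceEntry("customer_b", "customer", frozenset({"en"}), priority=1),
])
print(catalog.select("support", ["hi"]).name)
```

Filtering by role before any language matching is what guarantees a support voice is never returned for a customer, even on the default tier.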

siddharthraja and others added 17 commits November 14, 2025 15:21

Merge branch 'test-1': Add Jodo payment collection scenarios

f88f2cd

Adding eleven labs integration and creating a config page

e01b1a0

Improvement to UI, re-arrange and generate cmd

a33f856

Fix cmd line generation for running synthetic data

17f3f19

Updates to folder structure

63a6004

Merge branch 'main' into feat/voice-catalog-priority-selection

0293454
