@siddharthraja

No description provided.

siddharthraja and others added 17 commits November 14, 2025 15:21
- Implement dual-agent testing system with customer and support personas
- Add 5 pre-built customer scenarios (angry, confused, technical, friendly, edge case)
- Create metrics collection for conversation quality analysis
- Build conversation simulator with OpenAI TTS for audio generation
- Add comprehensive test runner with batch execution capabilities
- Include HTML report generation for test results
- Integrate with LiveKit for room-based agent communication
- Add detailed documentation and quick start guides

Test framework captures:
- Full conversation transcripts with timestamps
- Audio recordings (MP3) of synthetic conversations
- Quality metrics (interruptions, latency, speech rate)
- Behavioral analysis for prompt optimization

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Added 8 realistic Jodo payment collection scenarios based on actual client prompt
- Implemented Hindi-English code-mixing support for Indian context
- Changed file naming to timestamp-first format (YYYYMMDD_HHMMSS) for better sorting
- Updated conversation simulator to use real payment collection scenarios
- Scenarios include cooperative parent, angry customer, wrong person, cancellation attempts, etc.
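
A minimal sketch of the timestamp-first naming, assuming the scenario name follows the timestamp. The helper name `conversation_filename` and the `.mp3` default are illustrative, not taken from the repository:

```python
from datetime import datetime


def conversation_filename(scenario: str, when: datetime, ext: str = "mp3") -> str:
    """Build a timestamp-first name so plain lexical sorting is chronological."""
    stamp = when.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    return f"{stamp}_{scenario}.{ext}"
```

Because the timestamp is zero-padded and leads the name, `sorted()` on a list of such names orders them by creation time.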

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ture

- Complete refactor from spaghetti code to a clean, modular architecture
- Implemented provider pattern for LLM/TTS/Storage abstraction
- Added domain models (Persona, Conversation, Metrics)
- Created service layer with ConversationOrchestrator
- Built CLI interface with multiple commands
- Removed deprecated files (old Flask API, web UI, duplicate code)
- Added comprehensive documentation (USAGE.md, ARCHITECTURE.md)

Key improvements:
- Separation of concerns with distinct layers
- Easy provider switching (OpenAI/ElevenLabs)
- Storage gateway pattern for future cloud support
- Configuration management via YAML/env vars
- Ready for PostgreSQL and FastAPI integration
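
The provider pattern described above could look roughly like the following sketch. Only the method name `generate_speech` appears elsewhere in this PR; the `TTSProvider` protocol, the `FakeTTSProvider` stand-in, and the registry are hypothetical and exist only to show how callers stay independent of a concrete backend:

```python
import asyncio
from typing import Callable, Dict, Protocol


class TTSProvider(Protocol):
    """Interface every TTS backend is assumed to satisfy."""

    async def generate_speech(self, text: str, **kwargs) -> bytes: ...


class FakeTTSProvider:
    """Stand-in backend for illustration; returns tagged text bytes, not audio."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def generate_speech(self, text: str, **kwargs) -> bytes:
        return f"{self.name}:{text}".encode("utf-8")


# Hypothetical registry keyed by provider name.
_TTS_REGISTRY: Dict[str, Callable[[], TTSProvider]] = {
    "openai": lambda: FakeTTSProvider("openai"),
    "elevenlabs": lambda: FakeTTSProvider("elevenlabs"),
}


def create_tts_provider(name: str) -> TTSProvider:
    """Look up a provider by name; unknown names fail loudly."""
    try:
        return _TTS_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown TTS provider: {name!r}") from None
```

Switching providers then only changes the string passed to the factory, which is the property the "easy provider switching" bullet refers to.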

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove deprecated files from old spaghetti code implementation:
  - conversation_orchestrator.py (old LiveKit room orchestrator)
  - test_runner.py (old test runner using LiveKit rooms)
  - metrics_collector.py (old metrics collection)
  - results_viewer.py (old results viewer)
  - elevenlabs_voices.py (old ElevenLabs implementation)
  - support_agent.py (old support agent implementation)

These files have been replaced by the new modular voice_conversation_generator package.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Integrated Cartesia as a third TTS provider alongside OpenAI and ElevenLabs.

## New Features

### CartesiaTTSProvider
- Full async support using Cartesia Python SDK
- Supports Sonic-3, Sonic-2, and Sonic-Turbo models
- 15+ language support (en, fr, de, es, pt, zh, ja, hi, it, ko, nl, pl, ru, sv, tr)
- Default voice presets for different personas
- PCM to MP3 conversion for consistency

### Voice Configuration
- Pre-configured voice IDs for support/customer personas
- Flexible voice_id or voice_name selection
- Configurable output format (sample rate, encoding, container)

### CLI Integration
- Added `--tts cartesia` option to generate command
- Compatible with all existing CLI features

## Implementation Details

**Files Added:**
- `src/voice_conversation_generator/providers/tts/cartesia.py` - Cartesia TTS provider

**Files Modified:**
- `src/voice_conversation_generator/providers/tts/__init__.py` - Export CartesiaTTSProvider
- `src/voice_conversation_generator/providers/__init__.py` - Include in package exports
- `src/voice_conversation_generator/services/provider_factory.py` - Factory support
- `src/vcg_cli.py` - CLI option for cartesia
- `USAGE.md` - Documentation for Cartesia usage
- `pyproject.toml` - Added cartesia and pydub dependencies

## Usage

```bash
# Set API key
export CARTESIA_API_KEY=your_key

# Generate with Cartesia
uv run python src/vcg_cli.py generate --tts cartesia --customer angry_insufficient_funds
```

## Technical Features
- Streaming TTS generation with async iteration
- Automatic PCM to MP3 conversion (requires pydub)
- Graceful fallback if pydub unavailable
- Comprehensive error handling
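
One way the optional-dependency fallback could be structured is sketched below. It assumes 16-bit mono PCM input; the function name `pcm_to_mp3` and the returned format tag are illustrative rather than the provider's actual API:

```python
import io
from typing import Tuple


def pcm_to_mp3(pcm: bytes, sample_rate: int = 44100) -> Tuple[bytes, str]:
    """Return (audio_bytes, format); falls back to raw PCM when pydub is missing."""
    try:
        from pydub import AudioSegment  # optional dependency, imported lazily
    except ImportError:
        return pcm, "pcm"

    # Assumption: input is signed 16-bit (2 bytes per sample), single channel.
    segment = AudioSegment(data=pcm, sample_width=2, frame_rate=sample_rate, channels=1)
    buffer = io.BytesIO()
    segment.export(buffer, format="mp3")  # pydub delegates MP3 encoding to ffmpeg
    return buffer.getvalue(), "mp3"
```

Returning the format alongside the bytes lets the caller pick the right file extension when the fallback path is taken.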

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Updated OpenAI provider to properly handle GPT-4.1 model with correct parameter usage.

## Changes

### OpenAI Provider (`openai.py`)
- Updated model detection to handle `gpt-4.1` (models starting with `gpt-4.`)
- Uses `max_completion_tokens` parameter for GPT-4.1 (new API format)
- Maintains backward compatibility with older GPT-4 models
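
The detection rule described above can be sketched as a small helper. The prefix check follows the commit text (model names starting with `gpt-4.`); the helper name `token_limit_kwargs` is hypothetical:

```python
def token_limit_kwargs(model: str, limit: int) -> dict:
    """Pick the token-limit parameter name based on the model name."""
    # Per the commit: names starting with "gpt-4." (e.g. "gpt-4.1") use the newer parameter.
    if model.startswith("gpt-4."):
        return {"max_completion_tokens": limit}
    # Older models such as "gpt-4" or "gpt-4o" keep the legacy parameter.
    return {"max_tokens": limit}
```

The returned dict can be splatted into the completion call, so the rest of the provider does not need to know which parameter name applies.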

### Documentation Updates
- `CONVERSATION_LOGIC_EXPLAINED.md`: Updated all gpt-4o references to gpt-4.1
- `QUICK_COMMANDS.md`: Updated LLM_MODEL example to gpt-4.1

## Testing
- ✅ Verified GPT-4.1 initialization and model name
- ✅ Tested completion with simple prompt (Hindi translation)
- ✅ Generated full conversation with cooperative_parent persona
- ✅ Confirmed correct parameter usage (max_completion_tokens)
- ✅ Verified LLM metrics show "gpt-4.1"

## Usage
```bash
# Set GPT-4.1 as default model
export LLM_MODEL=gpt-4.1

# Generate conversation with GPT-4.1
uv run python src/vcg_cli.py generate --customer angry_insufficient_funds
```

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
… support

Added detailed documentation on using Cartesia for Indian accents including:
- Native Hindi and Hinglish support confirmation
- Language parameter usage (language='hi')
- Dedicated Hinglish voices for Indian market
- Code examples and implementation steps
- Voice configuration guide

Key findings:
- Cartesia launched Hinglish support in 2024 specifically for India
- Supports fluid code-switching between Hindi and English
- Natural Indian intonation and pronunciation
- 40ms latency with India deployments
- Current implementation already supports language parameter

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
**Problem**: Cartesia TTS was receiving OpenAI-specific model names ("tts-1")
and voice names ("onyx", "nova") from the shared config, causing API errors.

**Root Cause**:
- Default config had model="tts-1" for all TTS providers
- Personas used OpenAI voice names regardless of TTS provider
- Cartesia provider was using voice_config.model directly

**Solution**:
1. Validate the model in Cartesia `__init__`: if the config contains a non-Cartesia
   model (e.g., "tts-1"), fall back to the default "sonic-3"
2. Don't use voice_config.model in generate_speech: always use default_model
   to avoid cross-provider contamination
3. Add voice name mapping: map OpenAI voice names ("onyx", "nova", etc.) to
   Cartesia voice IDs (UUIDs) for seamless provider switching
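
The steps above can be sketched as follows. The model-name spellings and the helper names are assumptions, and the UUIDs in the mapping are placeholders, since this commit does not list the real voice IDs:

```python
import uuid
from typing import Optional

# Assumed spellings of the Cartesia model names mentioned in this PR.
CARTESIA_MODELS = {"sonic-3", "sonic-2", "sonic-turbo"}
DEFAULT_MODEL = "sonic-3"

# Placeholder IDs only; the real Cartesia voice UUIDs are not given here.
OPENAI_TO_CARTESIA_VOICE = {
    "onyx": "00000000-0000-0000-0000-000000000001",
    "nova": "00000000-0000-0000-0000-000000000002",
}


def resolve_model(configured: Optional[str]) -> str:
    """Ignore models from other providers (e.g. "tts-1") and use the default."""
    return configured if configured in CARTESIA_MODELS else DEFAULT_MODEL


def _is_uuid(value: str) -> bool:
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def resolve_voice(voice: str, fallback_voice_id: str) -> str:
    """Pass UUIDs through, map known OpenAI names, otherwise use the fallback."""
    if _is_uuid(voice):
        return voice
    return OPENAI_TO_CARTESIA_VOICE.get(voice.lower(), fallback_voice_id)
```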

**Testing**:
- Generated conversation with Cartesia TTS successfully
- Audio file created (23.1 seconds)
- All voice mappings working correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
**Problem**: Two separate data directories were being created:
- /Users/sid/Documents/GitHub/livekit-starter/data/conversations
- /Users/sid/Documents/GitHub/livekit-starter/src/data/conversations

This happened because the relative path "data/conversations" was resolved
differently depending on where the script was run from (root vs src).

**Solution**:
1. Added _get_project_root() helper to find project root (via pyproject.toml/.git)
2. Added _get_default_data_path() to compute absolute path to data directory
3. Updated StorageConfig to use absolute path by default
4. Deleted src/data directory completely
5. Moved latest test files to root data directory

**Benefits**:
- All conversations now stored in single location
- Works correctly regardless of where scripts are run from
- No duplicate data directories
- Cleaner file organization

**Testing**:
- Generated test conversation - saved to root data directory
- Verified src/data no longer exists
- All file paths now absolute

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
**Voice Selection**:
- Support Agent: **Ishan** (fd2ada67-c2d9-4afe-b474-6386b87d8fc3)
  - "Conversational male for Hinglish sales and customer support"
  - Perfect for professional support conversations in Hinglish

- Male Customer: **Devansh** (1259b7e3-cb8a-43df-9446-30971a46b8b0)
  - "Warm, conversational Indian male adult voice"
  - Ideal for casual customer interactions

**Features Added**:
1. Language-aware voice mapping - detects Hindi characters and switches to
   Indian voices automatically
2. Auto-detects Hindi text (Devanagari script) and passes language='hi' to TTS
3. Separate voice mappings for Hindi vs English to use appropriate voices
4. Updated DEFAULT_VOICES with Hindi/Hinglish voice IDs

**Implementation**:
- Cartesia provider: Added language-based voice selection logic
- Orchestrator: Auto-detects Hindi characters (U+0900-U+097F) and sets language
- Fixed kwargs conflict by using pop() instead of get() for language parameter
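
The detection and the `pop()` fix can be illustrated with the sketch below. The function names are hypothetical, and the explanation of the conflict (the same keyword being supplied twice) is an assumption about what the original bug looked like:

```python
def contains_devanagari(text: str) -> bool:
    """True if any character falls in the Devanagari block U+0900-U+097F."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def detect_language(text: str, default: str = "en") -> str:
    return "hi" if contains_devanagari(text) else default


def build_tts_request(text: str, **kwargs) -> dict:
    """Assemble TTS arguments with exactly one `language` entry."""
    # pop() removes `language` from kwargs. With get(), the key would remain in
    # kwargs and the dict(...) call below would raise TypeError (duplicate keyword).
    language = kwargs.pop("language", None) or detect_language(text)
    return dict(text=text, language=language, **kwargs)
```

An explicitly passed `language` wins; auto-detection only applies when the caller did not set one.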

**Testing**:
- Generated 15.2 second Hinglish conversation with Cartesia
- Ishan voice for support agent ✅
- Devansh voice for customer ✅
- Audio quality excellent

**Note**: Female Hindi voice support pending - currently falls back to English
female voice for female customers. Voice ID needed from Cartesia.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
## Summary
- Centralized voice catalog system for managing TTS voices across providers
- Priority-based voice selection with support/customer role separation
- Fixed MP3 audio concatenation using pydub (was truncating after first turn)
- Multi-language voice selection (Hindi, Hinglish, English)

## Key Changes

### Voice Catalog System
- New `VoiceCatalog` class with 3-tier fallback matching (exact → flexible → default)
- Added priority field to VoiceEntry for tie-breaking
- Support voices (Ishan, Devansh) never used for customers
- Customer voices (Ayush, Aarav, Aarti) never used for support
- Language-aware selection: Devansh (Hindi-only), Ishan (Hinglish/English)
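
A sketch of how the three-tier fallback might work. The tier definitions are assumptions (exact = role, language, and gender match; flexible = role and language match; default = any voice for the role), as is the rule that a higher `priority` wins. Voice names come from this PR, but the languages, genders, and priorities assigned to the customer voices are illustrative:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class VoiceEntry:
    name: str
    role: str                      # "support" or "customer"
    languages: Tuple[str, ...]
    gender: str
    priority: int = 0              # assumption: higher value wins ties


class VoiceCatalog:
    def __init__(self, entries: Iterable[VoiceEntry]) -> None:
        self._entries = list(entries)

    def select(self, role: str, language: str, gender: Optional[str] = None) -> VoiceEntry:
        # Role filter comes first, so support and customer voices never mix.
        pool = [e for e in self._entries if e.role == role]
        if not pool:
            raise LookupError(f"No voices registered for role {role!r}")
        flexible = [e for e in pool if language in e.languages]
        exact = [e for e in flexible if e.gender == gender]
        for tier in (exact, flexible, pool):  # exact -> flexible -> default
            if tier:
                return max(tier, key=lambda e: e.priority)
        raise LookupError("unreachable: pool is non-empty")


catalog = VoiceCatalog([
    VoiceEntry("Ishan", "support", ("hinglish", "en"), "male", priority=2),
    VoiceEntry("Devansh", "support", ("hi",), "male", priority=1),
    VoiceEntry("Ayush", "customer", ("hi", "hinglish", "en"), "male", priority=2),
    VoiceEntry("Aarav", "customer", ("hi", "en"), "male", priority=1),
    VoiceEntry("Aarti", "customer", ("hi", "hinglish", "en"), "female", priority=1),
])
```

Filtering by role before any other criterion is what guarantees the separation described above, regardless of which tier finally matches.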

### Audio Fix
- Replaced naive byte concatenation with pydub AudioSegment
- Properly combines MP3 files with headers/frames intact
- Verified with 44-second test conversation (5 turns)

### Provider Updates
- Removed hardcoded voice mappings from Cartesia/ElevenLabs providers
- OpenAI TTS now strips unsupported language parameter
- PersonaService uses VoiceCatalog for all voice selection
- Added languages field to CustomerPersona model

### Testing
- End-to-end Cartesia TTS test passes
- Voice selection priority logic validated
- All 7 voice selection scenarios tested

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>