@@ -0,0 +1,157 @@
# 🎙️ Nova 2 Sonic Multi-Agent System

A speech-to-speech multi-agent system that unlocks dynamic configuration switching for AWS Bedrock's Nova 2 Sonic model during live conversations.

## ⚠️ The Problem

Speech-to-speech models face a critical limitation: **static configuration**. Once a conversation starts, you're locked into:
- A single system prompt that can't adapt to different use cases
- One fixed set of tools
- Static voice characteristics

When different use cases call for different prompts and tools, you want specialized agents, each focused on one task with its own optimized setup. This gives you more control and precision than a single generalist agent trying to handle everything.

## 💡 The Solution

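**Dynamic agent switching using tool triggers**: when the model calls a dedicated switch tool, the application restarts the session with a new configuration mid-conversation and carries the conversation history over, so no context is lost.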

Instead of one overloaded agent, you get:
- Multiple specialized agents, each with focused tools and optimized prompts
- Seamless transitions between agents based on user intent
- Preserved conversation history across switches
- High accuracy maintained through agent specialization
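As a rough illustration of the tool trigger, the switch can be declared as an ordinary tool whose only argument is the target agent. This sketch assumes the `toolSpec` / `inputSchema.json` shape used in public Nova Sonic examples (the JSON schema passed as a serialized string); the `target_agent` parameter name and the agent keys are hypothetical.

```python
import json

# Hypothetical agent keys; in this project they would come from src/agents/agent_config.py.
AGENT_NAMES = ["support", "sales", "tracking"]


def build_switch_agent_tool(agent_names: list[str]) -> dict:
    """Build a tool entry the model can call to request a different agent."""
    input_schema = {
        "type": "object",
        "properties": {
            "target_agent": {  # hypothetical parameter name
                "type": "string",
                "enum": agent_names,
                "description": "Agent that should handle the rest of the conversation.",
            }
        },
        "required": ["target_agent"],
    }
    return {
        "toolSpec": {
            "name": "switch_agent",
            "description": (
                "Hand the conversation to a specialized agent when the "
                "user's request falls outside your own role."
            ),
            # Assumed format: the JSON schema is serialized to a string.
            "inputSchema": {"json": json.dumps(input_schema)},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_switch_agent_tool(AGENT_NAMES), indent=2))
```

Restricting the argument to an `enum` of known agent keys steers the model toward agents that actually exist, though the handler should still validate the value it receives.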

## 🌟 Why This Matters

- ✅ **Specialization without compromise**: each agent excels at its own domain
- ✅ **Seamless user experience**: no jarring resets or context loss
- ✅ **Better accuracy**: fewer tools per agent means fewer chances to pick the wrong one
- ✅ **New use cases unlocked**: enterprise support escalation, healthcare triage, financial services routing, and more

## 🚀 Implementation

This demo showcases three specialized agents that switch dynamically based on conversation flow:

- **Support Agent (Matthew)**: Handles customer issues, creates support tickets
- **Sales Agent (Amy)**: Processes orders, provides product information
- **Tracking Agent (Tiffany)**: Checks order status and delivery updates

Each agent brings its own system prompt, tools, and voice; switching happens transparently when the user's intent changes.

## 📁 Project Structure

```
dynamic-configuration/
├── main.py                    # Entry point
├── src/
│   ├── multi_agent.py         # Agent orchestration
│   ├── core/                  # Core functionality
│   │   ├── stream_manager.py  # Bedrock streaming
│   │   ├── event_templates.py # Event generation
│   │   ├── tool_processor.py  # Tool execution
│   │   ├── config.py          # Configuration
│   │   └── utils.py           # Utilities
│   ├── agents/                # Agent definitions
│   │   ├── agent_config.py    # Agent configs
│   │   └── tools.py           # Tool implementations
│   └── audio/                 # Audio handling
│       └── audio_streamer.py  # Audio I/O
├── docs/                      # Documentation
│   └── STRUCTURE.md           # System design
└── requirements.txt           # Dependencies
```

## ⚙️ Setup

1. **Install dependencies**:
```bash
pip install -r requirements.txt
```

2. **Configure AWS credentials**:
```bash
export AWS_ACCESS_KEY_ID="your_key"
export AWS_SECRET_ACCESS_KEY="your_secret"
export AWS_REGION="us-east-1"
```

3. **Run**:
```bash
python main.py
```

## 🎮 Usage

```bash
# Normal mode
python main.py

# Debug mode
python main.py --debug
```

## 🔧 Configuration

Edit `src/core/config.py` to modify:
- Audio settings (sample rates, chunk size)
- Model parameters (temperature, top_p, max_tokens)
- AWS region and model ID
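For orientation, a `config.py` along these lines would cover the three groups above. `DEFAULT_MODEL_ID`, `DEFAULT_REGION`, and `DEBUG` are the names `main.py` imports; every value here, including the model ID string, is an illustrative assumption to check against your Bedrock console. The two helpers show how chunk size translates into latency and buffer size for 16-bit mono PCM.

```python
"""Hypothetical sketch of src/core/config.py; all values are illustrative."""

# Audio settings: 16-bit mono PCM
INPUT_SAMPLE_RATE = 16_000   # Hz, microphone capture
OUTPUT_SAMPLE_RATE = 24_000  # Hz, speaker playback
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2       # 16-bit samples
CHUNK_SIZE = 1024            # frames per microphone read

# AWS settings (names match the imports in main.py; values are assumptions)
DEFAULT_MODEL_ID = "amazon.nova-2-sonic-v1:0"
DEFAULT_REGION = "us-east-1"

# Model inference parameters
MAX_TOKENS = 1024
TEMPERATURE = 0.7
TOP_P = 0.9

DEBUG = False  # main.py overwrites this from the --debug flag


def chunk_duration_ms(chunk_size: int = CHUNK_SIZE,
                      sample_rate: int = INPUT_SAMPLE_RATE) -> float:
    """Milliseconds of audio contained in one chunk."""
    return chunk_size * 1000 / sample_rate


def chunk_bytes(chunk_size: int = CHUNK_SIZE) -> int:
    """Raw bytes per chunk for the configured sample width and channel count."""
    return chunk_size * SAMPLE_WIDTH_BYTES * CHANNELS


print(chunk_duration_ms(), chunk_bytes())  # 64.0 2048
```

Halving `CHUNK_SIZE` to 512 would cut each chunk to 32 ms of audio, trading lower capture latency for twice as many audio events per second.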

## 📋 Requirements

- Python 3.12+
- AWS Bedrock access
- Microphone and speakers
- PortAudio (system library required by PyAudio)

## Data Flow

```mermaid
sequenceDiagram
participant User
participant MultiAgentSonic
participant StreamManager
participant Bedrock
participant ToolProcessor

User->>MultiAgentSonic: Speak (microphone)
MultiAgentSonic->>StreamManager: Audio chunks
StreamManager->>Bedrock: Audio events
Bedrock->>StreamManager: Response events
StreamManager->>MultiAgentSonic: Audio chunks
MultiAgentSonic->>User: Play audio (speakers)


alt Switch Agent Tool Use
User->>MultiAgentSonic: Speak (microphone)
MultiAgentSonic->>StreamManager: Audio chunks
StreamManager->>Bedrock: Audio events
Bedrock->>StreamManager: Switch Agent tool use detected
StreamManager->>ToolProcessor: Execute Switch Agent
ToolProcessor->>MultiAgentSonic: Start new Session
MultiAgentSonic->>Bedrock: Send text input to invoke conversation
Bedrock->>StreamManager: Response events
StreamManager->>MultiAgentSonic: Audio chunks
MultiAgentSonic->>User: Play audio (speakers)
end
```
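The switch branch above starts a new session and sends text input so the new agent speaks first. One way to carry context across that boundary, sketched here under stated assumptions, is to replay the most recent turns (bounded by a character budget) and append a kickoff message. The `Turn` type, the budget, and the kickoff wording are hypothetical; the project may store and replay history differently.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "USER" or "ASSISTANT"
    text: str


def build_handoff(history: list[Turn], new_agent: str, max_chars: int = 2000) -> list[Turn]:
    """Return the newest turns that fit in max_chars, oldest first, plus a kickoff turn."""
    kept: list[Turn] = []
    used = 0
    # Walk backwards so the most recent context survives when the budget runs out.
    for turn in reversed(history):
        if used + len(turn.text) > max_chars:
            break
        kept.append(turn)
        used += len(turn.text)
    kept.reverse()

    # Hypothetical text input that prompts the new agent to speak first.
    kickoff = Turn("USER", f"You are now the {new_agent} agent. Greet the user and continue helping.")
    return kept + [kickoff]


history = [
    Turn("USER", "My laptop won't boot"),
    Turn("ASSISTANT", "I can open a ticket."),
    Turn("USER", "Actually, I want to buy a new one"),
]
for turn in build_handoff(history, "sales", max_chars=60):
    print(turn.role, "|", turn.text)
```

With a 60-character budget the oldest turn is dropped, so the new session replays the last two turns followed by the kickoff message.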

## Agent Switching Flow

```mermaid
stateDiagram-v2
[*] --> ActiveConversation
ActiveConversation --> DetectSwitch: User requests agent change
DetectSwitch --> SetSwitchFlag: trigger "switch_agent" tool
SetSwitchFlag --> StopStreaming: StreamManager sets switch_requested = True
StopStreaming --> PlayMusic: AudioStreamer stops
PlayMusic --> CloseStream: MultiAgentSonic plays transition
CloseStream --> SwitchAgent: Close current stream
SwitchAgent --> RestartStream: Load new agent config
RestartStream --> ActiveConversation: Resume with new agent
```
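The diagram is a single linear cycle, so it can be encoded as a successor table plus one guard: the conversation stays in `ActiveConversation` until a switch is requested. This is a minimal sketch with hypothetical names (`SwitchState`, `advance`), not the project's actual control flow.

```python
from enum import Enum, auto


class SwitchState(Enum):
    ACTIVE_CONVERSATION = auto()
    DETECT_SWITCH = auto()
    SET_SWITCH_FLAG = auto()
    STOP_STREAMING = auto()
    PLAY_MUSIC = auto()
    CLOSE_STREAM = auto()
    SWITCH_AGENT = auto()
    RESTART_STREAM = auto()


# One successor per state, in the order the diagram shows; the last wraps to the first.
_ORDER = list(SwitchState)
NEXT_STATE = {state: _ORDER[(i + 1) % len(_ORDER)] for i, state in enumerate(_ORDER)}


def advance(state: SwitchState, switch_requested: bool = False) -> SwitchState:
    """Return the next state; only a switch request leaves ACTIVE_CONVERSATION."""
    if state is SwitchState.ACTIVE_CONVERSATION and not switch_requested:
        return state
    return NEXT_STATE[state]


state = advance(SwitchState.ACTIVE_CONVERSATION, switch_requested=True)
steps = [state]
while state is not SwitchState.ACTIVE_CONVERSATION:
    state = advance(state)
    steps.append(state)
print(" -> ".join(s.name for s in steps))
```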

## Credits
Music by <a href="https://pixabay.com/users/hitslab-47305729/?utm_source=link-attribution&utm_medium=referral&utm_campaign=music&utm_content=324902">Ievgen Poltavskyi</a> from <a href="https://pixabay.com//?utm_source=link-attribution&utm_medium=referral&utm_campaign=music&utm_content=324902">Pixabay</a>


@@ -0,0 +1,199 @@
# Project Structure

## Directory Layout

```
sonic_multi_agent/
├── main.py                    # Application entry point
├── README.md                  # Project overview
├── requirements.txt           # Python dependencies
├── music.mp3                  # Transition music for agent switches
├── .gitignore                 # Git ignore patterns
├── src/                       # Source code
│   ├── __init__.py
│   ├── multi_agent.py         # Multi-agent orchestrator
│   │
│   ├── core/                  # Core functionality
│   │   ├── __init__.py
│   │   ├── stream_manager.py  # Bedrock bidirectional streaming
│   │   ├── event_templates.py # Bedrock event JSON generators
│   │   ├── tool_processor.py  # Async tool executor
│   │   ├── config.py          # Configuration constants
│   │   └── utils.py           # Debug logging & timing utilities
│   │
│   ├── agents/                # Agent definitions
│   │   ├── __init__.py
│   │   ├── agent_config.py    # Agent configurations (Support, Sales, Tracking)
│   │   └── tools.py           # Tool implementations
│   │
│   └── audio/                 # Audio handling
│       ├── __init__.py
│       └── audio_streamer.py  # PyAudio I/O manager
└── docs/                      # Documentation
    └── STRUCTURE.md           # This file
```

## Module Responsibilities

### Root Level

**main.py**
- Entry point with argument parsing (`--debug` flag)
- Initializes MultiAgentSonic with model and region
- Handles keyboard interrupts and errors gracefully

### src/multi_agent.py

**MultiAgentSonic** - Orchestrates multi-agent conversations
- Manages active agent state and conversation history
- Handles agent switching with transition music (pygame)
- Creates and coordinates StreamManager and AudioStreamer
- Maintains conversation context across agent switches

### src/core/

**stream_manager.py** - BedrockStreamManager
- Manages bidirectional streaming with AWS Bedrock Nova 2 Sonic
- Handles audio input/output queues
- Processes response events (text, audio, tool calls)
- Coordinates tool execution via ToolProcessor
- Manages conversation state and barge-in detection
- Tracks agent switching requests
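To make the last two responsibilities concrete, here is a sketch of how decoded response events might update the switch and barge-in flags. It assumes the event shapes seen in public Nova Sonic samples (`toolUse` with `toolName` and a JSON-string `content`; an `interrupted` marker inside `textOutput` on barge-in) and the hypothetical `target_agent` argument; verify both against the Bedrock documentation.

```python
import json
from dataclasses import dataclass
from typing import Optional

SWITCH_TOOL_NAME = "switch_agent"
# Assumed barge-in marker, as seen in public Nova Sonic samples.
INTERRUPTED_MARKER = '{ "interrupted" : true }'


@dataclass
class StreamState:
    switch_requested: bool = False
    target_agent: Optional[str] = None
    barge_in: bool = False


def handle_event(state: StreamState, event: dict) -> None:
    """Update switch and barge-in flags from one decoded response event."""
    body = event.get("event", {})
    if "textOutput" in body:
        if INTERRUPTED_MARKER in body["textOutput"].get("content", ""):
            state.barge_in = True  # the user spoke over the assistant
    elif "toolUse" in body:
        tool = body["toolUse"]
        if tool.get("toolName") == SWITCH_TOOL_NAME:
            # Tool arguments arrive as a JSON string (assumed format).
            args = json.loads(tool.get("content") or "{}")
            state.switch_requested = True
            state.target_agent = args.get("target_agent")  # hypothetical argument name


state = StreamState()
handle_event(state, {"event": {"toolUse": {
    "toolName": "switch_agent", "toolUseId": "t1",
    "content": json.dumps({"target_agent": "sales"})}}})
print(state)
```

The orchestrator would then check `switch_requested` to begin the stop, close, and restart sequence, while `barge_in` would tell the audio side to discard queued playback.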

**event_templates.py** - EventTemplates
- Generates Bedrock-compatible JSON events
- Session events (start/end)
- Content events (audio/text/tool results)
- Prompt configuration with system instructions
- Tool schemas for agent capabilities
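A rough sketch of such generators follows. The JSON shapes mirror those in public Nova Sonic examples (`sessionStart` with `inferenceConfiguration`, a `promptStart` carrying the voice and tool configuration, and a `contentStart` / `textInput` / `contentEnd` triplet per text block), but field names should be checked against the current Bedrock API reference; the function names are hypothetical.

```python
import json
import uuid


def session_start(max_tokens: int = 1024, top_p: float = 0.9, temperature: float = 0.7) -> str:
    """Open a session with the model's inference parameters."""
    return json.dumps({"event": {"sessionStart": {"inferenceConfiguration": {
        "maxTokens": max_tokens, "topP": top_p, "temperature": temperature}}}})


def prompt_start(prompt_name: str, voice_id: str, tools: list[dict]) -> str:
    """Per-prompt output settings; voice and tools are where agents differ."""
    return json.dumps({"event": {"promptStart": {
        "promptName": prompt_name,
        "textOutputConfiguration": {"mediaType": "text/plain"},
        "audioOutputConfiguration": {
            "mediaType": "audio/lpcm", "sampleRateHertz": 24000, "sampleSizeBits": 16,
            "channelCount": 1, "voiceId": voice_id, "encoding": "base64", "audioType": "SPEECH",
        },
        "toolUseOutputConfiguration": {"mediaType": "application/json"},
        "toolConfiguration": {"tools": tools},
    }}})


def text_block(prompt_name: str, role: str, text: str, interactive: bool = False) -> list[str]:
    """contentStart / textInput / contentEnd for one text block (system prompt, history, ...)."""
    content_name = str(uuid.uuid4())  # ties the three events together
    return [
        json.dumps({"event": {"contentStart": {
            "promptName": prompt_name, "contentName": content_name, "type": "TEXT",
            "interactive": interactive, "role": role,
            "textInputConfiguration": {"mediaType": "text/plain"}}}}),
        json.dumps({"event": {"textInput": {
            "promptName": prompt_name, "contentName": content_name, "content": text}}}),
        json.dumps({"event": {"contentEnd": {
            "promptName": prompt_name, "contentName": content_name}}}),
    ]


events = [session_start(), prompt_start("prompt-1", "matthew", [])]
events += text_block("prompt-1", "SYSTEM", "You are a helpful support agent.")
print(len(events))
```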

**tool_processor.py** - ToolProcessor
- Executes tools asynchronously
- Maps tool names to implementations
- Manages concurrent tool tasks
- Handles tool errors and results

**config.py**
- Audio configuration (sample rates, chunk size, channels)
- AWS configuration (model ID, region)
- Model parameters (max tokens, temperature, top_p)
- Debug settings

**utils.py**
- Debug logging with timestamps (`debug_print`)
- Performance timing decorators (`time_it`, `time_it_async`)
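Helpers of this kind might look like the following. It is a sketch: the timestamp format, the module-level `DEBUG` flag (standing in for `config.DEBUG`), and the decorator signatures are assumptions rather than the project's code.

```python
import asyncio
import functools
import time
from datetime import datetime

DEBUG = True  # stands in for config.DEBUG in this sketch


def debug_print(message: str) -> None:
    """Print a millisecond-timestamped message when debugging is enabled."""
    if DEBUG:
        stamp = datetime.now().strftime("%H:%M:%S.%f")[:-3]
        print(f"[{stamp}] {message}")


def time_it(fn):
    """Log how long a synchronous function call takes."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            debug_print(f"{fn.__name__} took {(time.perf_counter() - start) * 1000:.2f} ms")
    return wrapper


def time_it_async(fn):
    """Log how long an awaited coroutine function call takes."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await fn(*args, **kwargs)
        finally:
            debug_print(f"{fn.__name__} took {(time.perf_counter() - start) * 1000:.2f} ms")
    return wrapper


@time_it
def add(a: int, b: int) -> int:
    return a + b


@time_it_async
async def slow_double(x: int) -> int:
    await asyncio.sleep(0.01)
    return x * 2


print(add(2, 3))
print(asyncio.run(slow_double(21)))
```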

### src/agents/

**agent_config.py**
- Agent dataclass with voice_id, instruction, and tools
- AGENTS dictionary with three specialized agents:
  - **Support (Matthew)**: Customer support with ticket creation
  - **Sales (Amy)**: Product sales and ordering
  - **Tracking (Tiffany)**: Order status and delivery tracking
- Each agent has its own system prompt and tool set
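A minimal version of that module could look like this. The field names come from the description above; the prompt text, the lowercase voice IDs, and the assumption that every agent also carries the `switch_agent` tool are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    voice_id: str           # Nova Sonic voice used for this agent's speech
    instruction: str        # system prompt
    tools: tuple[str, ...]  # names of the tools this agent may call


# Assumption: every agent can hand off, so each carries "switch_agent".
AGENTS: dict[str, Agent] = {
    "support": Agent(
        voice_id="matthew",
        instruction="You are a customer support agent. Help with issues and open tickets.",
        tools=("open_ticket_tool", "switch_agent"),
    ),
    "sales": Agent(
        voice_id="amy",
        instruction="You are a sales agent. Answer product questions and take computer orders.",
        tools=("order_computers_tool", "switch_agent"),
    ),
    "tracking": Agent(
        voice_id="tiffany",
        instruction="You are an order tracking agent. Report order status and delivery updates.",
        tools=("check_order_location_tool", "switch_agent"),
    ),
}


def get_agent(name: str) -> Agent:
    """Look up an agent, failing with the list of valid names."""
    try:
        return AGENTS[name]
    except KeyError:
        raise KeyError(f"Unknown agent {name!r}; expected one of {sorted(AGENTS)}") from None


print(get_agent("sales").voice_id)  # amy
```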

**tools.py**
- Tool implementations:
  - `open_ticket_tool`: Creates support tickets
  - `order_computers_tool`: Processes computer orders
  - `check_order_location_tool`: Checks order delivery status
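With no backend attached, these tools can be mocked as async functions that return JSON-serializable dicts. In the sketch below, the argument names (`issue_description`, `quantity`, `order_id`), the ID formats, and the in-memory order store are all hypothetical.

```python
import asyncio
import itertools

# Hypothetical in-memory stand-ins for a ticketing system and an order database.
_ticket_ids = itertools.count(1001)
_order_ids = itertools.count(200)
_ORDERS: dict[str, str] = {"A100": "Out for delivery"}


async def open_ticket_tool(args: dict) -> dict:
    """Create a support ticket from the caller's issue description."""
    issue = str(args.get("issue_description", "")).strip()
    if not issue:
        return {"error": "issue_description is required"}
    return {"ticket_id": f"TICKET-{next(_ticket_ids)}", "status": "open", "issue": issue}


async def order_computers_tool(args: dict) -> dict:
    """Place a computer order and record it so it can be tracked later."""
    try:
        quantity = int(args.get("quantity", 0))
    except (TypeError, ValueError):
        return {"error": "quantity must be an integer"}
    if quantity <= 0:
        return {"error": "quantity must be positive"}
    order_id = f"A{next(_order_ids)}"
    _ORDERS[order_id] = "Processing at warehouse"
    return {"order_id": order_id, "quantity": quantity, "status": _ORDERS[order_id]}


async def check_order_location_tool(args: dict) -> dict:
    """Report the current delivery status for an order ID."""
    order_id = str(args.get("order_id", ""))
    status = _ORDERS.get(order_id)
    if status is None:
        return {"error": f"No order found with ID {order_id!r}"}
    return {"order_id": order_id, "status": status}


async def _demo() -> None:
    order = await order_computers_tool({"quantity": 2})
    print(order)
    print(await check_order_location_tool({"order_id": order["order_id"]}))


asyncio.run(_demo())
```

Returning an `error` key instead of raising keeps every outcome speakable: the model can read the message back to the user.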

### src/audio/

**audio_streamer.py** - AudioStreamer
- Manages PyAudio streams for input/output
- Captures microphone input via callback
- Plays audio output to speakers
- Handles barge-in detection
- Audio buffering and queue management

## Data Flow

```mermaid
sequenceDiagram
participant User
participant AudioStreamer
participant StreamManager
participant Bedrock
participant ToolProcessor
participant Output

User->>AudioStreamer: Speak (microphone)
AudioStreamer->>StreamManager: Audio chunks
StreamManager->>Bedrock: Audio events
Bedrock->>StreamManager: Response events

alt Text Response
StreamManager->>Output: Display text
end

alt Audio Response
StreamManager->>AudioStreamer: Audio chunks
AudioStreamer->>User: Play audio (speakers)
end

alt Tool Use
StreamManager->>ToolProcessor: Execute tool
ToolProcessor->>StreamManager: Tool result
StreamManager->>Bedrock: Tool result event
end
```

## Agent Switching Flow

```mermaid
stateDiagram-v2
[*] --> ActiveConversation
ActiveConversation --> DetectSwitch: User requests agent change
DetectSwitch --> SetSwitchFlag: Bedrock invokes switch_agent tool
SetSwitchFlag --> StopStreaming: StreamManager sets flag
StopStreaming --> PlayMusic: AudioStreamer stops
PlayMusic --> CloseStream: MultiAgentSonic plays transition
CloseStream --> SwitchAgent: Close current stream
SwitchAgent --> RestartStream: Load new agent config
RestartStream --> ActiveConversation: Resume with new agent
```

## Key Design Patterns

1. **Separation of Concerns**: Each module has a single, well-defined responsibility
2. **Queue-based Communication**: Async queues decouple audio processing from streaming
3. **Event-driven Architecture**: Response handling via Bedrock events
4. **Factory Pattern**: EventTemplates generates configuration-specific events
5. **Strategy Pattern**: Different agents share the same interface
6. **Dependency Injection**: Components receive dependencies at initialization

## Architecture Benefits

- **Modularity**: Components can be tested and modified independently
- **Extensibility**: Easy to add new agents, tools, or audio features
- **Maintainability**: Clear structure makes debugging straightforward
- **Flexibility**: Agent switching without losing conversation context
- **Performance**: Async operations prevent blocking

## Adding New Components

### New Agent
1. Add the agent's configuration to the `AGENTS` dict in `src/agents/agent_config.py`
2. Define its voice_id, instruction (system prompt), and tools list
3. The agent is then automatically available for switching

### New Tool
1. Implement the function in `src/agents/tools.py`
2. Add it to the agent's tools list in `src/agents/agent_config.py`
3. The tool is then automatically registered in ToolProcessor
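One plausible way to get that automatic registration is a decorator-based registry that is filtered by each agent's tool list, failing early when a listed tool has no implementation. This is a sketch of the pattern, not necessarily how `tools.py` does it; `register_tool`, `tools_for_agent`, and `check_warranty_tool` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

ToolFn = Callable[[dict], Awaitable[dict]]
TOOL_REGISTRY: dict[str, ToolFn] = {}


def register_tool(fn: ToolFn) -> ToolFn:
    """Record an implementation under its function name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@register_tool
async def check_warranty_tool(args: dict) -> dict:  # hypothetical new tool
    return {"serial": args.get("serial"), "in_warranty": True}


def tools_for_agent(tool_names: list[str]) -> dict[str, ToolFn]:
    """Build one agent's tool map, failing early if a listed tool is missing."""
    missing = [name for name in tool_names if name not in TOOL_REGISTRY]
    if missing:
        raise KeyError(f"Tools listed for agent but not implemented: {missing}")
    return {name: TOOL_REGISTRY[name] for name in tool_names}


tools = tools_for_agent(["check_warranty_tool"])
print(asyncio.run(tools["check_warranty_tool"]({"serial": "SN-1"})))
```

Under this pattern, steps 1 and 2 above are the only edits needed: the decorator handles registration, and a typo in an agent's tools list surfaces at startup rather than in the middle of a call.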

### New Audio Feature
- Modify `src/audio/audio_streamer.py`
- Update audio configuration in `src/core/config.py` if needed

### New Event Type
- Add template method to `src/core/event_templates.py`
- Use in `src/core/stream_manager.py` for sending events

### New Configuration
- Add constants to `src/core/config.py`
- Import where needed across modules
@@ -0,0 +1,36 @@
"""Main entry point for Nova 2 Sonic multi-agent system."""
import argparse
import asyncio
import sys
import traceback

from src.core import config
from src.core.config import DEFAULT_MODEL_ID, DEFAULT_REGION
from src.multi_agent import MultiAgentSonic


async def main(debug: bool = False):
    """Run multi-agent conversation."""
    # Propagate the --debug flag to the shared config module
    config.DEBUG = debug

    sonic = MultiAgentSonic(
        model_id=DEFAULT_MODEL_ID,
        region=DEFAULT_REGION,
        debug=debug
    )

    await sonic.start_conversation()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Nova 2 Sonic Multi-Agent System')
    parser.add_argument('--debug', action='store_true', help='Enable debug mode')
    args = parser.parse_args()

    try:
        asyncio.run(main(debug=args.debug))
    except KeyboardInterrupt:
        print("\n👋 Goodbye!")
    except Exception as e:
        print(f"Error: {e}")
        if args.debug:
            traceback.print_exc()
        sys.exit(1)  # signal failure to the calling shell
