This project implements a comprehensive system for Pepper, a humanoid robot assistant, combining emotional voice response capabilities with emotion-based movement choreography.
Orchestrator4 is a multi-agent conversational AI system designed for Pepper, a humanoid robot. The system integrates multiple specialized agents to handle different types of user queries with intelligent routing, fallback mechanisms, and Australian context awareness.
```
ORCHESTRATOR4 SYSTEM

USER INPUT (Voice/Text) ──▶ SPEECH-TO-TEXT (STT) ──▶ MAIN LOOP (main())
                                                              │
                                                              ▼
ORCHESTRATOR4 CLASS
  Input processing pipeline:
      handle_input() ──▶ Exception Handler ──▶ Keyword Analysis ──▶ Agent Selection

  Agent ecosystem:
      PEPPER AGENT              conversational · memory · personality · GPT-4o
      SEARCH AGENT3 (primary)   custom search API · advanced features
      SEARCH AGENT  (fallback)  DuckDuckGo · basic fallback · GPT-4o option
      SUMMARY AGENT             Australian context · metric units · filtering

  Response processing pipeline:
      process_response() ──▶ Sentence Splitter ──▶ Emoji Filter ──▶ TTS Engine
                                                              │
                                                              ▼
HTTP REQUEST (Threaded) ──▶ PEPPER TTS (10.0.0.244) ──▶ AUDIO OUTPUT (Robot Voice)
```
🤖 PepperAgent - Conversational personality and memory management
- Sweet, caring robot personality with GPT-4o
- Conversation memory (10 messages, 1500 tokens)
- Response caching and character limit enforcement (200 chars)
🔍 SearchAgent3 - Advanced search with custom API
- Custom search API integration (contact me for details)
- Rate limiting, caching, and specialized formatting
- Error handling with fallback to SearchAgent
🔍 SearchAgent - Fallback search agent
- DuckDuckGo search integration
- Basic response formatting
- Used when SearchAgent3 fails
📝 SummaryAgent - Australian context and response filtering
- Metric unit conversion (Fahrenheit→Celsius, miles→km; see the sketch after this list)
- Australian holiday filtering and Canberra-specific context
- Phonetic symbol rewriting for TTS
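The SummaryAgent's unit conversion can be pictured with a short sketch like the one below. This is illustrative only: the regex patterns and the `to_metric` name are assumptions, not the project's actual code.

```python
import re

def to_metric(text: str) -> str:
    """Convert imperial units in a response to metric (illustrative sketch)."""
    def f_to_c(m):
        celsius = (float(m.group(1)) - 32) * 5 / 9
        return f"{celsius:.0f} degrees Celsius"

    def mi_to_km(m):
        return f"{float(m.group(1)) * 1.60934:.1f} km"

    text = re.sub(r"(\d+(?:\.\d+)?)\s*(?:°F|degrees Fahrenheit)", f_to_c, text)
    text = re.sub(r"(\d+(?:\.\d+)?)\s*miles?\b", mi_to_km, text)
    return text

print(to_metric("It is 86 degrees Fahrenheit and the lake is 5 miles away."))
# -> It is 30 degrees Celsius and the lake is 8.0 km away.
```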
- Intelligent Routing: Context-aware agent selection based on keyword analysis (see the routing sketch after this list)
- Multi-Level Fallback: SearchAgent3 → SearchAgent → PepperAgent
- Australian Context: Metric units, local holidays, Canberra-specific information
- Performance Optimizations: Response caching, rate limiting, threaded operations
- Error Handling: Comprehensive error handling at every level
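A minimal sketch of keyword-based routing with the multi-level fallback chain. The keyword sets, dictionary keys, and `run()` method are assumptions; only the agent order comes from this README.

```python
# Hypothetical routing sketch; each agent is an object exposing run(query).
SEARCH_KEYWORDS = ("weather", "news", "who is", "what is", "latest")
SUMMARY_KEYWORDS = ("summarize", "summary")

def select_route(query: str) -> str:
    """Pick a route based on simple keyword analysis."""
    q = query.lower()
    if any(k in q for k in SUMMARY_KEYWORDS):
        return "summary"
    if any(k in q for k in SEARCH_KEYWORDS):
        return "search"
    return "conversation"

def handle_query(query: str, agents: dict) -> str:
    route = select_route(query)
    if route == "search":
        # Multi-level fallback: SearchAgent3 -> SearchAgent -> PepperAgent
        for name in ("search_agent3", "search_agent", "pepper_agent"):
            try:
                return agents[name].run(query)
            except Exception:
                continue  # fall back to the next agent in the chain
    if route == "summary":
        return agents["summary_agent"].run(query)
    return agents["pepper_agent"].run(query)
```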
- Python 3.8+
- OpenAI API key
- Microphone for voice input
- Pepper robot hardware (for TTS output)
- Network access to Pepper TTS endpoint (10.0.0.244:5000)
- Network access to custom search API (192.168.194.33:8060)
- Clone the repository:

```bash
git clone <repository-url>
cd pepper-robot-assistant
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
cp .env.template .env
```

- Configure your .env file:

```
OPENAI_API_KEY=your_openai_api_key_here
```

- Download the Vosk model for speech recognition:

```bash
# Download the small model for English
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
```

- Start the system:

```bash
python orchestrator4.py
```

- Usage:
- Press Enter to start speaking
- Speak your query clearly
- Press Enter again to stop recording
- The system will process your input and respond through Pepper's TTS
- Example queries:
- Conversational: "Hello, how are you?", "Tell me a joke"
- Search: "What is the weather in Canberra?", "Who is the current Prime Minister?"
- Summary: "Summarize the latest news about AI"
- Location: "Where am I?" (returns UC Collaborative Robotics Lab info)
- VC Info: "Who is the VC of UC?" (returns Bill Shorten info)
- Agent Selection: The system automatically routes queries to the most appropriate agent
- Fallback System: If SearchAgent3 fails, it falls back to SearchAgent, then to PepperAgent
- Australian Context: Responses are automatically converted to metric units and Australian context
- TTS Output: Responses are sent to Pepper's TTS system at 10.0.0.244:5000 (a sketch of this pipeline follows this list)
- Performance Monitoring: The system provides profiling information for each interaction
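A hedged sketch of the response pipeline and the threaded TTS hand-off. The `/say` path and JSON payload are assumptions; only the host and port appear in this README.

```python
import re
import threading
import requests

PEPPER_TTS_URL = "http://10.0.0.244:5000/say"  # endpoint path is an assumption

def process_response(text: str) -> list:
    """Split a response into sentences and strip emoji before TTS."""
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # drop emoji
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def speak(sentence: str) -> None:
    """POST one sentence to Pepper's TTS endpoint without blocking the main loop."""
    def _post():
        try:
            requests.post(PEPPER_TTS_URL, json={"text": sentence}, timeout=5)
        except requests.RequestException as exc:
            print(f"TTS request failed: {exc}")

    threading.Thread(target=_post, daemon=True).start()

for sentence in process_response("G'day! It is 30 degrees in Canberra."):
    speak(sentence)
```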
The voice system processes speech input, generates responses using an LLM, and converts responses to natural-sounding speech with emotional expression.
- Real-time speech-to-text using Vosk
- Natural, expressive text-to-speech using ElevenLabs (emotion-aware)
- LLM-based emotion detection for each sentence (see the sketch after this list)
- Conversational memory using LangChain
- Web search capabilities using DuckDuckGo
- Interactive voice-based conversation
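As referenced above, per-sentence emotion detection can be done with one LLM call per sentence. This sketch uses the OpenAI Python client; the prompt, label set, and model choice are assumptions rather than the project's exact implementation.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EMOTIONS = ("happy", "sad", "angry", "surprised", "neutral")

def detect_emotion(sentence: str) -> str:
    """Classify one sentence into a coarse emotion label."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.0,
        messages=[
            {"role": "system",
             "content": "Classify the emotion of the user's sentence. "
                        f"Answer with exactly one of: {', '.join(EMOTIONS)}."},
            {"role": "user", "content": sentence},
        ],
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in EMOTIONS else "neutral"

print(detect_emotion("I can't believe we won the championship!"))
```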
- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Install ffmpeg (includes ffplay, required for TTS playback):
  - macOS: `brew install ffmpeg`
  - Linux (Debian/Ubuntu): `sudo apt update && sudo apt install ffmpeg`
  - Other Linux distros: use your package manager to install ffmpeg.
- Create a .env file:

```bash
cp .env.template .env
```

- Add your API keys to the .env file:
  - OPENAI_API_KEY: Your OpenAI API key
  - ELEVENLABS_API_KEY: Your ElevenLabs API key
- Download the Vosk model:

```bash
# Download the small model for English
wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
```

The choreography system enables Pepper to perform emotion-based movements in response to emotional tags, and is designed to be modular and easily extensible.
```
.
├── README.md
├── orchestrate_choreography.py      # Main orchestration file
└── choreography/
    ├── choreography_engine.py       # Core engine for handling movements
    ├── happy.py                     # Happy emotion movements
    ├── sad.py                       # Sad emotion movements
    └── test_choreography_engine.py  # Test suite
```
- ChoreographyEngine: Core class that manages and executes emotion-based movements
  - Dynamically loads movement handlers from Python files (see the sketch after this list)
  - Provides error handling and case-insensitive emotion matching
- Emotion Handlers: Individual Python files for each emotion (e.g., `happy.py`, `sad.py`)
  - Each file contains an `execute_movement()` function
  - Defines specific movement sequences for that emotion
- ChoreographyOrchestrator: Main interface for executing movements
  - Manages the ChoreographyEngine
  - Provides a simple API for emotion-based movement execution
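The dynamic loading described above might be implemented along these lines. This is a sketch assuming each handler file exposes `execute_movement()`; the real `ChoreographyEngine` internals may differ.

```python
import importlib.util
from pathlib import Path

def load_handlers(directory: str = "choreography") -> dict:
    """Map emotion names to execute_movement callables found in the directory."""
    handlers = {}
    for path in Path(directory).glob("*.py"):
        if path.stem.startswith("test_") or path.stem == "choreography_engine":
            continue  # skip the test suite and the engine itself
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        if hasattr(module, "execute_movement"):
            handlers[path.stem.lower()] = module.execute_movement  # case-insensitive key
    return handlers

handlers = load_handlers()
handlers.get("happy", lambda: print("No handler for that emotion"))()
```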
Basic usage:

```python
from orchestrate_choreography import ChoreographyOrchestrator

# Create orchestrator
orchestrator = ChoreographyOrchestrator()

# Execute movement for an emotion
orchestrator.handle_emotion('happy')
```

To add a new emotion:
- Create a new Python file in the `choreography` directory (e.g., `excited.py`)
- Implement the `execute_movement()` function
- The system will automatically load the new emotion handler
Example:

```python
# choreography/excited.py
def execute_movement():
    """
    Execute the movement sequence for the 'excited' emotion.
    """
    # Add your movement commands here
    pass
```

Run the main script:
```bash
python orchestrator4.py
```

The system will:
- Listen for voice input using Vosk STT
- Route the input to appropriate agents based on keyword analysis
- Process through the selected agent (PepperAgent, SearchAgent3, SearchAgent, or SummaryAgent)
- Apply Australian context and filtering if needed
- Split response into sentences and filter emojis
- Send to Pepper's TTS system for audio output
- Provide performance profiling information
Run the legacy script:

```bash
python orchestrator.py
```

The system will:
- Listen for voice input
- Convert speech to text using Vosk
- Process the text through the LangChain agent
- Generate a response
- Detect the emotion of each sentence in the response using an LLM
- Convert each sentence to expressive speech using ElevenLabs, matching the detected emotion (see the sketch after this list)
- Play the expressive response through your speakers
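A hedged sketch of the expressive playback step. The voice ID is a placeholder and the emotion-to-`voice_settings` mapping is an assumption about how expressiveness is steered; the REST route is ElevenLabs' documented text-to-speech endpoint, and `ffplay` handles playback as noted in the setup steps.

```python
import os
import subprocess
import requests

VOICE_ID = "your_voice_id_here"  # placeholder, not a real voice ID
TTS_URL = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

# Assumed mapping from detected emotion to synthesis settings
EMOTION_SETTINGS = {
    "happy": {"stability": 0.3, "similarity_boost": 0.8},
    "sad": {"stability": 0.8, "similarity_boost": 0.8},
    "neutral": {"stability": 0.5, "similarity_boost": 0.8},
}

def speak_expressively(sentence: str, emotion: str) -> None:
    """Synthesize one sentence with emotion-tuned settings and play via ffplay."""
    response = requests.post(
        TTS_URL,
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={
            "text": sentence,
            "voice_settings": EMOTION_SETTINGS.get(emotion, EMOTION_SETTINGS["neutral"]),
        },
        timeout=30,
    )
    response.raise_for_status()
    with open("sentence.mp3", "wb") as f:
        f.write(response.content)
    subprocess.run(["ffplay", "-nodisp", "-autoexit", "sentence.mp3"], check=True)
```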
Run the test suite:

```bash
python -m unittest choreography/test_choreography_engine.py -v
```

The test suite verifies the following (a sketch of one such test appears after this list):
- Proper initialization
- Movement execution
- Error handling
- Case insensitivity
- Edge cases
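One such check could look like this minimal unittest sketch; the `execute_emotion` method name is an assumption, not the project's actual API.

```python
import unittest
from choreography.choreography_engine import ChoreographyEngine  # assumed import path

class TestCaseInsensitivity(unittest.TestCase):
    def test_uppercase_emotion_matches_handler(self):
        engine = ChoreographyEngine()
        # 'HAPPY' should resolve to the same handler as 'happy'
        self.assertTrue(engine.execute_emotion("HAPPY"))

if __name__ == "__main__":
    unittest.main()
```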
- Python 3.8+
- OpenAI API key
- Microphone for voice input
- Speakers for voice output (legacy system)
- Pepper robot hardware (for TTS output and movement execution)
- Internet connection for API access
- Network access to Pepper TTS endpoint (10.0.0.244:5000)
- Network access to custom search API
- Pepper TTS: `10.0.0.244:5000`
- Search API: `192.168.194.33:8060`
- Memory: 10 messages, 1500 tokens
- Response Limit: 200 characters
- Cache TTL: 1 hour
- Search Rate Limit: 1 second between requests
- Primary LLM: GPT-4o
- Temperature: 0.2-0.7 (varies by agent)
- System prompts: Specialized per agent
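For reference, the values above might be collected into one configuration block like this sketch (the variable names are assumptions; the values come from this README):

```python
# Hypothetical configuration constants mirroring the values listed above
PEPPER_TTS_ENDPOINT = "http://10.0.0.244:5000"
SEARCH_API_ENDPOINT = "http://192.168.194.33:8060"

MEMORY_MAX_MESSAGES = 10       # conversation turns kept in memory
MEMORY_MAX_TOKENS = 1500       # token budget for conversation memory
RESPONSE_CHAR_LIMIT = 200      # hard cap on spoken response length
CACHE_TTL_SECONDS = 3600       # 1 hour response cache
SEARCH_RATE_LIMIT_SECONDS = 1  # minimum delay between search requests

LLM_MODEL = "gpt-4o"
LLM_TEMPERATURE_RANGE = (0.2, 0.7)  # varies by agent
```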
- Speech Recognition Not Working
  - Ensure the Vosk model is downloaded and in the correct location
  - Check microphone permissions and settings
  - Verify the audio input device is working
- TTS Not Working
  - Check network connectivity to the Pepper TTS endpoint (10.0.0.244:5000)
  - Verify the Pepper robot is powered on and accessible
  - Check firewall settings
- Search Not Working
  - Verify network connectivity to the search API
  - Check API rate limiting
  - Ensure the OpenAI API key is valid
- Agent Failures
  - Check the OpenAI API key and quota
  - Verify internet connectivity
  - Review error logs for specific agent issues
To enable detailed logging, modify orchestrator4.py to add logging statements (for example, as sketched below) or run with verbose output.
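A standard-library logging setup such as this sketch could be placed near the top of orchestrator4.py:

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
logger = logging.getLogger("orchestrator4")

logger.debug("Selected agent: %s", "search_agent3")  # example debug statement
```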
- Enhanced agent selection algorithms
- More sophisticated fallback mechanisms
- Additional specialized agents
- Integration with choreography system
- Real-time performance monitoring dashboard
- Enhanced emotion detection accuracy
- More natural speech synthesis
- Improved conversation memory
- Better noise handling
- Add more emotion handlers
- Implement actual Pepper movement commands
- Add movement sequence timing control
- Add movement combination capabilities
- Integration with Orchestrator4 for synchronized movement and speech
Make sure your microphone is properly configured and working before running the script. The system will automatically detect and use your default microphone. For Orchestrator4, ensure Pepper robot is accessible on the network and the TTS endpoint is responding.