Talk to your AI agent like a phone call.
Mic → Whisper → OpenClaw → TTS → Speaker
A continuous voice conversation loop that captures your speech, transcribes it locally with Whisper, sends it to any OpenClaw gateway (any model), and speaks the response back to you.
- Listen — Records from your mic, detects speech, stops on silence
- Transcribe — Local Whisper model converts speech to text
- Think — Sends text to your OpenClaw agent via
openclaw agentCLI - Speak — Converts the reply to audio via ElevenLabs, OpenAI TTS, or macOS
say - Repeat
Works with any OpenClaw gateway and any model configured on that gateway (Claude, GPT, Gemini, local models, etc.).
- Python 3.9+
- OpenClaw CLI installed and configured
- ffmpeg and portaudio
brew install ffmpeg portaudiosudo apt install ffmpeg portaudio19-devgit clone https://github.com/jdawe/openclaw-voice-loop.git
cd openclaw-voice-loop
pip install -r requirements.txt# Basic — uses macOS `say` for TTS
python voice_loop.py
# With ElevenLabs TTS (highest quality)
export ELEVENLABS_API_KEY=your_key_here
python voice_loop.py
# With OpenAI TTS
export OPENAI_API_KEY=your_key_here
export OPENAI_VOICE=nova # optional, default: alloy
python voice_loop.py
# With a remote OpenClaw gateway
export OPENCLAW_GATEWAY_URL=wss://your-gateway.example.com
export OPENCLAW_GATEWAY_TOKEN=your_token
python voice_loop.pyAll configuration is via environment variables:
TTS priority: ElevenLabs → OpenAI → macOS say (first available key wins).
| Variable | Default | Description |
|---|---|---|
ELEVENLABS_API_KEY |
(none) | ElevenLabs API key (highest TTS priority) |
ELEVENLABS_VOICE_ID |
21m00Tcm4TlvDq8ikWAM |
ElevenLabs voice ID (default: Rachel) |
ELEVENLABS_SPEED |
1.0 |
Playback speed multiplier |
OPENAI_API_KEY |
(none) | OpenAI API key (second TTS priority) |
OPENAI_VOICE |
alloy |
OpenAI TTS voice: alloy, echo, fable, onyx, nova, shimmer |
WHISPER_MODEL |
tiny |
Whisper model: tiny, base, small, medium, large |
OPENCLAW_GATEWAY_URL |
(none) | Remote gateway WebSocket URL |
OPENCLAW_GATEWAY_TOKEN |
(none) | Gateway auth token |
VOICE_SESSION_ID |
voice-loop |
OpenClaw session ID (maintains conversation context) |
AGENT_TIMEOUT |
60 |
Seconds to wait for agent reply |
SAY_RATE |
350 |
macOS say words per minute |
MAX_TURNS |
50 |
Max conversation turns before auto-reset |
- First run downloads the Whisper model (~75MB for
tiny). Subsequent runs are instant. - Use
tinyorbaseWhisper models for fastest transcription.smallis a good accuracy/speed tradeoff. - The loop auto-calibrates your mic on startup — stay quiet for 1 second.
- Whisper hallucination filtering is built in (ignores phantom "thank you" / "bye" transcriptions).
- Sessions persist across turns, so the agent remembers context within a conversation.
MIT