Skip to content

Voice conversation loop for OpenClaw — talk to your AI agent like a phone call

License

Notifications You must be signed in to change notification settings

jdawe/openclaw-voice-loop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

OpenClaw Voice Loop 🎙️

Talk to your AI agent like a phone call.

Mic → Whisper → OpenClaw → TTS → Speaker

A continuous voice conversation loop that captures your speech, transcribes it locally with Whisper, sends it to any OpenClaw gateway (any model), and speaks the response back to you.

How It Works

  1. Listen — Records from your mic, detects speech, stops on silence
  2. Transcribe — Local Whisper model converts speech to text
  3. Think — Sends text to your OpenClaw agent via openclaw agent CLI
  4. Speak — Converts the reply to audio via ElevenLabs, OpenAI TTS, or macOS say
  5. Repeat

Works with any OpenClaw gateway and any model configured on that gateway (Claude, GPT, Gemini, local models, etc.).

Requirements

  • Python 3.9+
  • OpenClaw CLI installed and configured
  • ffmpeg and portaudio

macOS

brew install ffmpeg portaudio

Linux

sudo apt install ffmpeg portaudio19-dev

Setup

git clone https://github.com/jdawe/openclaw-voice-loop.git
cd openclaw-voice-loop
pip install -r requirements.txt

Usage

# Basic — uses macOS `say` for TTS
python voice_loop.py

# With ElevenLabs TTS (highest quality)
export ELEVENLABS_API_KEY=your_key_here
python voice_loop.py

# With OpenAI TTS
export OPENAI_API_KEY=your_key_here
export OPENAI_VOICE=nova  # optional, default: alloy
python voice_loop.py

# With a remote OpenClaw gateway
export OPENCLAW_GATEWAY_URL=wss://your-gateway.example.com
export OPENCLAW_GATEWAY_TOKEN=your_token
python voice_loop.py

Configuration

All configuration is via environment variables:

TTS priority: ElevenLabs → OpenAI → macOS say (first available key wins).

Variable Default Description
ELEVENLABS_API_KEY (none) ElevenLabs API key (highest TTS priority)
ELEVENLABS_VOICE_ID 21m00Tcm4TlvDq8ikWAM ElevenLabs voice ID (default: Rachel)
ELEVENLABS_SPEED 1.0 Playback speed multiplier
OPENAI_API_KEY (none) OpenAI API key (second TTS priority)
OPENAI_VOICE alloy OpenAI TTS voice: alloy, echo, fable, onyx, nova, shimmer
WHISPER_MODEL tiny Whisper model: tiny, base, small, medium, large
OPENCLAW_GATEWAY_URL (none) Remote gateway WebSocket URL
OPENCLAW_GATEWAY_TOKEN (none) Gateway auth token
VOICE_SESSION_ID voice-loop OpenClaw session ID (maintains conversation context)
AGENT_TIMEOUT 60 Seconds to wait for agent reply
SAY_RATE 350 macOS say words per minute
MAX_TURNS 50 Max conversation turns before auto-reset

Tips

  • First run downloads the Whisper model (~75MB for tiny). Subsequent runs are instant.
  • Use tiny or base Whisper models for fastest transcription. small is a good accuracy/speed tradeoff.
  • The loop auto-calibrates your mic on startup — stay quiet for 1 second.
  • Whisper hallucination filtering is built in (ignores phantom "thank you" / "bye" transcriptions).
  • Sessions persist across turns, so the agent remembers context within a conversation.

License

MIT

About

Voice conversation loop for OpenClaw — talk to your AI agent like a phone call

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages