OpenClaw Voice Loop 🎙️

Talk to your AI agent like a phone call.

Mic → Whisper → OpenClaw → TTS → Speaker

A continuous voice conversation loop that captures your speech, transcribes it locally with Whisper, sends it to any OpenClaw gateway (any model), and speaks the response back to you.

How It Works

1. Listen: records from your mic, detects speech, and stops on silence.
2. Transcribe: a local Whisper model converts the speech to text.
3. Think: sends the text to your OpenClaw agent via the `openclaw agent` CLI.
4. Speak: converts the reply to audio via ElevenLabs, OpenAI TTS, or macOS `say`.
5. Repeat.
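
The steps above can be sketched as a small driver loop. This is a minimal illustration, not the code in `voice_loop.py`: the function name `voice_loop`, the four callables, and the convention that `listen()` returns `None` to end the conversation are all assumptions made for the example.

```python
from typing import Callable, Optional


def voice_loop(
    listen: Callable[[], Optional[bytes]],
    transcribe: Callable[[bytes], str],
    think: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Run listen -> transcribe -> think -> speak until listen() returns None.

    Every stage is passed in as a callable, so the loop itself does not
    depend on a particular microphone, model, or TTS backend.
    Returns the number of completed turns.
    """
    turns = 0
    while True:
        audio = listen()
        if audio is None:  # hypothetical stop signal, e.g. the user quit
            break
        text = transcribe(audio).strip()
        if not text:  # nothing intelligible was heard; listen again
            continue
        reply = think(text)
        speak(reply)
        turns += 1
    return turns
```

With stub stages (a list of canned clips instead of a microphone, `str.upper` instead of an agent) the loop can be exercised without any audio hardware, which is also a convenient way to test each real stage in isolation.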

Works with any OpenClaw gateway and any model configured on that gateway (Claude, GPT, Gemini, local models, etc.).

Requirements

Python 3.9+
OpenClaw CLI installed and configured
ffmpeg and portaudio

macOS

brew install ffmpeg portaudio

Linux (Debian/Ubuntu)

sudo apt install ffmpeg portaudio19-dev

Setup

git clone https://github.com/jdawe/openclaw-voice-loop.git
cd openclaw-voice-loop
pip install -r requirements.txt

Usage

# Basic — uses macOS `say` for TTS
python voice_loop.py

# With ElevenLabs TTS (highest quality)
export ELEVENLABS_API_KEY=your_key_here
python voice_loop.py

# With OpenAI TTS
export OPENAI_API_KEY=your_key_here
export OPENAI_VOICE=nova  # optional, default: alloy
python voice_loop.py

# With a remote OpenClaw gateway
export OPENCLAW_GATEWAY_URL=wss://your-gateway.example.com
export OPENCLAW_GATEWAY_TOKEN=your_token
python voice_loop.py

Configuration

All configuration is via environment variables.

TTS priority: ElevenLabs → OpenAI → macOS `say`. The first provider whose API key is set is used; `say` is the fallback when neither key is set.

| Variable | Default | Description |
| --- | --- | --- |
| `ELEVENLABS_API_KEY` | (none) | ElevenLabs API key (highest TTS priority) |
| `ELEVENLABS_VOICE_ID` | `21m00Tcm4TlvDq8ikWAM` | ElevenLabs voice ID (default: Rachel) |
| `ELEVENLABS_SPEED` | `1.0` | Playback speed multiplier |
| `OPENAI_API_KEY` | (none) | OpenAI API key (second TTS priority) |
| `OPENAI_VOICE` | `alloy` | OpenAI TTS voice: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer` |
| `WHISPER_MODEL` | `tiny` | Whisper model: `tiny`, `base`, `small`, `medium`, `large` |
| `OPENCLAW_GATEWAY_URL` | (none) | Remote gateway WebSocket URL |
| `OPENCLAW_GATEWAY_TOKEN` | (none) | Gateway auth token |
| `VOICE_SESSION_ID` | `voice-loop` | OpenClaw session ID (maintains conversation context) |
| `AGENT_TIMEOUT` | `60` | Seconds to wait for agent reply |
| `SAY_RATE` | `350` | macOS `say` rate in words per minute |
| `MAX_TURNS` | `50` | Max conversation turns before auto-reset |
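
To make the defaults and the TTS priority rule concrete, here is a rough sketch of how such settings could be read from the environment. The variable names and default values come from the table above; the `VoiceConfig` class and the `load_config` and `pick_tts_provider` helpers are hypothetical names for this example and may not match how `voice_loop.py` is organized.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VoiceConfig:
    """Hypothetical container for a subset of the documented settings."""
    elevenlabs_api_key: Optional[str]
    openai_api_key: Optional[str]
    openai_voice: str
    whisper_model: str
    session_id: str
    agent_timeout: float
    say_rate: int
    max_turns: int


def load_config(env: Mapping[str, str] = os.environ) -> VoiceConfig:
    """Read settings from a mapping, falling back to the documented defaults."""
    return VoiceConfig(
        # An empty string is treated the same as an unset key.
        elevenlabs_api_key=env.get("ELEVENLABS_API_KEY") or None,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        openai_voice=env.get("OPENAI_VOICE", "alloy"),
        whisper_model=env.get("WHISPER_MODEL", "tiny"),
        session_id=env.get("VOICE_SESSION_ID", "voice-loop"),
        agent_timeout=float(env.get("AGENT_TIMEOUT", "60")),
        say_rate=int(env.get("SAY_RATE", "350")),
        max_turns=int(env.get("MAX_TURNS", "50")),
    )


def pick_tts_provider(cfg: VoiceConfig) -> str:
    """Apply the priority ElevenLabs -> OpenAI -> macOS say."""
    if cfg.elevenlabs_api_key:
        return "elevenlabs"
    if cfg.openai_api_key:
        return "openai"
    return "say"
```

Passing a plain dictionary instead of `os.environ` makes the priority rule easy to check: with both keys set the sketch picks ElevenLabs, with only `OPENAI_API_KEY` it picks OpenAI, and with neither it falls back to `say`.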

Tips

The first run downloads the Whisper model (~75 MB for `tiny`). Subsequent runs reuse the cached model and skip the download.
Use the `tiny` or `base` Whisper models for the fastest transcription. `small` is a good accuracy/speed tradeoff.
The loop auto-calibrates your mic on startup — stay quiet for 1 second.
Whisper hallucination filtering is built in (ignores phantom "thank you" / "bye" transcriptions).
Sessions persist across turns, so the agent remembers context within a conversation.
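
The hallucination filtering mentioned above could look roughly like the following. This is one plausible approach, assuming a whole-utterance match against a short phrase list; the phrase set contains only the two examples named in the tip, and the helper names `normalize` and `is_probable_hallucination` are made up for this sketch. The actual filter in `voice_loop.py` may work differently.

```python
import string

# Illustrative list only: the two phrases named in the tip above.
HALLUCINATED_PHRASES = {"thank you", "bye"}

# Translation table that deletes ASCII punctuation.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    cleaned = text.lower().translate(_STRIP_PUNCTUATION)
    return " ".join(cleaned.split())


def is_probable_hallucination(text: str) -> bool:
    """Return True when the whole transcript is empty or a known phantom phrase.

    Only complete-utterance matches are dropped, so a real sentence that
    merely contains "thank you" still reaches the agent.
    """
    norm = normalize(text)
    return norm == "" or norm in HALLUCINATED_PHRASES
```

Matching the entire normalized utterance rather than a substring is a deliberate choice here: a transcript such as "Thank you for the summary" is kept, while a bare " Thank you." produced from background noise is discarded.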

License

MIT
