Train custom wake word models for OpenWakeWord using synthetic voices from Kokoro TTS combined with your real voice recordings.
Why this exists: The official OpenWakeWord training process relies on Google Colab notebooks that frequently break. This repo provides a working local training pipeline that produces quality models.
- A trained `.onnx` wake word model (~400KB)
- Works with OpenWakeWord, Home Assistant, or any system that supports ONNX models
- Typical results: 70%+ accuracy, <2 false positives per hour
- NVIDIA GPU with CUDA (RTX 3060 12GB or better recommended)
- Docker with NVIDIA Container Toolkit
- ~20GB disk space for training data
Docker is the recommended approach - it handles all the dependency hell for you.
```bash
git clone https://github.com/CoreWorxLab/openwakeword-training.git
cd openwakeword-training
docker compose build trainer
docker compose run --rm trainer ./setup-data.sh
```

Recording 20-50 samples of your actual voice significantly improves detection. This step runs on your host machine (it needs microphone access):
```bash
pip install pyaudio numpy scipy
python record_samples.py --wake-word "hey cal"
```

- Press ENTER to start each 2-second recording
- Say your wake word naturally
- Vary your tone, speed, and distance from the mic
- Press 'q' to quit
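For reference, a recorder like this can be just a few lines. The following is a minimal sketch, not the repo's actual record_samples.py, and the output directory name is made up: it captures ~2-second, 16 kHz mono clips with PyAudio and writes numbered WAV files.

```python
import os
import wave

import pyaudio

RATE, SECONDS, CHUNK = 16000, 2, 1024      # 16 kHz mono, ~2-second clips
OUT_DIR = "my_voice"                       # hypothetical output directory

os.makedirs(OUT_DIR, exist_ok=True)
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

i = 0
while input("ENTER to record, q to quit: ").strip().lower() != "q":
    # Read ~2 seconds of audio in CHUNK-sized pieces
    frames = [stream.read(CHUNK) for _ in range(RATE * SECONDS // CHUNK)]
    with wave.open(os.path.join(OUT_DIR, f"sample_{i:03d}.wav"), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))
    i += 1

stream.stop_stream()
stream.close()
pa.terminate()
```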
```bash
docker compose run --rm trainer python train.py --wake-word "hey cal" --data-dir /app/data
```

Training takes 4-8 hours depending on GPU.
Test on your host machine (needs microphone access):
```bash
pip install openwakeword pyaudio numpy
python test_model.py --model my_custom_model/hey_cal.onnx
```

Speak your wake word into the microphone and watch for detections.
| Parameter | Default | Description |
|---|---|---|
| `--wake-word` | `"hey cal"` | The wake word/phrase to detect |
| `--samples-per-voice` | `200` | Samples generated per Kokoro voice |
| `--training-steps` | `50000` | More steps = better but slower |
| `--layer-size` | `64` | Network size (32, 64, or 128) |
| `--kokoro-url` | `http://localhost:8880` | Kokoro TTS endpoint |
| `--data-dir` | `.` | Training data directory (`/app/data` for Docker) |
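Parameters combine as you'd expect. For example, a larger model on a longer schedule (values here are purely illustrative):

```bash
docker compose run --rm trainer python train.py \
  --wake-word "hey computer" \
  --samples-per-voice 300 \
  --training-steps 80000 \
  --layer-size 128 \
  --data-dir /app/data
```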
- **Sample Generation** - Creates ~13K positive samples using 67 Kokoro voices with speed variation (0.7-1.3x), plus your real recordings (weighted 3x); see the sketch after this list
- **Negative Samples** - Generates samples of clearly different phrases ("hello", "hey siri", "alexa") to teach the model what NOT to detect
- **Augmentation** - OpenWakeWord adds noise, reverb, and mixing to simulate real-world conditions
- **Training** - Neural network learns to distinguish your wake word from everything else
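To illustrate the sample generation step, here is a minimal sketch of requesting speed-varied clips from Kokoro. It assumes Kokoro-FastAPI's OpenAI-compatible `/v1/audio/speech` endpoint and uses a couple of example voice names; train.py's actual generation logic may differ.

```python
import itertools
import os

import requests

KOKORO_URL = "http://localhost:8880"   # matches the --kokoro-url default
VOICES = ["af_bella", "am_adam"]       # example names; the pipeline iterates 67 voices
SPEEDS = [0.7, 1.0, 1.3]               # speed variation within the 0.7-1.3x range

os.makedirs("positive_train", exist_ok=True)
for i, (voice, speed) in enumerate(itertools.product(VOICES, SPEEDS)):
    # Kokoro-FastAPI exposes an OpenAI-compatible speech endpoint
    resp = requests.post(
        f"{KOKORO_URL}/v1/audio/speech",
        json={"model": "kokoro", "voice": voice, "input": "hey cal",
              "speed": speed, "response_format": "wav"},
    )
    resp.raise_for_status()
    with open(f"positive_train/tts_{i:04d}.wav", "wb") as f:
        f.write(resp.content)
```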
Don't use similar-sounding negatives. Training on phrases like "hey call" or "hey carl" actually hurts performance. Use only clearly different phrases like "hello", "hey siri", "alexa".
```
my_custom_model/
├── hey_cal.onnx           # Your trained model - use this!
└── hey_cal/
    ├── positive_train/    # Generated training samples
    ├── positive_test/     # Test samples
    ├── negative_train/    # Negative training samples
    └── negative_test/     # Negative test samples
```
```python
from openwakeword.model import Model

model = Model(wakeword_models=["my_custom_model/hey_cal.onnx"])

# Process 16kHz mono audio frames
prediction = model.predict(audio_frame)
if prediction["hey_cal"] > 0.5:
print("Wake word detected!")If you prefer not to use Docker, you can set up the environment directly:
If you prefer not to use Docker, you can set up the environment directly:

```bash
./setup.sh
source venv/bin/activate

# Start Kokoro TTS separately
docker run -d --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest

python train.py --wake-word "hey cal"
```

Note: this requires Python 3.10+ and working CUDA. The pinned dependency versions in requirements.txt can conflict with other Python packages on your system, which is why Docker is recommended.
WAV header warnings during sample generation are normal - Kokoro's WAV headers have a quirk, but the audio data is fine.
Training metrics use synthetic test samples, so reported accuracy can look pessimistic; real-world performance is usually better.
If the model isn't detecting:

- Ensure audio is 16kHz mono
- The model needs ~2 seconds of audio buffer to warm up
- Try lowering the detection threshold (default 0.5), as shown below
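Lowering the threshold is a one-line change in the detection check from the usage example above (0.3 here is just an illustrative value):

```python
# More sensitive: fires on weaker matches, but expect more false positives
if prediction["hey_cal"] > 0.3:   # default threshold is 0.5
    print("Wake word detected!")
```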
An error at the very end of training can be ignored - the ONNX model is saved successfully before it occurs.
- OpenWakeWord by David Scripka
- Kokoro TTS for synthetic voice generation
- Training data from ACAV100M
MIT