Identify speakers by their voice using machine learning. This project provides a complete speaker recognition solution for Home Assistant, including a REST API service, Python client library, custom integration, and Home Assistant addon.
- Voice-based speaker identification using neural embeddings
- Native Home Assistant integration with STT and conversation agents
- Easy deployment via Home Assistant addon or standalone Docker
- REST API for flexible integration with any platform
- Python client library for programmatic access
- High accuracy powered by Resemblyzer voice embeddings
- Fast recognition with cached embeddings
- Configurable via UI or YAML
The easiest way to use speaker recognition in Home Assistant:
- Add this repository to your Home Assistant addon store
- Install the Speaker Recognition addon
- Configure the addon settings:
  - Host: `0.0.0.0` (default)
  - Port: `8099` (default)
  - Embeddings Directory: `/share/speaker_recognition/embeddings`
  - Log Level: `info`
- Start the addon
- Install the Speaker Recognition integration via the UI
Install the client-only package (no ML dependencies):
```bash
pip install speaker-recognition
```

Install with server capabilities (requires Python <3.10):

```bash
pip install speaker-recognition[server]
```

Run the standalone service:
```bash
docker run -d \
  -p 8099:8099 \
  -v ./embeddings:/app/embeddings \
  ghcr.io/eulemitkeule/speaker-recognition:latest
```

Train the system with voice samples for each speaker:
```python
from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import TrainingRequest, VoiceSample, AudioInput

async with SpeakerRecognitionClient("http://localhost:8099") as client:
    training = await client.train(
        TrainingRequest(
            voice_samples=[
                VoiceSample(
                    user="Alice",
                    audio_input=AudioInput(
                        audio_data="<base64-encoded-audio>",
                        sample_rate=16000
                    )
                ),
                VoiceSample(
                    user="Bob",
                    audio_input=AudioInput(
                        audio_data="<base64-encoded-audio>",
                        sample_rate=16000
                    )
                )
            ]
        )
    )
    print(f"Trained {training.speakers_count} speakers")
```

Or via the REST API:

```bash
curl -X POST http://localhost:8099/train \
  -H "Content-Type: application/json" \
  -d '{
    "voice_samples": [
      {
        "user": "Alice",
        "audio_input": {
          "audio_data": "<base64-audio>",
          "sample_rate": 16000
        }
      }
    ]
  }'
```
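Both the client and the REST call expect the audio as a base64-encoded string. Below is a minimal sketch of producing one from a WAV file; the filename is a hypothetical example, and whether the service expects the full WAV bytes or raw PCM samples is not specified here, so treat this as an illustration of the encoding step only.

```python
import base64

# Hypothetical sample file; use a recording of the speaker you want to enroll.
with open("alice_sample.wav", "rb") as wav_file:
    audio_b64 = base64.b64encode(wav_file.read()).decode("ascii")

# audio_b64 can now be passed as audio_data, together with the matching sample_rate.
```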
Identify a speaker from audio:

```python
from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import RecognitionRequest, AudioInput

async with SpeakerRecognitionClient("http://localhost:8099") as client:
    result = await client.recognize(
        RecognitionRequest(
            audio_input=AudioInput(
                audio_data="<base64-encoded-audio>",
                sample_rate=16000
            )
        )
    )
    print(f"Speaker: {result.speaker} (confidence: {result.confidence:.2%})")
```
Once the integration is configured:

- Configure the backend in the main integration entry
- Map voices to users in the integration settings
- Add STT entity as a sub-entry for speech-to-text with speaker ID
- Add Conversation Agent as a sub-entry for voice commands with speaker context
The integration will automatically identify speakers and make the information available to your automations.
Health check endpoint.

Response:

```json
{
  "status": "healthy"
}
```

Train the model with voice samples.
Request:
```json
{
  "voice_samples": [
    {
      "user": "string",
      "audio_input": {
        "audio_data": "base64-string",
        "sample_rate": 16000
      }
    }
  ]
}
```

Response:
```json
{
  "speakers_count": 2,
  "message": "Training completed successfully"
}
```

Recognize a speaker from audio.
Request:
```json
{
  "audio_input": {
    "audio_data": "base64-string",
    "sample_rate": 16000
  }
}
```

Response:
```json
{
  "speaker": "Alice",
  "confidence": 0.95
}
```
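Because the service speaks plain HTTP, it can also be called without the Python client. Here is a minimal sketch using the `requests` package; note that only the `/train` path is shown explicitly above, so the `/recognize` path used here is an assumption that mirrors it:

```python
import requests  # assumption: the requests package is installed

payload = {
    "audio_input": {
        "audio_data": "<base64-audio>",  # see the encoding sketch above
        "sample_rate": 16000,
    }
}

# /recognize is assumed to mirror the documented /train endpoint.
response = requests.post("http://localhost:8099/recognize", json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # e.g. {"speaker": "Alice", "confidence": 0.95}
```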
Example server configuration (YAML):

```yaml
host: "0.0.0.0"
port: 8099
log_level: "info"
access_log: true
embeddings_dir: "/share/speaker_recognition/embeddings"
```

Environment variables:

- `HOST`: Server host (default: `0.0.0.0`)
- `PORT`: Server port (default: `8099`)
- `LOG_LEVEL`: Logging level (default: `info`)
- `ACCESS_LOG`: Enable access logs (default: `true`)
- `EMBEDDINGS_DIR`: Directory for storing embeddings (default: `./embeddings`)
- Python 3.9 (for server development)
- Python 3.8+ (for client-only development)
- uv package manager
```bash
# Clone the repository
git clone https://github.com/eulemitkeule/speaker-recognition.git
cd speaker-recognition

# Install dependencies
uv sync --all-groups

# Run tests
uv run pytest tests/ -v

# Run linting
uv run ruff check .

# Run type checking
uv run mypy --strict speaker_recognition
```

```bash
# Start the server
uv run python -m speaker_recognition
# Or with custom options
uv run python -m speaker_recognition --host 0.0.0.0 --port 8099
```

Project structure:

```
speaker-recognition/
├── speaker_recognition/            # Main package
│   ├── api.py                      # FastAPI application
│   ├── client.py                   # HTTP client
│   ├── models.py                   # Pydantic models
│   └── recognizer.py               # Recognition logic
├── custom_components/              # Home Assistant integration
│   └── speaker_recognition/
├── speaker_recognition_addon/      # Home Assistant addon
├── tests/                          # Test suite
└── example_data/                   # Example audio files
```
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run tests and linting
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow PEP 8 style guidelines
- Use descriptive variable and function names
- Add type annotations
- Write tests for new features
- Keep methods focused and concise
This project is licensed under the MIT License - see the LICENSE file for details.
- Resemblyzer - Neural voice embeddings
- Home Assistant - Home automation platform
- FastAPI - Modern web framework
- Report bugs
- Request features
- Documentation
Made with ❤️ for the Home Assistant community