
Local LiveKit Plugins


Run LiveKit voice agents with fully local STT and TTS - no cloud APIs required.

Custom plugins for LiveKit Agents that enable completely local speech processing using FasterWhisper for STT and Piper for TTS.

Tested On

  • OS: Arch Linux (kernel 6.17.7)
  • GPU: NVIDIA RTX 3060 (12GB VRAM)
  • CUDA: 12.x
  • Python: 3.10+

Windows/macOS: not yet tested. Community contributions are welcome; please report issues on GitHub.

Why Local?

          Cloud                 Local
Quality   Better                Good
Latency   ~2.1s total           ~2.8s total
Cost      ~$150/year*           ~$20/year
Privacy   Data sent externally  Stays on your network
Control   Vendor dependent      Full ownership

*Based on 100 hours/year: Deepgram Nova-2 ($0.35/hr) + Cartesia Sonic ($50/1M chars).

Features

  • FasterWhisperSTT - GPU-accelerated speech-to-text

    • Multiple model sizes (tiny → large-v3)
    • ~230-460ms processing time on RTX 3060
    • Configurable beam search and VAD
  • PiperTTS - Fast local text-to-speech

    • Multiple voice models available
    • ~9ms per character (~300ms for short responses)
    • Configurable speed, volume, pitch

Quick Start

Prerequisites

Required:

  • uv (recommended) or pip
  • Python 3.10+
  • Docker (for LiveKit server)
  • Ollama (for local LLM) - Must be running: ollama serve
  • ffmpeg or libavcodec (for audio processing)

For GPU Acceleration (recommended):

  • NVIDIA GPU with 4GB+ VRAM (8GB+ recommended for larger Whisper models)
  • NVIDIA drivers with CUDA 11.8+ support
  • Note: PyTorch (~2GB download) includes bundled CUDA libraries
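
Before running the agent, you can confirm that PyTorch actually sees your GPU. A minimal check using PyTorch's standard CUDA introspection API:

import torch

# Reports whether the bundled CUDA build can reach the GPU
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("CUDA runtime:", torch.version.cuda)

If this prints False, the plugins will still run with device="cpu", just more slowly.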

1. Clone and Install

git clone https://github.com/CoreWorxLab/local-livekit-plugins.git
cd local-livekit-plugins

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e ".[examples]"

2. Download a Piper Voice Model

mkdir -p models/piper && cd models/piper

# Download Ryan (male US English, high quality)
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx
curl -LO https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/ryan/high/en_US-ryan-high.onnx.json

More voices: Piper Voices on HuggingFace
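
To verify the download before wiring it into an agent, a quick sanity check is possible from Python. This is a sketch assuming the PiperVoice API of recent piper-tts releases; the method name has changed across versions (older releases expose synthesize(text, wav_file), the newer rewrite uses synthesize_wav), so adjust to whatever your installed version provides:

# Sanity check: load the voice and synthesize one line to a WAV file.
# PiperVoice.load() picks up the .onnx.json config sitting next to the model.
import wave
from piper.voice import PiperVoice

voice = PiperVoice.load("models/piper/en_US-ryan-high.onnx")
with wave.open("test.wav", "wb") as wav_file:
    voice.synthesize("Local text to speech is working.", wav_file)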

3. Start LiveKit Server

docker compose up -d

# Verify it's running
curl http://localhost:7880

4. Configure and Run

# Setup environment file
cp examples/.env.local.example examples/.env.local
# Edit examples/.env.local with your model paths (see the sketch below)

# Run with local pipeline (console mode for testing)
uv run examples/voice_agent.py console

# Or run with cloud pipeline (requires API keys)
USE_LOCAL=false uv run examples/voice_agent.py console

# For dev mode (connects to LiveKit playground)
uv run examples/voice_agent.py dev
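
The .env.local file typically carries the LiveKit connection settings plus paths to your local models. A sketch, assuming the dev-mode defaults of a local LiveKit server; the model-path variable name here is illustrative, so use the names from examples/.env.local.example:

# LiveKit dev-server defaults (local docker compose setup)
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret

# Local model paths (illustrative name; match your .env.local.example)
PIPER_MODEL_PATH=./models/piper/en_US-ryan-high.onnx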

Using the Plugins in Your Own Project

Install from GitHub

uv add git+https://github.com/CoreWorxLab/local-livekit-plugins.git

Use in Your Agent

from local_livekit_plugins import FasterWhisperSTT, PiperTTS
from livekit.agents import AgentSession
from livekit.plugins import silero, openai as lk_openai

# Create local STT
stt = FasterWhisperSTT(
    model_size="medium",      # tiny, base, small, medium, large-v3
    device="cuda",            # cuda or cpu
    compute_type="float16",   # float16, int8
)

# Create local TTS
tts = PiperTTS(
    model_path="/path/to/en_US-ryan-high.onnx",
    use_cuda=False,           # CPU recommended for compatibility
    speed=1.0,
)

# Create session with local LLM (Ollama)
session = AgentSession(
    stt=stt,
    llm=lk_openai.LLM.with_ollama(model="llama3.1:8b"),
    tts=tts,
    vad=silero.VAD.load(),
)
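
To run this inside a worker, here is a minimal entrypoint sketch following the LiveKit Agents 1.x pattern; the Agent instructions and model path are illustrative, not from this repo:

from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import silero, openai as lk_openai
from local_livekit_plugins import FasterWhisperSTT, PiperTTS

async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()
    session = AgentSession(
        stt=FasterWhisperSTT(model_size="medium", device="cuda", compute_type="float16"),
        llm=lk_openai.LLM.with_ollama(model="llama3.1:8b"),
        tts=PiperTTS(model_path="/path/to/en_US-ryan-high.onnx"),
        vad=silero.VAD.load(),
    )
    # Hypothetical instructions; tune for your use case
    await session.start(room=ctx.room, agent=Agent(instructions="You are a helpful voice assistant."))

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))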

Architecture

┌─────────────────────────────────────────────────────────────┐
│                       Your Application                      │
├─────────────────────────────────────────────────────────────┤
│                      LiveKit Agents SDK                      │
├───────────────────┬─────────────────┬───────────────────────┤
│  FasterWhisperSTT │       LLM       │       PiperTTS        │
│  (this package)   │    (Ollama)     │    (this package)     │
├───────────────────┼─────────────────┼───────────────────────┤
│  faster-whisper   │     ollama      │      piper-tts        │
│      + CUDA       │                 │     + onnxruntime     │
└───────────────────┴─────────────────┴───────────────────────┘

Configuration Reference

FasterWhisperSTT

Parameter     Type  Default    Description
model_size    str   "medium"   Model size: tiny, base, small, medium, large-v3
device        str   "cuda"     Processing device: cuda, cpu, auto
compute_type  str   "float16"  Quantization: float16, int8, int8_float16
language      str   "en"       Language hint for recognition
beam_size     int   5          Beam search width (1-10)
vad_filter    bool  True       Enable voice activity detection
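
For reference, a CPU-only configuration using only the parameters above (values are illustrative):

from local_livekit_plugins import FasterWhisperSTT

stt = FasterWhisperSTT(
    model_size="small",     # smaller model keeps CPU latency tolerable
    device="cpu",
    compute_type="int8",    # int8 quantization suits CPU inference
    language="en",
    beam_size=5,
    vad_filter=True,
)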

PiperTTS

Parameter    Type   Default   Description
model_path   str    required  Path to .onnx voice model
use_cuda     bool   False     Enable GPU (has CUDA version constraints)
speed        float  1.0       Speech rate (0.5-2.0)
volume       float  1.0       Volume level
noise_scale  float  0.667     Phoneme variation
noise_w      float  0.8       Phoneme width variation
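
And a matching sketch for the TTS side, again using only the parameters in the table above with illustrative values:

from local_livekit_plugins import PiperTTS

tts = PiperTTS(
    model_path="models/piper/en_US-ryan-high.onnx",
    use_cuda=False,       # CPU mode avoids Piper's CUDA version constraints
    speed=1.1,            # slightly faster than the default speech rate
    volume=1.0,
    noise_scale=0.667,
    noise_w=0.8,
)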

Development

# Clone the repo
git clone https://github.com/CoreWorxLab/local-livekit-plugins.git
cd local-livekit-plugins

# Install with dev dependencies
uv sync --all-extras

# Run linter
uv run ruff check src/

# Run type checker
uv run mypy src/

# Run tests
uv run pytest

Performance

Tested on RTX 3060 12GB:

Component              Metric                         Value
STT (whisper-medium)   Processing time                ~230-460ms
STT (whisper-medium)   End-to-end (transcript_delay)  ~760ms avg
TTS (piper ryan-high)  Per character                  ~9ms
TTS (piper ryan-high)  Short response (30 chars)      ~270ms
TTS (piper ryan-high)  Long response (130 chars)      ~1200ms

Note: End-to-end latency includes VAD buffering. Local STT uses batch processing (waits for speech to end), while cloud STT streams in real-time. As a rough decomposition, a ~460ms transcription pass plus ~300ms of end-of-speech detection lands near the ~760ms transcript_delay average above.

Known Issues

Installation & Platform

  1. Large Download Size: PyTorch with CUDA support is ~2GB. First install may take a while depending on your connection.

  2. Windows/Mac Untested: This has only been tested on Linux. You may encounter:

    • Path handling issues (especially Windows)
    • Platform-specific audio library requirements
    • Different PyTorch wheel availability
    • Help wanted! If you get it working, please share your setup in a GitHub issue or PR.
  3. Whisper Model Downloads: Models download automatically on first run (1-5GB depending on size). This happens silently - check logs if first startup seems slow.

Technical Limitations

  1. Piper GPU Support: Piper has CUDA version constraints. If you're on CUDA 12+, you may need to:

    • Use CPU mode (use_cuda=False) - often faster anyway for short utterances
    • Containerize with a compatible CUDA version
    • See: Piper CUDA 12 Discussion
  2. No Streaming STT: FasterWhisper uses batch processing - it waits for speech to end before transcribing. Cloud services like Deepgram stream audio in real-time, giving them a ~300ms latency advantage. This is a fundamental limitation of Whisper-based solutions.

  3. Quality vs Cloud: Local models are good but not as polished as cloud services like Deepgram or ElevenLabs.

Getting Help

Community Support: This project is community-supported. For issues:

  • Check existing GitHub issues
  • Search for similar problems (especially platform-specific)
  • Open a new issue with your system info (OS, GPU, Python version, error logs)

Requirements

  • Python 3.10+
  • uv (recommended) or pip
  • NVIDIA GPU with CUDA support (optional, for GPU acceleration)
    • PyTorch is included as a dependency (~2GB download) and provides bundled CUDA 12 libraries
    • No separate CUDA toolkit installation required
  • ~2-4GB VRAM for Whisper medium model
  • ~500MB disk per Piper voice model


Contributing

Contributions are welcome! This project is in active development and we'd love your help.

High Priority:

  • Platform testing: Windows and Mac users - try it out and report what works/breaks
  • GPU compatibility: Test with different CUDA versions, AMD GPUs, or CPU-only setups
  • Documentation: Improve setup instructions, add troubleshooting tips

Also Welcome:

  • Bug fixes and performance improvements
  • New features (open an issue first to discuss)
  • Better error messages and logging
  • Example projects and use cases

See CONTRIBUTING.md for guidelines, or just open an issue to start a discussion!

License

MIT License - see LICENSE for details.


Built with Claude Code
