A lightweight, zero-shot voice cloning desktop application powered by LuxTTS (150× realtime inference).
✅ Zero-Shot Voice Cloning — Clone any voice from a 3+ second reference audio sample
✅ High-Fidelity 48 kHz Output — Crystal-clear, natural-sounding speech
✅ Blazing Fast — Generates audio 150× faster than real-time on GPU
✅ Low VRAM Footprint — Runs in <1 GB VRAM; ideal for multi-tasking
✅ Modern UI — Dark-mode CustomTkinter interface with real-time controls
✅ CPU Friendly — Works on CPU too (slower, but functional)
- OS: Windows 10+ (or Linux/macOS with Python 3.10+)
- Python: 3.10 or higher
- GPU: Optional (NVIDIA CUDA 11.8+ recommended; CPU works too!)
- RAM: 8 GB+ recommended
- Disk: ~5–10 GB for model weights (auto-downloaded on first run)
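The Python requirement above can be checked before installing anything. This is a minimal sketch: the helper name `meets_python_requirement` is hypothetical and is not part of OptiClone.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the requirements above


def meets_python_requirement(version=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) version satisfies the minimum."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor so patch levels do not matter.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_python_requirement():
        print("Python version OK:", found)
    else:
        print("Python 3.10 or higher is required; found", found)
```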
- Download or clone this repository
- Double-click `run.bat`
- Wait for dependencies to install (first run only, 5–10 min)
- The OptiClone window will appear
# Create & activate venv
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # macOS / Linux
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Run app
python main.py

- Load a Reference (3+ seconds of target voice)
  - Click Upload WAV/MP3 to select a file, or
  - Click Record to capture live microphone input
- Enter Text
  - Type the text you want spoken in the cloned voice into the text area
- Adjust Controls (Optional)
  - Inference Steps (1–16): Higher = better quality, slower synthesis
  - Speed (0.5–2.0×): Playback speed multiplier
  - Guidance Scale (0.1–1.5): Higher = more adherence to reference voice
- Generate
  - Click Generate Voice
  - Wait for synthesis to complete
- Play & Export
  - Click Play to preview
  - Click Export WAV to save the output
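The 3-second minimum for the reference clip can be checked before loading a file into the app. The sketch below uses only the standard-library `wave` module, so it covers uncompressed WAV files and not MP3. The function names are hypothetical and do not come from OptiClone's `audio_utils.py`.

```python
import wave

MIN_REFERENCE_SECONDS = 3.0  # minimum stated in the usage steps


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path, minimum=MIN_REFERENCE_SECONDS):
    """Return True if the WAV file meets the minimum reference length."""
    return wav_duration_seconds(path) >= minimum
```

For example, `is_long_enough("reference.wav")` returns `False` for a 2-second clip, which is a cue to record a longer sample.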
- Reference Audio:
  - Minimum 3 seconds; 5–10 seconds optimal
  - Use clear, noise-free recordings
  - Match the target language/accent if possible
- Inference Steps:
  - Default (4) is fast & good quality
  - Use 8+ steps for studio-quality output
  - Use 1–2 for speed (sacrifices quality)
- Guidance Scale:
  - Default (0.9) balances quality & speaker similarity
  - Higher values (1.0+) favour the reference voice
  - Lower values (0.5–0.7) allow more variation
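The documented ranges and defaults can be captured in a small settings object that clamps out-of-range values. This is an illustrative sketch only: `GenerationSettings` is a hypothetical name, not a class from `config.py`, and the default speed of 1.0 is an assumption because this README does not state one.

```python
from dataclasses import dataclass


def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


@dataclass
class GenerationSettings:
    steps: int = 4         # documented default
    speed: float = 1.0     # assumed default; not stated in this README
    guidance: float = 0.9  # documented default

    def __post_init__(self):
        # Ranges taken from the control descriptions above.
        self.steps = int(clamp(self.steps, 1, 16))
        self.speed = float(clamp(self.speed, 0.5, 2.0))
        self.guidance = float(clamp(self.guidance, 0.1, 1.5))
```

With this sketch, `GenerationSettings(steps=32)` ends up with `steps == 16`, the documented upper limit.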
OptiClone/
├── run.bat # Windows one-click launcher
├── main.py # App entry point
├── requirements.txt # Python dependencies
├── CLAUDE.md # Technical project notes
├── README.md # This file
└── opticlone/
    ├── __init__.py
    ├── config.py # App constants & defaults
    ├── inference_engine.py # LuxTTS model wrapper (core logic)
    ├── audio_utils.py # Recording, playback, WAV I/O
    └── ui_main.py # CustomTkinter GUI
| Module | Purpose |
|---|---|
| `inference_engine.py` | LuxTTS model loading, reference encoding, speech generation |
| `ui_main.py` | CustomTkinter GUI: controls, playback, file dialogs |
| `audio_utils.py` | Microphone recording, WAV export, playback threading |
| `config.py` | Constants: model paths, defaults, UI colors |
Solution: The first run downloads ~500 MB of model weights. Wait 2–5 minutes.
Check the console (bottom of the window, if open) or .opticlone.log for details.
Solution:
- Reduce Inference Steps to 1–2
- Lower Reference Duration in code (`config.py`: `DEFAULT_REF_DURATION = 2`)
- Use CPU instead by editing `inference_engine.py` to force `device='cpu'`
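One possible way to express the CPU fallback is a small device-selection helper. The sketch assumes the model runs on PyTorch, which this README does not state explicitly. `pick_device` is a hypothetical function and does not reflect how `inference_engine.py` is actually written.

```python
def pick_device(force_cpu=False):
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # assumed backend; imported lazily so the check runs without it
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

Calling `pick_device(force_cpu=True)` always returns `"cpu"`, which matches the manual edit described above.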
Solution:
- Check system audio settings (Speaker enabled?)
- Try exporting to WAV and playing with an external player
- Verify `sounddevice` is installed: `pip show sounddevice`
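The same check can be done from Python for several packages at once. This is a minimal sketch using the standard library. The helper name `missing_modules` is hypothetical, and the list of modules to check is an assumption based on the libraries this README mentions.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed from the libraries mentioned in this README.
    missing = missing_modules(["sounddevice", "customtkinter"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are installed.")
```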
Solution:
- Check mic permissions (Windows: Settings → Privacy → Microphone)
- Test mic in system settings first
- Use file upload instead of recording
Solution:
pip install --upgrade setuptools wheel
pip install -r requirements.txt --force-reinstall

OptiClone uses models & code from:
- LuxTTS by Yatharth Sharma
- ZipVoice by K2-FSA Team
- Vocos Vocoder by Gemelo AI
Licensed under Apache 2.0 — see LICENSE in the repo.
If you use OptiClone in research or production, please cite the original models:
@misc{sharma2025luxtts,
title={LuxTTS: A High-Quality Rapid TTS Voice Cloning Model},
author={Sharma, Yatharth and others},
year={2025},
howpublished={\url{https://github.com/ysharma3501/LuxTTS}}
}

This tool is for personal, research, and authorized commercial use only. Users are responsible for ensuring compliance with local laws regarding voice cloning and synthetic media. Unauthorized voice cloning of identifiable individuals may violate privacy and identity protection laws.
Found a bug or have a feature idea? Open an issue or pull request on GitHub!
OptiClone is released under the Apache 2.0 License — same as LuxTTS.
Made with ❤️ for voice enthusiasts and ML researchers.