Minimal, cleaned version of SadTalker focused on a lightweight local workflow:
- Type a message
- Generate speech with Piper TTS
- Animate a single avatar image with SadTalker
No WebUI/Gradio, and no speech-to-text (Vosk) in this workflow.
- Local text-to-speech via Piper (`piper` CLI)
- High-quality talking-head animation via SadTalker
- Optional face enhancement via GFPGAN
- Simple interactive CLI + scriptable SadTalker runner (`python -m src.inference`)
- Core model code under `src/` (preprocess, audio2coeff, face renderer, utilities).
- `main.py`: interactive flow (type text → Piper voice → SadTalker settings → output video).
- `src/inference.py`: scriptable entrypoint for SadTalker (image + audio → mp4).
- Model files under `models/`:
  - `models/checkpoints/` (SadTalker weights)
  - `models/gfpgan/weights/` (enhancer weights)
  - `models/voices/` (Piper voice models)
Optional helper folders (examples, docs, results) can be ignored or deleted in downstream projects; they are not required by the core code.
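For orientation, the layout looks roughly like this (optional folders omitted):

```
.
├── main.py              # interactive CLI
├── config.py            # defaults, overridable via .env
├── setup.py             # helper: requirements + model downloads
├── src/
│   ├── inference.py     # scriptable SadTalker entrypoint
│   └── ...              # preprocess, audio2coeff, face renderer, utilities
├── models/
│   ├── checkpoints/     # SadTalker weights
│   ├── gfpgan/weights/  # enhancer weights
│   └── voices/          # Piper voice models
└── results/             # generated videos
```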
- Python 3.8–3.10 is recommended (some deps may be flaky on 3.11+).
- FFmpeg available on PATH (recommended).
- PyTorch installed manually for your platform (CPU/GPU).
This project writes videos; having FFmpeg installed system-wide makes things more reliable.
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
```

From the project root:
```bash
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# Install PyTorch (version combo known to work with basicsr/gfpgan)
# CPU-only example for macOS/Linux:
pip install "torch==1.12.1" "torchvision==0.13.1" "torchaudio==0.12.1" \
  --extra-index-url https://download.pytorch.org/whl/cpu

# Then install the Python dependencies
pip install -r requirements.txt
```

This installs the non-PyTorch dependencies listed in `requirements.txt`. PyTorch itself is not in `requirements.txt` because its wheels are hardware/OS specific; if you target a different platform or want GPU builds, use the selector on https://pytorch.org to adjust the command.
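After the install, a quick sanity check (plain PyTorch, nothing project-specific) confirms the pinned versions import cleanly and shows which accelerator is available:

```python
import torch
import torchvision

# Expect 1.12.1 / 0.13.1 with the pinned install above
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)

# CUDA covers NVIDIA GPUs; MPS covers Apple Silicon (M1/M2)
print("CUDA available:", torch.cuda.is_available())
mps = getattr(torch.backends, "mps", None)
print("MPS available:", bool(mps and mps.is_available()))
```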
You can also use the helper script to install requirements and download all SadTalker checkpoints, GFPGAN weights, and Piper voice models in one go:

```bash
python setup.py
```

Or run only specific steps:

```bash
python setup.py --requirements-only  # just pip install -r requirements.txt
python setup.py --models-only        # download model files + Piper voices
python setup.py --verify             # check that files exist
```
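If you'd rather check by hand, a tiny standalone script (an illustrative sketch; the real `--verify` may check more) can confirm the model directories are populated:

```python
from pathlib import Path

# Directories named in this README; exact file names depend on
# which checkpoints and voices you downloaded.
for d in ("models/checkpoints", "models/gfpgan/weights", "models/voices"):
    files = [p for p in Path(d).glob("*") if p.is_file()]
    print(f"{d}: {len(files)} file(s)" if files else f"{d}: MISSING")
```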
Run the interactive CLI:

```bash
python main.py
```

Workflow:

- Type a message and press Enter
- Choose a Piper voice (press Enter to use the default)
- Choose SadTalker settings (press Enter to use defaults)
- The result is written under `results/<uuid>/<timestamp>.mp4`
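If you script around the interactive flow, a small illustrative helper can pick up the newest output (the `latest_result` name is ours, not part of the repo):

```python
from pathlib import Path
from typing import Optional

def latest_result(results_dir: str = "results") -> Optional[Path]:
    """Return the most recently written mp4 under results/, if any."""
    videos = sorted(Path(results_dir).rglob("*.mp4"),
                    key=lambda p: p.stat().st_mtime)
    return videos[-1] if videos else None

print(latest_result())
```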
For scripted, non-interactive use, run SadTalker directly:

```bash
python -m src.inference \
  --driven_audio /path/to/audio.wav \
  --source_image /path/to/image.png \
  --result_dir ./results \
  --save_dir ./results/run-1/2025_12_18_12.00.00 \
  --size 256 \
  --batch_size 1 \
  --preprocess crop \
  --enhancer gfpgan  # optional, requires GFPGAN weights
```

For backwards compatibility, `python inference.py ...` also works (shim).
This writes the output video into the `results/` directory.

`--save_dir` is optional; when provided, it forces SadTalker to write all intermediate files into that exact folder. This is used by `main.py` so the generated `piper.wav` is co-located with the run intermediates and gets cleaned up.
Defaults live in `config.py` and can be overridden via a `.env` file or changed in the file itself.

Key settings:

- `AVATAR_FACE` (default: `models/avatar.jpg`)
- `PIPER_VOICES_DIR` (default: `models/voices`)
- `PIPER_VOICE_MALE`, `PIPER_VOICE_FEMALE`, `PIPER_VOICE_DEFAULT`
- `SADTALKER_PREPROCESS_DEFAULT` (`crop` | `full` | `extfull`)
- `SADTALKER_SIZE_DEFAULT` (`256` | `512`)
- `SADTALKER_BATCH_SIZE_DEFAULT` (recommended: `1`)
- `SADTALKER_ENHANCER_DEFAULT` (empty or `gfpgan`)
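As a rough sketch of how this wiring usually looks (assuming `config.py` follows the common python-dotenv pattern; the actual file may differ):

```python
import os

from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # pick up overrides from .env, if present

AVATAR_FACE = os.getenv("AVATAR_FACE", "models/avatar.jpg")
PIPER_VOICES_DIR = os.getenv("PIPER_VOICES_DIR", "models/voices")
SADTALKER_PREPROCESS_DEFAULT = os.getenv("SADTALKER_PREPROCESS_DEFAULT", "crop")
SADTALKER_SIZE_DEFAULT = int(os.getenv("SADTALKER_SIZE_DEFAULT", "256"))
SADTALKER_BATCH_SIZE_DEFAULT = int(os.getenv("SADTALKER_BATCH_SIZE_DEFAULT", "1"))
SADTALKER_ENHANCER_DEFAULT = os.getenv("SADTALKER_ENHANCER_DEFAULT", "")
```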
Example `.env`:

```
AVATAR_FACE=models/avatar.jpg
PIPER_VOICE_DEFAULT=models/voices/dii_pt-PT.onnx
SADTALKER_PREPROCESS_DEFAULT=crop
SADTALKER_SIZE_DEFAULT=256
SADTALKER_ENHANCER_DEFAULT=
```

For a Flask / chatbot or other backend service:
- Treat this repo as a local package: `pip install -e .` inside your main project venv.
- Mirror the logic in `src/inference.py` inside your own service layer, or wrap it in a small helper that returns the generated video path.
- Ensure the `models/checkpoints/` directory is available in your deployed environment and points to the correct model weights.
At runtime you generally:
- Save the user image and audio to disk.
- Call the inference script (directly, via `main.py`, or via a thin Python wrapper) with those paths and a `result_dir` you control.
- Return the resulting mp4 path or URL from your Flask API so your chatbot can display or link it.
If you install this repo into another project with `pip install -e .`, you can import the modules directly (e.g. `import src.utils.preprocess`) and build a `generate_talking_video(...)` helper in that project that mirrors what `src/inference.py` does.
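A minimal sketch of such a helper, shelling out to the CLI rather than importing internals (names and defaults here are illustrative):

```python
import subprocess
import sys
from pathlib import Path

def generate_talking_video(image_path: str, audio_path: str,
                           result_dir: str = "results",
                           size: int = 256, preprocess: str = "crop",
                           enhancer: str = "") -> Path:
    """Run SadTalker on one image+audio pair and return the newest mp4."""
    cmd = [
        sys.executable, "-m", "src.inference",
        "--driven_audio", audio_path,
        "--source_image", image_path,
        "--result_dir", result_dir,
        "--size", str(size),
        "--batch_size", "1",
        "--preprocess", preprocess,
    ]
    if enhancer:  # e.g. "gfpgan"; requires the GFPGAN weights
        cmd += ["--enhancer", enhancer]
    subprocess.run(cmd, check=True)

    # The CLI writes into result_dir; return the most recent video.
    videos = sorted(Path(result_dir).rglob("*.mp4"),
                    key=lambda p: p.stat().st_mtime)
    if not videos:
        raise RuntimeError("SadTalker finished but produced no mp4")
    return videos[-1]
```

A Flask route can then save the uploaded image/audio to temp paths, call this helper, and return the resulting path or a URL built from it.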
On Windows, inside a venv, you can run:

```powershell
python -m venv .venv
.\.venv\Scripts\activate

# Install PyTorch for your CPU/GPU from pytorch.org
pip install "torch==1.12.1" "torchvision==0.13.1" "torchaudio==0.12.1" `
  --extra-index-url https://download.pytorch.org/whl/cpu

pip install -r requirements.txt
python setup.py --models-only
python main.py
```

If you see compiler errors when installing basicsr or similar packages, install Build Tools for Visual Studio first so you have a working C/C++ toolchain.
- If `basicsr` or `gfpgan` fail to install with wheel warnings, upgrade pip/setuptools/wheel in your venv and retry.
- If you see `ModuleNotFoundError` for `torchvision.transforms.functional_tensor`, pin torch/torchvision to `1.12.1` / `0.13.1` as in the README install example.
- On Apple Silicon (M1/M2), use a PyTorch build with MPS support so the heavy steps can run on the GPU instead of the CPU.
- If `piper` is not found, ensure your venv is activated and `piper-tts` is installed; the CLI should be available as `piper`.
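A quick way to confirm the external tools are actually visible from the active venv:

```python
import shutil

# Both binaries must resolve on the current PATH.
for tool in ("ffmpeg", "piper"):
    print(f"{tool}: {shutil.which(tool) or 'NOT FOUND'}")
```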