A modular interactive digital-human chat implementation that can run full functionality on a single PC.
- Low-latency real-time digital-human chat: average response latency around 2.2 seconds.
- Multimodal language model support: text, audio, video, etc.
- Modular design: flexible component replacement for different feature combinations.
[2025-06-12] ⭐️⭐️⭐️ Version 0.4.1 released:
- Added support for MuseTalk digital humans with customizable avatars (custom base videos).
- Released 50 new LiteAvatar models; see LiteAvatarGallery.
[2025-04-18] ⭐️⭐️⭐️ Version 0.3.0 released:
- 🎉 Congratulations on the acceptance of the LAM paper at SIGGRAPH 2025! 🎉
- Added support for LAM digital humans (ultra-realistic 3D avatars generated from a single image within seconds).
- Added Bailian API TTS handler to greatly reduce GPU dependency.
- Added Microsoft Edge TTS support.
- Switched to `uv` for Python package management; install dependencies per activated handler.
- Updated CSS responsive layout.
[2025-04-14] ⭐️⭐️⭐️ Version 0.2.2 released:
- Released 100 new LiteAvatar models; see LiteAvatarGallery.
- Default GPU backend for `lite-avatar` digital humans.
[2025-04-07] ⭐️⭐️⭐️ Version 0.2.1 released:
- Added history support.
- Added text input support.
- No longer requires a camera at startup.
- Optimized modular loading.
[2025-02-20] ⭐️⭐️⭐️ Version 0.1.0 released:
- Modular real-time interactive digital humans.
- Support for MiniCPM-o as a multimodal language model, or cloud-based API for ASR + LLM + TTS.
To do:
- Reach 100 preconfigured digital-human models.
- Integrate LAM.
- Integrate Qwen2.5-Omni.
We have deployed online demo services. Audio is powered by SenseVoice + Qwen-VL + CosyVoice, with switchable support for LiteAvatar and LAM digital humans. Feel free to try it out!
Demo videos: OpenAvatarChat_Demo.mp4 | OpenAvatarChat_LAM_Demo.mp4
- WeChat Group
Common issues encountered during development can be found in the FAQ.
Currently available handlers:
- RTC Client Handler (Server-side Rendering)
- LAM Client Handler (Edge Rendering)
- OpenAI-Compatible Language Model Handler
- MiniCPM Multimodal Language Model Handler
- Bailian CosyVoice TTS Handler
- CosyVoice Local Inference Handler
- Edge TTS Handler
- LiteAvatar Handler
- LAM Audio2Expression Handler
- MuseTalk Handler
Open Avatar Chat is a modular interactive digital-human chat implementation that can run full functionality on a single PC. It currently supports MiniCPM-o as a multimodal language model, or can use cloud-based APIs to replace ASR + LLM + TTS. The architectures for both modes are shown below. More preset modes are listed in Preset Modes.
- Python ≥ 3.11.7, < 3.12
- CUDA-capable GPU
- Unquantized MiniCPM-o requires ≥ 20GB VRAM
- Digital-human rendering can use GPU/CPU (tested on i9-13980HX CPU achieving 30 FPS)
Tip: An int4-quantized LM can run on <10GB VRAM with reduced performance. Using cloud-based APIs for ASR + LLM + TTS greatly lowers hardware requirements; see ASR + LLM + TTS mode.
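Before choosing a mode, you can confirm how much VRAM is available; a quick check with the standard NVIDIA tool:

```bash
# Unquantized MiniCPM-o needs >= 20 GB of VRAM; an int4-quantized LM fits in under 10 GB.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```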
On a PC with an i9-13900KF and an NVIDIA RTX 4090, we measured response latency over ten runs: average ~2.2s. Latency is measured from end of user speech to start of avatar speech, including RTC round-trip, VAD end delay, and computation time.
| Type | Open Source Project | GitHub Link | Model Link |
|---|---|---|---|
| RTC | HumanAIGC-Engineering/gradio-webrtc | GitHub | |
| VAD | snakers4/silero-vad | GitHub | |
| LLM | OpenBMB/MiniCPM-o | GitHub | 🤗 |
| LLM-int4 | OpenBMB/MiniCPM-o | GitHub | 🤗 |
| Avatar | HumanAIGC/lite-avatar | GitHub | |
| TTS | FunAudioLLM/CosyVoice | GitHub | |
| Avatar | aigc3d/LAM_Audio2Expression | GitHub | 🤗 |
| ASR | facebook/wav2vec2-base-960h | | 🤗 |
| Avatar | TMElyralab/MuseTalk | GitHub | |
| CONFIG Name | ASR | LLM | TTS | Avatar |
|---|---|---|---|---|
| chat_with_gs.yaml | SenseVoice | Cloud API | Cloud API | LAM |
| chat_with_minicpm.yaml | MiniCPM-o | MiniCPM-o | MiniCPM-o | lite-avatar |
| chat_with_openai_compatible.yaml | SenseVoice | Cloud API | CosyVoice | lite-avatar |
| chat_with_openai_compatible_edge_tts.yaml | SenseVoice | Cloud API | edge-tts | lite-avatar |
| chat_with_openai_compatible_bailian_cosyvoice.yaml | SenseVoice | Cloud API | Cloud API | lite-avatar |
| chat_with_openai_compatible_bailian_cosyvoice_musetalk.yaml | SenseVoice | Cloud API | Cloud API | MuseTalk |
Refer to each mode's handler installation guide and related deployment requirements before installation.
OpenAvatarChat is launched via a config file in config/. Available presets:
chat_with_gs.yaml: uses LAM for edge-side rendering and Bailian CosyVoice for TTS, with only VAD and ASR running locally on the GPU, giving a light hardware footprint.
| Category | Handler | Installation Guide |
|---|---|---|
| Client | client/h5_rendering_client/client_handler_lam | LAM Edge Rendering Client Handler |
| VAD | vad/silerovad/vad_handler/silero | |
| ASR | asr/sensevoice/asr_handler_sensevoice | |
| LLM | llm/openai_compatible/llm_handler_openai_compatible | OpenAI-Compatible LM Handler |
| TTS | tts/bailian_tts/tts_handler_cosyvoice_bailian | Bailian CosyVoice Handler |
| Avatar | avatar/lam/avatar_handler_lam_audio2expression | LAM Audio2Expression Handler |
... and similarly for other config files (see full list above).
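After installation (covered below), launching any preset is just a matter of pointing the demo at its config file; for example, using the chat_with_gs.yaml preset from the table:

```bash
# Launch OpenAvatarChat with the LAM + Bailian CosyVoice preset.
uv run src/demo.py --config config/chat_with_gs.yaml
```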
Important: This project uses Git submodules and large models via Git LFS. Ensure Git LFS is installed and submodules are updated:
```bash
sudo apt install git-lfs
git lfs install
git submodule update --init --recursive
```
We recommend cloning rather than downloading ZIPs for submodule and LFS support. If issues arise, please file an issue.
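Putting the above together, a fresh checkout could look like this (repository URL as cited at the end of this README):

```bash
git clone --recursive https://github.com/HumanAIGC-Engineering/OpenAvatarChat.git
cd OpenAvatarChat
git lfs pull  # fetch the large model files tracked by Git LFS
```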
Install uv for environment management:
```bash
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or via pip
pip install uv
# Or via pipx
pipx install uv
```
Install dependencies for all handlers at once:
```bash
uv sync --all-packages
```
Or create a virtual environment and install only the dependencies of the handlers activated by your config:
```bash
uv venv --python 3.11.11
uv pip install setuptools pip
uv run install.py --uv --config <absolute-path-to-config>.yaml
./scripts/post_config_install.sh --config <absolute-path-to-config>.yaml
```
Note: `post_config_install.sh` adds NVIDIA CUDA library paths to `ld.so.conf.d` and updates the cache via `ldconfig`.
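For reference, the effect of that script is roughly the following (the CUDA path is illustrative, not necessarily what the script actually writes):

```bash
# Illustrative sketch: make the CUDA shared libraries visible to the dynamic linker.
echo "/usr/local/cuda/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig  # rebuild the shared-library cache
```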
Run the demo:
```bash
uv run src/demo.py --config <absolute-path-to-config>.yaml
```
Containerized deployment (requires NVIDIA Docker support):
```bash
./build_and_run.sh --config <absolute-path-to-config>.yaml
```
RTC Client Handler (Server-side Rendering): no special dependencies or setup required.
LAM Client Handler (Edge Rendering): extends the server-side RTC handler with multi-stream support; select the avatar via config (see the sample config under scripts/).
OpenAI-Compatible Language Model Handler: use any LLM API key; configure the model, system prompt, API URL, and key in the config file or via environment variables:
```yaml
LLM_Bailian:
  model_name: "qwen-plus"
  system_prompt: "You are an AI digital human..."
  api_url: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
  api_key: 'YOUR_API_KEY' # or set via an environment variable
```
...and so on for each handler (MiniCPM, CosyVoice, Edge TTS, LiteAvatar, LAM Audio2Expression, MuseTalk).
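The key can also come from the environment instead of being written into the file; the variable name below is hypothetical, so check the handler's installation guide for the name it actually reads:

```bash
# Hypothetical variable name, for illustration only.
export LLM_API_KEY="YOUR_API_KEY"
uv run src/demo.py --config config/chat_with_openai_compatible.yaml
```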
For non-localhost RTC connections, provide localhost.crt and localhost.key in ssl_certs/, or generate via:
```bash
scripts/create_ssl_certs.sh
```
If clients stall on "waiting for connection", set up a TURN server (e.g., coturn); see scripts/setup_coturn.sh and update the config under RtcClient.turn_config.
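If you would rather not use the script, a self-signed pair for local testing can be generated directly with openssl (a sketch; the subject and validity period are arbitrary choices here):

```bash
# Self-signed certificate for local testing only; browsers will warn about it.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=localhost" \
  -keyout ssl_certs/localhost.key -out ssl_certs/localhost.crt
```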
By default, config/chat_with_minicpm.yaml is used. Override with:
```bash
uv run src/demo.py --config <absolute-path-to-config>.yaml
```
- Thanks to "十字鱼" for the one-click installer video on Bilibili and the accompanying package.
- Thanks to “W&H” for Quark one-click packages (Windows and Linux).
If OpenAvatarChat helps in your research or projects, please give us a ⭐ and cite:
@software{avatarchat2025,
author = {Gang Cheng and Tao Chen and Feng Wang and Binchao Huang and Hui Xu and Guanqiao He and Yi Lu and Shengyin Tan},
title = {OpenAvatarChat},
year = {2025},
publisher = {GitHub},
url = {https://github.com/HumanAIGC-Engineering/OpenAvatarChat}
}