
Open Avatar Chat


A modular interactive digital-human chat implementation that can run full functionality on a single PC.

🤗 Demo on Hugging Face

🔥 Key Features

  • Low-latency real-time digital-human chat: average response latency around 2.2 seconds.
  • Multimodal language model support: text, audio, video, etc.
  • Modular design: flexible component replacement for different feature combinations.

📢 What's New

Changelog

  • [2025-06-12] ⭐️⭐️⭐️ Version 0.4.1 released:

    • Added support for MuseTalk digital humans with customizable avatars (custom base videos).
    • Released 50 new LiteAvatar models; see LiteAvatarGallery.
  • [2025-04-18] ⭐️⭐️⭐️ Version 0.3.0 released:

    • 🎉 Congratulations on the acceptance of the LAM paper at SIGGRAPH 2025! 🎉
    • Added support for LAM digital humans (single-image, second-level generation of ultra-realistic 3D avatars).
    • Added Bailian API TTS handler, greatly reducing GPU dependency.
    • Added Microsoft Edge TTS support.
    • Switched to uv for Python package management; install dependencies per activated handler.
    • Updated CSS responsive layout.
  • [2025-04-14] ⭐️⭐️⭐️ Version 0.2.2 released:

    • Released 100 new LiteAvatar models; see LiteAvatarGallery.
    • lite-avatar digital humans now use the GPU backend by default.
  • [2025-04-07] ⭐️⭐️⭐️ Version 0.2.1 released:

    • Added conversation history support.
    • Added text input support.
    • No longer requires a camera at startup.
    • Optimized modular loading.
  • [2025-02-20] ⭐️⭐️⭐️ Version 0.1.0 released:

    • Modular real-time interactive digital humans.
    • Support for MiniCPM-o as a multimodal language model, or cloud-based API for ASR + LLM + TTS.

To-do List

  • Reach 100 preconfigured digital-human models.
  • Integrate LAM.
  • Integrate Qwen2.5-Omni.

Demo

Online Demo

We have deployed online demo services; see the Hugging Face demo link above.

The demo pipeline uses SenseVoice + Qwen-VL + CosyVoice, with switchable support for LiteAvatar and LAM digital humans. Feel free to try it out!

Video Showcase

LiteAvatar

OpenAvatarChat_Demo.mp4

LAM

OpenAvatarChat_LAM_Demo.mp4

Community

  • WeChat Group

WeChat QR Code

🚨 FAQ

Common issues encountered during development can be found in the FAQ.

📖 Table of Contents

Overview

Introduction

Open Avatar Chat is a modular interactive digital-human chat implementation that can run full functionality on a single PC. It currently supports MiniCPM-o as a multimodal language model, or can use cloud-based APIs to replace ASR + LLM + TTS. The architectures for both modes are shown below. More preset modes are listed in Preset Modes.

Data Flow

System Requirements

  • Python ≥ 3.11.7, < 3.12
  • CUDA-capable GPU
  • Unquantized MiniCPM-o requires ≥ 20GB VRAM
  • Digital-human rendering can run on GPU or CPU (an i9-13980HX CPU reaches about 30 FPS)

Tip: An int4-quantized LM can run on <10GB VRAM with reduced performance. Using cloud-based APIs for ASR + LLM + TTS greatly lowers hardware requirements; see ASR + LLM + TTS mode.
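
To sanity-check a machine against these requirements before installing, the short Python snippet below can help (a minimal sketch that assumes PyTorch is already installed and simply mirrors the numbers above):

# Environment sanity check against the requirements listed above (assumes PyTorch is installed).
import sys
import torch

assert (3, 11, 7) <= sys.version_info[:3] < (3, 12), "Python >= 3.11.7 and < 3.12 is required"
assert torch.cuda.is_available(), "A CUDA-capable GPU is required"

vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
print(f"GPU: {torch.cuda.get_device_name(0)}, VRAM: {vram_gb:.1f} GB")
if vram_gb < 20:
    print("Note: unquantized MiniCPM-o needs >= 20 GB VRAM; consider int4 or the cloud-API mode.")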

Performance Metrics

On a PC with an i9-13900KF and an NVIDIA RTX 4090, we measured response latency over ten runs: average ~2.2s. Latency is measured from end of user speech to start of avatar speech, including RTC round-trip, VAD end delay, and computation time.
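
To make the metric concrete, the latency definition boils down to a simple difference of timestamps; the values below are invented purely to illustrate the computation, not real measurements:

# Latency = time from the end of user speech to the start of avatar speech.
# Timestamps are fabricated for illustration only.
runs = [
    {"user_speech_end": 10.00, "avatar_speech_start": 12.15},
    {"user_speech_end": 25.30, "avatar_speech_start": 27.60},
    {"user_speech_end": 41.10, "avatar_speech_start": 43.25},
]
latencies = [r["avatar_speech_start"] - r["user_speech_end"] for r in runs]
print(f"average response latency: {sum(latencies) / len(latencies):.2f} s")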

Component Dependencies

Type | Open Source Project | GitHub Link | Model Link
RTC | HumanAIGC-Engineering/gradio-webrtc | GitHub |
VAD | snakers4/silero-vad | GitHub |
LLM | OpenBMB/MiniCPM-o | GitHub | 🤗
LLM-int4 | OpenBMB/MiniCPM-o | GitHub | 🤗
Avatar | HumanAIGC/lite-avatar | GitHub |
TTS | FunAudioLLM/CosyVoice | GitHub |
Avatar | aigc3d/LAM_Audio2Expression | GitHub | 🤗
ASR | facebook/wav2vec2-base-960h | | 🤗
Avatar | TMElyralab/MuseTalk | GitHub |

Preset Modes

Config Name | ASR | LLM | TTS | Avatar
chat_with_gs.yaml | SenseVoice | Cloud API | Cloud API | LAM
chat_with_minicpm.yaml | MiniCPM-o | MiniCPM-o | MiniCPM-o | lite-avatar
chat_with_openai_compatible.yaml | SenseVoice | Cloud API | CosyVoice | lite-avatar
chat_with_openai_compatible_edge_tts.yaml | SenseVoice | Cloud API | edge-tts | lite-avatar
chat_with_openai_compatible_bailian_cosyvoice.yaml | SenseVoice | Cloud API | Cloud API | lite-avatar
chat_with_openai_compatible_bailian_cosyvoice_musetalk.yaml | SenseVoice | Cloud API | Cloud API | MuseTalk

🚀 Installation & Deployment

Refer to each mode's handler installation guide and related deployment requirements before installation.

Choosing a Configuration

OpenAvatarChat is launched via a config file in config/. Available presets:

chat_with_gs.yaml

Uses LAM for edge-side rendering and Bailian CosyVoice for TTS; only VAD and ASR run locally on the GPU, so the hardware footprint is light.

Handlers Used
Category | Handler | Installation Guide
Client | client/h5_rendering_client/client_handler_lam | LAM Edge Rendering Client Handler
VAD | vad/silerovad/vad_handler/silero |
ASR | asr/sensevoice/asr_handler_sensevoice |
LLM | llm/openai_compatible/llm_handler_openai_compatible | OpenAI-Compatible LM Handler
TTS | tts/bailian_tts/tts_handler_cosyvoice_bailian | Bailian CosyVoice Handler
Avatar | avatar/lam/avatar_handler_lam_audio2expression | LAM Audio2Expression Handler

... and similarly for other config files (see full list above).
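
To quickly see which handlers any preset enables without reading the YAML by hand, a generic script like the one below can dump its keys (it assumes nothing about the project's schema beyond standard YAML):

# Dump the nested keys of a preset config to see which handlers it enables.
# Usage: python inspect_config.py <path-to-config>.yaml
import sys
import yaml  # pip install pyyaml

def walk(node, path=""):
    # Recursively print nested keys and leaf values.
    if isinstance(node, dict):
        for key, value in node.items():
            walk(value, f"{path}/{key}" if path else key)
    else:
        print(f"{path}: {node}")

with open(sys.argv[1], "r", encoding="utf-8") as f:
    walk(yaml.safe_load(f))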

Local Run

Important: This project uses Git submodules and large models via Git LFS. Ensure Git LFS is installed and submodules are updated:

sudo apt install git-lfs   # Debian/Ubuntu; use your platform's package manager otherwise
git lfs install
git submodule update --init --recursive

We recommend cloning rather than downloading ZIPs for submodule and LFS support. If issues arise, please file an issue.
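
A common symptom of a ZIP download or a skipped git lfs install is that model files remain as tiny LFS pointer files rather than real weights. The sketch below scans a checkout for such pointers; it relies only on the standard Git LFS pointer format and makes no assumptions about this project's layout:

# Detect Git LFS pointer files that were never replaced by the actual model weights.
# LFS pointer files are tiny text files starting with the spec line checked below.
from pathlib import Path

POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def find_unfetched_lfs_files(root="."):
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_size < 1024:  # pointers are only a few hundred bytes
            try:
                with open(path, "rb") as f:
                    if f.read(len(POINTER_PREFIX)) == POINTER_PREFIX:
                        hits.append(path)
            except OSError:
                pass
    return hits

if __name__ == "__main__":
    for p in find_unfetched_lfs_files():
        print(f"LFS pointer not fetched: {p}")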

uv Installation

Install uv for environment management:

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or via pip
pip install uv
# Or via pipx
pipx install uv

Dependency Installation

Install All Dependencies
uv sync --all-packages
Install Only Required Dependencies for a Mode
uv venv --python 3.11.11
uv pip install setuptools pip
uv run install.py --uv --config <absolute-path-to-config>.yaml
./scripts/post_config_install.sh --config <absolute-path-to-config>.yaml

Note: post_config_install.sh adds NVIDIA CUDA library paths to ld.so.conf.d and updates the cache via ldconfig.
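
After the script has run, you can confirm that the dynamic loader resolves CUDA libraries; the library names below are common examples rather than a project-specific list:

# Check whether the dynamic loader can resolve common CUDA libraries after ldconfig.
# The exact set of libraries depends on the handlers you installed.
import ctypes.util

for lib in ("cudart", "cublas", "cudnn"):
    print(f"{lib}: {ctypes.util.find_library(lib) or 'NOT found'}")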

Run

uv run src/demo.py --config <absolute-path-to-config>.yaml

Docker Run

Containerized deployment (requires NVIDIA Docker support):

./build_and_run.sh --config <absolute-path-to-config>.yaml

Handler Dependency Guides

RTC Client Handler (Server-side Rendering)

No special dependencies or setup required.

LAM Client Handler (Edge Rendering)

Extends the server-side RTC handler for multi-stream support; select avatar via config (see sample config under scripts/).

OpenAI-Compatible Language Model Handler

Works with any OpenAI-compatible LLM API. Configure the model name, system prompt, API URL, and API key in the config file or via environment variables:

LLM_Bailian:
  model_name: "qwen-plus"
  system_prompt: "You are an AI digital human..."
  api_url: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
  api_key: 'YOUR_API_KEY' # or env var
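
The handler talks to an OpenAI-compatible endpoint, so its behavior corresponds to a standard chat-completion request. The snippet below shows the equivalent direct call with the openai Python client, purely as a protocol sketch rather than the project's handler code (the environment-variable name is illustrative):

# Equivalent direct call to the configured OpenAI-compatible endpoint.
# This is a protocol sketch, not the project's handler implementation.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    api_key=os.getenv("DASHSCOPE_API_KEY", "YOUR_API_KEY"),  # env-var name is illustrative
)
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "You are an AI digital human..."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)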

...and so on for each handler (MiniCPM, CosyVoice, Edge TTS, LiteAvatar, LAM Audio2Exp, MuseTalk).

Related Deployment Requirements

SSL Certificates

For non-localhost RTC connections, provide localhost.crt and localhost.key in ssl_certs/, or generate via:

scripts/create_ssl_certs.sh
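
If you prefer not to use the script, an equivalent self-signed certificate for localhost can be generated with the cryptography package; the sketch below only follows the ssl_certs/ file-name convention above and is not the script's actual implementation:

# Generate a self-signed certificate for localhost into ssl_certs/ (alternative to the script above).
from datetime import datetime, timedelta
from pathlib import Path
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

Path("ssl_certs").mkdir(exist_ok=True)
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "localhost")])
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.utcnow())
    .not_valid_after(datetime.utcnow() + timedelta(days=365))
    .add_extension(x509.SubjectAlternativeName([x509.DNSName("localhost")]), critical=False)
    .sign(key, hashes.SHA256())
)
with open("ssl_certs/localhost.key", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),
    ))
with open("ssl_certs/localhost.crt", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))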

TURN Server

If clients stall on "waiting for connection", set up a TURN server (e.g., coturn); see scripts/setup_coturn.sh and update config under RtcClient.turn_config.
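
The exact keys expected under RtcClient.turn_config are defined in the YAML; the dict below merely illustrates the standard WebRTC ICE-server shape (urls, username, credential) that such a configuration typically carries, so treat the field names as assumptions:

# Illustrative TURN settings following the standard WebRTC ICE-server shape.
# Field names here are assumptions; check RtcClient.turn_config in the YAML for the real keys.
turn_config = {
    "urls": [
        "turn:your.turn.server:3478?transport=udp",
        "turn:your.turn.server:3478?transport=tcp",
    ],
    "username": "turn_user",
    "credential": "turn_password",
}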

Configuration Guide

By default, config/chat_with_minicpm.yaml is used. Override with:

uv run src/demo.py --config <absolute-path-to-config>.yaml

Acknowledgements

  • Thanks to “十字鱼” for the one-click installer video on Bilibili and the accompanying package.
  • Thanks to “W&H” for the Quark one-click packages (Windows and Linux).

Star History

Citation

If OpenAvatarChat helps in your research or projects, please give us a ⭐ and cite:

@software{avatarchat2025,
  author = {Gang Cheng and Tao Chen and Feng Wang and Binchao Huang and Hui Xu and Guanqiao He and Yi Lu and Shengyin Tan},
  title = {OpenAvatarChat},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/HumanAIGC-Engineering/OpenAvatarChat}
}
