Status: 🚧 Repository under active development. More data and features are coming soon!
InfiniteDance is a comprehensive framework for scalable 3D music-to-dance generation, designed for high-quality in-the-wild generalization. It uses a VQ-VAE-based motion encoder to discretize dance movements and a Large Language Model (LLaMA-3.2-1B) for high-fidelity autoregressive generation. By incorporating Retrieval-Augmented Generation (RAG), InfiniteDance achieves superior style consistency and motion diversity. Our motion VQ-VAE (DanceVQVAE) follows MoMask.
- LLM-based Generation: Leverages the power of LLaMA 3.2-1B for sophisticated dance sequence synthesis.
- RAG-Enhanced Conditioning: Integrated Retrieval Network for precise style and motion guidance.
- Scalable Multimodal Architecture: Supports diverse genres (Ballet, Popular, Latin, Modern, Folk, Classic).
- Production-Ready Pipeline: From raw music features to high-quality SMPL-based video rendering.
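The pipeline above can be read as: precomputed music features condition the model, RetrievalNet supplies style-matched motion tokens as RAG context, the LLaMA backbone autoregressively generates discrete motion tokens, and the DanceVQVAE decoder maps them back to SMPL-based motion. The sketch below is purely illustrative and is not the repository's API: every function name, the codebook size, the feature dimension, and the joint count are assumptions made for clarity.

```python
# Illustrative sketch only -- NOT the repository's implementation. Function names,
# codebook size, feature dimension, and joint count are assumptions for clarity.
import numpy as np

CODEBOOK_SIZE = 512   # assumed VQ-VAE codebook size (the motion-token dir name mentions 512)
FEATURE_DIM = 1024    # assumed dimensionality of the precomputed MuQ music features
NUM_JOINTS = 22       # assumed joint count of the decoded SMPL-based motion

def retrieve_reference_tokens(style: str) -> np.ndarray:
    """Stand-in for RetrievalNet: fetch style-matched motion tokens used as RAG context."""
    return np.random.default_rng(0).integers(0, CODEBOOK_SIZE, size=32)

def generate_motion_tokens(music: np.ndarray, rag: np.ndarray, length: int) -> np.ndarray:
    """Stand-in for the LLaMA-3.2-1B backbone: autoregressive motion-token generation."""
    return np.random.default_rng(1).integers(0, CODEBOOK_SIZE, size=length)

def decode_tokens_to_smpl(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the DanceVQVAE decoder: motion tokens -> joint positions (T, J, 3)."""
    return np.zeros((len(tokens), NUM_JOINTS, 3))

music_features = np.zeros((288, FEATURE_DIM))        # placeholder for a music-feature .npy
rag_context = retrieve_reference_tokens("Popular")
motion_tokens = generate_motion_tokens(music_features, rag_context, length=288)
joints = decode_tokens_to_smpl(motion_tokens)
print(motion_tokens.shape, joints.shape)             # (288,) (288, 22, 3)
```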
InfiniteDance
├── All_LargeDanceAR/                  # Main LLM generation module
│   ├── models/                        # Model architectures and wrappers
│   │   ├── checkpoints/               # VQVAE and other model weights
│   │   ├── Llama3.2-1B/               # Base LLaMA model (Download from HF)
│   │   └── WavTokenizer/              # Music encoder component
│   ├── RetrievalNet/                  # Retrieval-Augmented Generation (RAG) network
│   │   └── checkpoints/               # RetrievalNet pre-trained weights
│   ├── output/                        # Training outputs and fine-tuned weights
│   ├── utils/                         # Token-to-SMPL conversion and utilities
│   ├── visualization/                 # Rendering and video generation tools
│   ├── train_infinitedance_start.py   # Main training entry point
│   ├── infer_llama_infinitedance.py   # Main inference script
│   └── infer.sh                       # All-in-one inference script
├── DanceVQVAE/                        # VQ-VAE for motion quantization (follows MoMask)
└── InfiniteDanceData/                 # Dataset directory (should be placed at the repo root)
    ├── dance/                         # Motion tokens (.npy)
    ├── music/                         # Music features (.npy)
    ├── partition/                     # Data splits (train/val/test)
    └── styles/                        # Style metadata
# Clone the repository
git clone git@github.com:MotrixLab/InfiniteDance.git
cd InfiniteDance
# Install dependencies
pip install -r requirements.txt
All datasets and pre-trained checkpoints are hosted on Hugging Face. After download, place them in the following locations (relative to the repo root unless you use absolute paths):
🤗 Hugging Face Checkpoints: InfiniteDance
Download the InfiniteDanceData folder and place it in the repo root:
# Path: <your path>/InfiniteDance_opensource/InfiniteDanceData
Please place the downloaded weights in their respective directories:
- VQ-VAE Weights: All_LargeDanceAR/models/checkpoints/
- RetrievalNet Weights: All_LargeDanceAR/RetrievalNet/checkpoints/
- InfiniteDance Fine-tuned Weights: All_LargeDanceAR/output/exp_m2d_infinitedance/
- Base LLM: Download Llama-3.2-1B and place it in All_LargeDanceAR/models/Llama3.2-1B/
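If you prefer to script the download, a minimal sketch using huggingface_hub is shown below. The repo id is a placeholder: substitute the actual id from the Hugging Face link above, then move the downloaded files into the directories listed.

```python
# Optional download sketch (placeholder repo id -- take the real id from the link above).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<hf_org>/InfiniteDance",   # placeholder, not a verified repo id
    local_dir="./hf_download",          # then move the contents into the directories above
)
```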
After placement, the expected structure looks like this:
InfiniteDance
├── InfiniteDanceData/
│   ├── dance/
│   ├── music/
│   ├── partition/
│   └── styles/
└── All_LargeDanceAR/
    ├── models/
    │   ├── checkpoints/
    │   └── Llama3.2-1B/
    ├── RetrievalNet/
    │   └── checkpoints/
    └── output/
        └── exp_m2d_infinitedance/
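As a quick sanity check (an optional helper, not part of the repo), you can verify this layout from the repo root before running anything:

```python
# Optional helper: verify the expected layout above from the repo root.
from pathlib import Path

expected = [
    "InfiniteDanceData/dance",
    "InfiniteDanceData/music",
    "InfiniteDanceData/partition",
    "InfiniteDanceData/styles",
    "All_LargeDanceAR/models/checkpoints",
    "All_LargeDanceAR/models/Llama3.2-1B",
    "All_LargeDanceAR/RetrievalNet/checkpoints",
    "All_LargeDanceAR/output/exp_m2d_infinitedance",
]

root = Path(".")  # run from the repo root
missing = [p for p in expected if not (root / p).exists()]
print("All expected paths found." if not missing else f"Missing: {missing}")
```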
You can run the full inference pipeline (Generation → Post-processing → Visualization) using the provided shell script or by running the Python scripts manually.
Edit infer.sh in All_LargeDanceAR to set your paths, then run:
cd All_LargeDanceAR
chmod +x infer.sh
./infer.sh
To generate dance tokens manually from music features:
cd All_LargeDanceAR
python infer_llama_infinitedance.py \
--music_path <your path>/InfiniteDanceData/music/muq_features/test_infinitedance \
--checkpoint_path <your path>/All_LargeDanceAR/output/exp_m2d_infinitedance/best_model_stage2.pt \
--vqvae_checkpoint_path <your path>/All_LargeDanceAR/models/checkpoints/dance_vqvae.pth \
--output_dir <your path>/All_LargeDanceAR/infer_results \
--style Popular \
--dance_length 288
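To generate results for several genres in one pass, a small wrapper like the one below can loop the command above over the styles listed earlier. This is an optional helper, not a repo script; the per-style output subdirectory is my own convention, and paths keep the same <your path> placeholder used above. Run it from All_LargeDanceAR.

```python
# Optional helper (not part of the repo): loop the manual inference command over styles.
import subprocess

STYLES = ["Ballet", "Popular", "Latin", "Modern", "Folk", "Classic"]
BASE = "<your path>"  # placeholder, as in the command above

for style in STYLES:
    subprocess.run([
        "python", "infer_llama_infinitedance.py",
        "--music_path", f"{BASE}/InfiniteDanceData/music/muq_features/test_infinitedance",
        "--checkpoint_path", f"{BASE}/All_LargeDanceAR/output/exp_m2d_infinitedance/best_model_stage2.pt",
        "--vqvae_checkpoint_path", f"{BASE}/All_LargeDanceAR/models/checkpoints/dance_vqvae.pth",
        "--output_dir", f"{BASE}/All_LargeDanceAR/infer_results/{style.lower()}",  # per-style dir (my convention)
        "--style", style,
        "--dance_length", "288",
    ], check=True)
```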
Visualization Pipeline: If you ran the manual inference above, proceed to visualize the results:
# 1. Convert tokens to SMPL joints (.npy)
python ./utils/tokens2smpl.py --npy_dir ./infer_results/dance
# 2. Render joints to video (.mp4)
python ./visualization/render_plot_npy.py --joints_dir ./infer_results/dance/npy/joints
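Optionally, you can inspect the converted joints files directly before or after rendering; the exact array shape is an assumption (typically frames × joints × 3 for SMPL-based joints).

```python
# Optional inspection helper (not part of the repo).
import numpy as np
from pathlib import Path

joints_dir = Path("./infer_results/dance/npy/joints")   # same directory passed to the renderer
for f in sorted(joints_dir.glob("*.npy")):
    joints = np.load(f)
    print(f.name, joints.shape)   # expected roughly (num_frames, num_joints, 3)
```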
To evaluate metrics, make sure you are in All_LargeDanceAR:
cd All_LargeDanceAR
./metrics.sh <base_path> [device_id]
The training process is divided into two stages:
- Stage 1: Train the bridge module and adapters while freezing the LLM backbone.
- Stage 2: Full-parameter fine-tuning of the entire system.
cd All_LargeDanceAR
# Start Training
python train_infinitedance_start.py \
--dance_dir <your path>/InfiniteDanceData/dance/Infinite_MotionTokens_512_vel_processed \
--music_dir <your path>/InfiniteDanceData/music/muq_features \
--vqvae_checkpoint_path <your path>/All_LargeDanceAR/models/checkpoints/dance_vqvae.pth \
--llama_config_path <your path>/All_LargeDanceAR/models/Llama3.2-1B/config.json \
--world_size 4 \
--batch_size 8 \
--learning_rate1 4e-5 \
--stage1_epoch 2 \
--stage2_epoch 50
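For intuition, here is a minimal sketch of what the two-stage schedule described above could look like in PyTorch. The module names ("bridge", "adapter") are assumptions for illustration; the actual logic lives in train_infinitedance_start.py.

```python
# Minimal sketch of the two-stage schedule (hypothetical module names, not the repo's code).
import torch

def set_requires_grad(module: torch.nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

def configure_stage(model: torch.nn.Module, stage: int) -> list:
    """Stage 1: train only bridge/adapter modules; Stage 2: fine-tune all parameters."""
    set_requires_grad(model, stage == 2)            # stage 2 unfreezes everything
    if stage == 1:
        for name, child in model.named_children():
            # "bridge"/"adapter" are assumed names for the stage-1 trainable modules
            if "bridge" in name or "adapter" in name:
                set_requires_grad(child, True)
    return [p for p in model.parameters() if p.requires_grad]

# Usage sketch:
# optimizer = torch.optim.AdamW(configure_stage(model, stage=1), lr=4e-5)
```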
If you use this code or dataset in your research, please cite our work:
@article{infinitedance2026,
title={InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization},
author={...},
journal={arXiv},
year={2026}
}