We introduce a novel recipe for building an ultra-fast diffusion language model named d3LLM (pseuDo-Distilled Diffusion LLM).
- [2025/12/11]: We release the models on HuggingFace 🤗, see our d3LLM-LLaDA, d3LLM-Dream, and d3LLM-Dream-Coder.
- [2025/12/11]: We release the training scripts, training datasets, and evaluation code for d3LLM, see our GitHub repo.
- [2025/12/10]: We release the blog.
Demo of d3LLM: achieves up to 5× speedup over autoregressive models (Qwen-2.5-7B-it) on an H100 GPU and 3.6× speedup on an A100 GPU. You can try our demo.
d3LLM (pseuDo-Distilled Diffusion LLM) is a novel framework for building ultra-fast diffusion language models with negligible accuracy degradation. d3LLM achieves 5× speedup over autoregressive models on H100 GPUs while maintaining competitive performance.
We recommend creating a dedicated ~/Codes directory to maintain consistent paths during evaluation:
```shell
# Create workspace directory
mkdir -p ~/Codes
cd ~/Codes

# Clone the repository
git clone https://github.com/hao-ai-lab/d3LLM.git
cd d3LLM

# Install dependencies
# Make sure the pinned versions are installed: transformers==4.49.0,
# lm_eval==0.4.9, datasets==3.2.0, and flash_attn==2.7.4.post1
pip install -r requirements.txt
```

Note: We recommend cloning into `~/Codes/d3LLM`, which ensures the `eval_scripts` work out of the box with consistent paths.
Chat with d3LLM models using our simple chat scripts:
```shell
# Chat with d3LLM-Dream
python chat/chat_d3llm_dream.py

# Or chat with d3LLM-LLaDA
python chat/chat_d3llm_llada.py
```

Note that because our distillation data primarily consists of coding and math reasoning tasks, acceleration may only appear on prompts from these domains.
The d3LLM framework combines two key innovations:
Instead of random masking, we extract the teacher model's decoding order, i.e., the sequence in which it unmasks tokens. This pseudo-trajectory guides the student model to learn efficient generation patterns.
- Pseudo-Trajectory Extraction → 15% TPF improvement
- Progressive Noise Schedule → additional 18% TPF boost
- Progressive Window Sizing → another 8% TPF gain
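To make the trajectory idea concrete, here is a minimal sketch of extracting a decoding order from a teacher's denoising steps. This is an illustration only: the function name, the mask-token id, and the toy data are our assumptions, not the repository's actual API.

```python
MASK_ID = 0  # assumed mask-token id (illustrative)

def extract_decoding_order(states):
    """states: list of token-id lists, one per denoising step, from fully
    masked to fully decoded. Returns positions in the order the teacher
    unmasked them -- the pseudo-trajectory that supervises the student."""
    order = []
    for prev, cur in zip(states, states[1:]):
        for pos, (p, c) in enumerate(zip(prev, cur)):
            if p == MASK_ID and c != MASK_ID:  # unmasked at this step
                order.append(pos)
    return order

# toy run: a 4-token sequence decoded over two denoising steps
steps = [
    [0, 0, 0, 0],
    [0, 7, 0, 9],   # teacher unmasks positions 1 and 3 first
    [5, 7, 8, 9],   # then positions 0 and 2
]
print(extract_decoding_order(steps))  # [1, 3, 0, 2]
```

The resulting position list replaces random mask schedules during distillation, so the student learns which tokens the teacher commits to early.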
We enable parallel decoding across multiple blocks simultaneously using entropy-based token selection.
- Entropy-Based Multi-Block Decoding → 20% TPF improvement
- KV-Cache with Periodic Refresh → 20% TPS boost in long contexts
- Early Stopping on EOS → 5% TPF gain
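A rough sketch of the entropy-based selection idea: compute the predictive entropy of each masked position (possibly spanning several blocks) and commit every position whose entropy is below a threshold in a single forward pass. The helper names and the threshold value are illustrative assumptions, not the repository's implementation.

```python
import math

def token_entropy(probs):
    """Shannon entropy (nats) of one position's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_positions(masked_probs, threshold=0.5):
    """masked_probs: {position: distribution over vocab} for masked slots.
    Unmask every position whose entropy falls below the threshold, so
    multiple tokens -- across blocks -- are committed in one pass."""
    return sorted(pos for pos, dist in masked_probs.items()
                  if token_entropy(dist) < threshold)

dists = {
    2: [0.97, 0.01, 0.01, 0.01],  # confident -> low entropy, committed
    5: [0.25, 0.25, 0.25, 0.25],  # uncertain -> high entropy, kept masked
    9: [0.90, 0.05, 0.03, 0.02],  # confident -> committed
}
print(select_positions(dists))  # [2, 9]
```

Because confident positions in later blocks can be committed before earlier blocks finish, the number of tokens per forward pass grows without waiting on strict left-to-right order.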
Together, these innovations achieve 5-10× speedup on TPF (tokens per forward) over vanilla diffusion models while maintaining accuracy. Based on the d3LLM framework, we have released three models on HuggingFace 🤗: d3LLM-LLaDA, d3LLM-Dream, and d3LLM-Coder.
We provide the training scripts for d3LLM-Dream and d3LLM-LLaDA. You can use the following commands to train the models.
```shell
# Training d3LLM-Dream
deepspeed --num_gpus=4 d3llm/d3llm_DREAM/distill_2_training/d3llm_dream_train.py

# Training d3LLM-LLaDA
deepspeed --num_gpus=4 d3llm/d3llm_LLaDA/distill_2_training/d3llm_llada_train.py
```

The trajectory dataset is already extracted and uploaded to HuggingFace (see Dream Trajectory and LLaDA Trajectory). You can also generate the pseudo-trajectory dataset yourself using the scripts in the `distill_1_data_prepare/` folder.
All evaluation scripts are in the `eval_scripts/` folder: just install the environment and run! We include comprehensive evaluation code for:
- ✅ d3LLM (our method)
- ✅ AR Model (e.g., Qwen-2.5-7B-it) - autoregressive baselines
- ✅ Vanilla LLaDA - original LLaDA model
- ✅ Vanilla Dream - original Dream model
- ✅ Fast-dLLM - training-free acceleration with KV cache
- ✅ D2F - discrete diffusion forcing
- ✅ dParallel - distilled dLLMs
- ✅ Fast-dLLM v2 - block-wise diffusion
See eval_scripts for more details.
Our d3LLM achieves the highest AUP (Accuracy Under Parallelism) scores across multiple dLLMs and tasks:
*Radar plots comparing AUP scores across different methods and benchmarks (panels: LLaDA-based Models, Dream-based Models, Coder Models).*
| Model | TPS (H100) | TPS (A100) | Speedup vs. AR |
|---|---|---|---|
| Qwen-2.5-7B (AR) | 57.32 | 50.36 | 1.00× |
| d3LLM-LLaDA | 288.89 | 183.33 | 3.47×–5.04× |
| d3LLM-Dream | 235.34 | 128.19 | 2.55×–4.67× |
Want more details? Check out our dLLM leaderboard and comprehensive results in this blog.
We further present a leaderboard comparing different diffusion LLMs across five representative benchmark tasks, using the AUP score (Accuracy Under Parallelism) as the primary evaluation metric. AUP is a hardware-independent metric that captures both the efficiency and the accuracy of a dLLM. More details can be found in AUP_leaderboard and this blog.
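The exact AUP formula is defined in the blog and leaderboard; as a rough illustration only, one natural way to score "accuracy under parallelism" is a normalized area under the accuracy-vs-parallelism curve, which rewards models that stay accurate as tokens per forward grow. The function, curve, and normalization below are our assumptions, not the official definition.

```python
def area_under_parallelism(points):
    """points: (tokens_per_forward, accuracy) pairs, sorted by TPF.
    Returns the trapezoidal area under the curve, normalized by the
    TPF span so the score stays on the accuracy scale [0, 1]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)  # trapezoid rule
    span = points[-1][0] - points[0][0]
    return area / span

# toy curve: accuracy decays as parallelism (TPF) increases
curve = [(1, 0.80), (4, 0.78), (8, 0.70)]
print(round(area_under_parallelism(curve), 4))  # 0.7614
```

A model that keeps accuracy flat at high TPF scores close to its base accuracy; one that collapses under parallelism is penalized, which is why a single number can rank both speed and quality.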
If you find d3LLM useful for your research, please star our project and cite our work.
@article{preprint'25:d3llm,
author = {Yu-Yang Qian and Junda Su and Lanxiang Hu and Peiyuan Zhang and Zhijie Deng and Peng Zhao and Hao Zhang},
title = {d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation},
journal = {ArXiv preprint},
volume = {to appear},
note = {\url{https://github.com/hao-ai-lab/d3LLM} [Accessed: 2025-12-11]},
year = {2025}
}

The paper about d3LLM and AUP is coming soon. Please stay tuned!
This project builds upon excellent open-source work:
- LLaDA - Large Language Diffusion Models
- Dream - Diffusion Large Language Models
- Fast-dLLM - Training-free acceleration
- D2F - Discrete diffusion forcing
- dParallel - Distilled dLLMs
- lm-evaluation-harness - Evaluation framework
⭐ Star us on GitHub and cite our paper if you find this project helpful!