A PyTorch-native and Flexible Inference Engine with
Hybrid Cache Acceleration and Parallelism for 🤗DiTs
| Baseline | SCM S S* | SCM F D* | SCM U D* | +TS | +compile | +FP8* |
|---|---|---|---|---|---|---|
| 24.85s | 15.4s | 11.4s | 8.2s | 8.2s | 🎉7.1s | 🎉4.5s |
Scheme: DBCache + SCM (steps_computation_mask) + TS (TaylorSeer) + FP8*, FLUX.1-Dev, L20x1.
S*: static cache; D*: dynamic cache; S: Slow; F: Fast; U: Ultra Fast; TS: TaylorSeer; FP8*: FP8 DQ + Sage.
U*: Ulysses Attention; UAA: Ulysses Anything Attention; UAA*: UAA + Gloo (extra All Gather w/ Gloo); Device: NVIDIA L20.
FLUX.1-Dev w/o CPU Offload, 28 steps; Qwen-Image w/ CPU Offload, 50 steps.
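For a quick sanity check, the per-configuration speedups implied by the FLUX.1-Dev benchmark table above can be computed directly from the reported wall-clock times (a small illustrative script; the labels are the table's column headers):

```python
# Speedups implied by the FLUX.1-Dev benchmark table (NVIDIA L20 x1).
# Times in seconds, copied from the table above.
baseline = 24.85
configs = {
    "SCM S S*": 15.4,
    "SCM F D*": 11.4,
    "SCM U D*": 8.2,
    "+TS": 8.2,
    "+compile": 7.1,
    "+FP8*": 4.5,
}
for name, t in configs.items():
    # Speedup relative to the uncached baseline run.
    print(f"{name:>9}: {baseline / t:.2f}x speedup")
```

The fully stacked configuration (+FP8*) works out to roughly a 5.5x speedup over the uncached baseline.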
We are excited to announce the release of 🎉cache-dit v1.1.0! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, making it a PyTorch-native and flexible inference engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, SCM, Context Parallelism (w/ UAA), Tensor Parallelism, and 🎉SOTA performance.
You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub:

pip3 install -U cache-dit
# Also, for the latest diffusers: pip3 install git+https://github.com/huggingface/diffusers.git

Then try:
>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image") # Can be any diffusion pipeline
>>> cache_dit.enable_cache(pipe) # One-line code with default cache options.
>>> output = pipe(...) # Just call the pipe as normal.
>>> stats = cache_dit.summary(pipe) # Then, get the summary of cache acceleration stats.
>>> cache_dit.disable_cache(pipe) # Disable cache and run the original pipe.

- 🎉Full 🤗Diffusers Support: Notably, cache-dit now supports nearly all of Diffusers' DiTs, including 60+ models and ~100+ pipelines: 🔥FLUX, 🔥Qwen-Image, 🔥Z-Image, 🔥LongCat-Image, 🔥Wan, etc.
- 🎉Extremely Easy to Use: In most cases, you only need one line of code: cache_dit.enable_cache(...). After calling this API, just use the pipeline as normal.
- 🎉State-of-the-Art Performance: Compared with other algorithms, cache-dit achieves SOTA with a 7.4x↑🎉 speedup on ClipScore! Surprisingly, its DBCache also works for extremely few-step distilled models.
- 🎉Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, Quantization, CPU or Sequential Offloading, Context Parallelism, Tensor Parallelism, etc.
- 🎉Hybrid Cache Acceleration: Now supports hybrid Block-wise Cache + Calibrator schemes. DBCache acts as the Indicator to decide when to cache, while the Calibrator decides how to cache.
- 🎉Ecosystem Integration: Joined the Diffusers community as the first cache-acceleration framework for DiTs, with integrations in 🤗diffusers, 🔥SGLang Diffusion, 🔥vLLM-Omni and 🔥stable-diffusion.cpp.
- 🎉HTTP Serving Support: Built-in HTTP serving capabilities for production deployment with simple REST API. Supports text-to-image, image editing, text/image-to-video, and LoRA.
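To make the "DBCache as Indicator" idea concrete, here is a minimal toy sketch (not cache-dit's actual implementation; names and the threshold value are illustrative). The core intuition behind block-wise caching is that when a block's residual changes little between adjacent denoising steps, the cached residual can be reused instead of recomputing the block; in practice, the decision is made from cheap signals (e.g. front-block outputs) before the expensive blocks run:

```python
# Toy sketch of a block-wise cache "indicator" (illustrative only; this
# is NOT cache-dit's real code). The indicator reuses the cached block
# residual when consecutive steps' residuals are similar enough.

def rel_l1_distance(prev, curr):
    """Mean relative L1 distance between two residual vectors."""
    num = sum(abs(p - c) for p, c in zip(prev, curr))
    den = sum(abs(p) for p in prev) or 1.0
    return num / den

class ToyCacheIndicator:
    def __init__(self, threshold=0.05):
        self.threshold = threshold  # hypothetical similarity threshold
        self.prev_residual = None

    def should_reuse(self, residual):
        if self.prev_residual is None:
            reuse = False  # first step: always compute
        else:
            reuse = rel_l1_distance(self.prev_residual, residual) < self.threshold
        self.prev_residual = residual
        return reuse
```

In the hybrid scheme described above, an indicator like this answers "when to cache", while a calibrator (e.g. TaylorSeer) answers "how to cache" by approximating the skipped computation rather than naively replaying it.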
Tip
One model series may contain many pipelines. cache-dit applies optimizations at the Transformer level; thus, any pipeline that includes a supported transformer is already supported by cache-dit. ✅: supported now; ✖️: not supported now; C-P: Context Parallelism; T-P: Tensor Parallelism; TE-P: Text Encoder Parallelism; CN-P: ControlNet Parallelism; VAE-P: VAE Parallelism (TODO).
| 📚Supported DiTs: 🤗65+ | Cache | C-P | T-P | TE-P | CN-P | VAE-P |
|---|---|---|---|---|---|---|
| Z-Image-Turbo ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Layered | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit-2511-Lightning | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit-2511 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| LongCat-Image | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| LongCat-Image-Edit | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Z-Image-Turbo | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Z-Image-Turbo-Fun-ControlNet-2.0 | ✅ | ✅ | ✅ | ✅ | ✅ | ✖️ |
| Z-Image-Turbo-Fun-ControlNet-2.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✖️ |
| Ovis-Image | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| FLUX.2-dev | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| FLUX.1-dev | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| FLUX.1-Fill-dev | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| FLUX.1-Kontext-dev | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit-2509 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-ControlNet | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-ControlNet-Inpainting | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Lightning | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit-Lightning | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit-2509-Lightning | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Wan-2.2-T2V | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Wan-2.2-ITV | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Wan-2.2-VACE-Fun | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Wan-2.1-T2V | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Wan-2.1-ITV | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Wan-2.1-FLF2V | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Wan-2.1-VACE | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| HunyuanImage-2.1 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| HunyuanVideo-1.5 | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| HunyuanVideo | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| FLUX.1-dev ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| FLUX.1-Fill-dev ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Qwen-Image ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Edit-2509 ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Qwen-Image-Lightning ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Qwen...Edit-Lightning ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Qwen...Edit-2509-Lightning ⚡️Nunchaku | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| SkyReels-V2-T2V | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| LongCat-Video | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| ChronoEdit-14B | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Kandinsky-5.0-T2V-Lite | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| PRX-512-t2i-sft | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| LTX-Video-v0.9.8 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| LTX-Video-v0.9.7 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| CogVideoX | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| CogVideoX-1.5 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| CogView-4 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| CogView-3-Plus | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| Chroma1-HD | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| PixArt-Sigma-XL-2-1024-MS | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| PixArt-XL-2-1024-MS | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| VisualCloze-512 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| ConsisID-preview | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| mochi-1-preview | ✅ | ✖️ | ✅ | ✅ | ✖️ | ✖️ |
| Lumina-Image-2.0 | ✅ | ✖️ | ✅ | ✅ | ✖️ | ✖️ |
| HiDream-I1-Full | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| HunyuanDiT | ✅ | ✖️ | ✅ | ✅ | ✖️ | ✖️ |
| Sana-1600M-1024px | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| DiT-XL-2-256 | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| Allegro-T2V | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| OmniGen-2 | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| stable-diffusion-3.5-large | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| Amused-512 | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| AuraFlow | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
🔥Click here to show many Image/Video cases🔥
🎉Now, cache-dit covers almost All Diffusers' DiT Pipelines🎉
🔥Qwen-Image | Qwen-Image-Edit | Qwen-Image-Edit-Plus 🔥
🔥FLUX.1 | Qwen-Image-Lightning 4/8 Steps | Wan 2.1 | Wan 2.2 🔥
🔥HunyuanImage-2.1 | HunyuanVideo | HunyuanDiT | HiDream | AuraFlow🔥
🔥CogView3Plus | CogView4 | LTXVideo | CogVideoX | CogVideoX 1.5 | ConsisID🔥
🔥Cosmos | SkyReelsV2 | VisualCloze | OmniGen 1/2 | Lumina 1/2 | PixArt🔥
🔥Chroma | Sana | Allegro | Mochi | SD 3/3.5 | Amused | ... | DiT-XL🔥
🔥Wan2.2 MoE | +cache-dit:2.0x↑🎉 | HunyuanVideo | +cache-dit:2.1x↑🎉
🔥Qwen-Image | +cache-dit:1.8x↑🎉 | FLUX.1-dev | +cache-dit:2.1x↑🎉
🔥Qwen...Lightning | +cache-dit:1.14x↑🎉 | HunyuanImage | +cache-dit:1.7x↑🎉
🔥Qwen-Image-Edit | Input w/o Edit | Baseline | +cache-dit:1.6x↑🎉 | 1.9x↑🎉
🔥FLUX-Kontext-dev | Baseline | +cache-dit:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑ 🎉
🔥HiDream-I1 | +cache-dit:1.9x↑🎉 | CogView4 | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥CogView3 | +cache-dit:1.5x↑🎉 | 2.0x↑🎉| Chroma1-HD | +cache-dit:1.9x↑🎉
🔥Mochi-1-preview | +cache-dit:1.8x↑🎉 | SkyReelsV2 | +cache-dit:1.6x↑🎉
🔥VisualCloze-512 | Model | Cloth | Baseline | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥LTX-Video-0.9.7 | +cache-dit:1.7x↑🎉 | CogVideoX1.5 | +cache-dit:2.0x↑🎉
🔥OmniGen-v1 | +cache-dit:1.5x↑🎉 | 3.3x↑🎉 | Lumina2 | +cache-dit:1.9x↑🎉
🔥Allegro | +cache-dit:1.36x↑🎉 | AuraFlow-v0.3 | +cache-dit:2.27x↑🎉
🔥Sana | +cache-dit:1.3x↑🎉 | 1.6x↑🎉| PixArt-Sigma | +cache-dit:2.3x↑🎉
🔥PixArt-Alpha | +cache-dit:1.6x↑🎉 | 1.8x↑🎉| SD 3.5 | +cache-dit:2.5x↑🎉
🔥Amused | +cache-dit:1.1x↑🎉 | 1.2x↑🎉 | DiT-XL-256 | +cache-dit:1.8x↑🎉
- 📊Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
- 🌐HTTP Serving - Deploy cache-dit models with HTTP API for text-to-image, image editing, multi-image editing, and text/image-to-video generation.
- 🎉User Guide - For more advanced features, please refer to the 🎉User_Guide.md for details.
- ❓FAQ - Frequently asked questions including attention backend configuration, troubleshooting, and optimization tips.
- ⚙️Installation
- 🔥Supported DiTs
- 🔥Benchmarks
- 🎉Unified Cache APIs
- ⚡️DBCache: Dual Block Cache
- ⚡️DBPrune: Dynamic Block Prune
- ⚡️Hybrid Cache CFG
- 🔥Hybrid TaylorSeer Calibrator
- 🤖SCM: Steps Computation Masking
- ⚡️Hybrid Context Parallelism
- 🤖UAA: Ulysses Anything Attention
- 🤖Async Ulysses QKV Projection
- 🤖Async FP8 Ulysses Attention
- ⚡️Hybrid Tensor Parallelism
- 🤖Parallelize Text Encoder
- 🤖Low-bits Quantization
- 🤖How to use FP8 Attention
- 🛠Metrics Command Line
- ⚙️Torch Compile
- 📊Torch Profiler Usage
- 📚API Documents
How to contribute? Star ⭐️ this repo to support us, or check CONTRIBUTE.md.
Here is a curated list of open-source projects integrating CacheDiT, including popular repositories such as jetson-containers, flux-fast, sdnext, 🔥stable-diffusion.cpp, 🔥vLLM-Omni, and 🔥SGLang Diffusion. 🎉CacheDiT has been recommended by many well-known open-source projects: 🔥Z-Image, 🔥Wan 2.2, 🔥Qwen-Image, 🔥LongCat-Video, Qwen-Image-Lightning, Kandinsky-5, LeMiCa, 🤗diffusers, HelloGitHub and GiantPandaLLM.
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and production-level deployment of this project. We learned from the designs of, and reused code from, the following projects: 🤗diffusers, SGLang, ParaAttention, xDiT, TaylorSeer and LeMiCa.
@misc{cache-dit@2025,
title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.},
url={https://github.com/vipshop/cache-dit.git},
note={Open-source software available at https://github.com/vipshop/cache-dit.git},
author={DefTruth, vipshop.com},
year={2025}
}