cudnn Frontend v1.18.0-release #201

Anerudhan · 2026-01-27T18:46:46Z

cuDNN Frontend v1.18.0 Release Notes

cuDNN Frontend v1.18.0 is the recommended version for cuDNN 9.18.1 and later releases.

General Improvements 🚀

The frontend no longer uses the v0.x API internally. It now calls the cuDNN backend API directly.
Reduced execution overhead by caching repeated graph queries.
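To illustrate why caching repeated queries lowers per-call overhead, here is a minimal, generic sketch in plain Python. It is not the frontend's implementation: `query_graph`, its parameters, and the returned workspace estimate are hypothetical stand-ins for an expensive lookup keyed by problem shape.

```python
from functools import lru_cache

build_count = 0  # counts how often the expensive path actually runs


@lru_cache(maxsize=128)
def query_graph(batch, seq_len, head_dim):
    """Hypothetical stand-in for an expensive graph query.

    Returns a made-up workspace estimate in bytes (2 bytes per element).
    """
    global build_count
    build_count += 1
    return batch * seq_len * head_dim * 2


# The same shape is queried on every iteration; only the first call does work.
for _ in range(1000):
    workspace_bytes = query_graph(8, 1024, 128)
```

After the loop, `build_count` is 1: the remaining 999 calls are served from the cache.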

Open-Source Kernels

New open-source kernel for Grouped GEMM and SwiGLU fusion:

Grouped GEMM + SwiGLU (gemm_fusions/grouped_gemm_swiglu.md)
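As a rough reference for what a fused Grouped GEMM + SwiGLU computes, the following pure-Python sketch runs one matmul per group and applies SwiGLU to each result. The function names, the contiguous token grouping, and the convention that the first half of each output row is the gate and the second half is the linear branch are assumptions made for illustration; the actual kernel's layout is described in its own documentation.

```python
import math


def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def grouped_gemm_swiglu(tokens, group_weights, group_sizes):
    """Reference sketch: tokens are stored contiguously per group.

    group_weights[g] has shape (k x 2d); the output for each token has d values.
    """
    out, start = [], 0
    for weight, size in zip(group_weights, group_sizes):
        hidden = matmul(tokens[start:start + size], weight)  # size x 2d
        d = len(weight[0]) // 2
        for row in hidden:
            # Assumed layout: first half is the gate, second half the linear branch.
            out.append([silu(g) * u for g, u in zip(row[:d], row[d:])])
        start += size
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [
    [[0.0, 5.0], [1.0, 2.0]],  # group 0
    [[1.0, 2.0], [0.0, 4.0]],  # group 1
]
result = grouped_gemm_swiglu(tokens, weights, group_sizes=[1, 2])
```

A fused kernel avoids writing the intermediate `hidden` matrix to memory between the matmul and the activation, which is the step this sketch performs explicitly.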

Enhancements ✨

Scaled Dot-Product Attention (SDPA)

New Features: Added dynamic shape support for fprop. This reduces graph rebuilding across different batch sizes and sequence lengths.
Support Surface:
- Deterministic bprop is now supported for SDPA.
- Added bprop support for ragged tensors on A100.
More samples:
- Open-sourced our SDPA test harness (test/python/test_mhas_v2.py), which showcases additional testing for determinism and FP8 sizes for MLA.
- Added samples that showcase chunked prefill.
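For readers unfamiliar with chunked prefill, the sketch below shows the idea in pure Python: the prompt's queries are processed in fixed-size chunks, and each chunk attends causally to all keys and values seen so far. It is a single-head reference written for this note, not one of the cuDNN samples; `attention`, `chunked_prefill`, and `chunk_size` are hypothetical names.

```python
import math


def attention(q_rows, keys, values, q_offset):
    """Causal attention for queries at absolute positions q_offset, q_offset + 1, ..."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for i, q in enumerate(q_rows):
        visible = q_offset + i + 1  # a query sees keys up to its own position
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[:visible]]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return out


def chunked_prefill(q, k, v, chunk_size):
    """Process the prompt chunk by chunk instead of in one pass."""
    out = []
    for start in range(0, len(q), chunk_size):
        end = min(start + chunk_size, len(q))
        # Only keys and values up to `end` are available when this chunk runs.
        out.extend(attention(q[start:end], k[:end], v[:end], q_offset=start))
    return out


# Deterministic toy inputs: 7 positions, head dimension 4.
q = [[math.sin(i + j) for j in range(4)] for i in range(7)]
k = [[math.cos(i * j + 1) for j in range(4)] for i in range(7)]
v = [[math.sin(2 * i - j) for j in range(4)] for i in range(7)]

full = attention(q, k, v, q_offset=0)
chunked = chunked_prefill(q, k, v, chunk_size=3)
```

`chunked` matches `full` row for row, which is the property that lets a long prompt be split into smaller pieces without changing the result.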

Mixture of Experts (MoE)

New API: Added support for moe_grouped_matmul. See the cpp sample (samples/cpp/moe_grouped_matmul/moe_grouped_matmul.cpp) and the documentation for the API reference.

Matmul

More samples: Open-sourced cuDNN's fuzz testing of matmuls (test/python/test_matmul_fuzzer.py).
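The general shape of a matmul fuzz test can be sketched as follows: draw random problem sizes and inputs from a seeded generator, run the implementation under test, and compare against a simple reference. This is an illustrative pure-Python sketch, not the contents of test_matmul_fuzzer.py; `matmul_under_test` is a placeholder for whatever implementation is being exercised, and the size range and tolerance are arbitrary choices.

```python
import random


def reference_matmul(a, b):
    """Straightforward triple-loop reference."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)] for i in range(m)]


def matmul_under_test(a, b):
    """Placeholder for the implementation being fuzzed (here: a column-wise variant)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def fuzz_matmul(iterations, seed, tol=1e-9):
    """Return None if every case passes, else enough information to reproduce the failure."""
    rng = random.Random(seed)  # seeded so any failure can be replayed
    for iteration in range(iterations):
        m, k, n = (rng.randint(1, 8) for _ in range(3))
        a = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(m)]
        b = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(k)]
        expected = reference_matmul(a, b)
        actual = matmul_under_test(a, b)
        for i in range(m):
            for j in range(n):
                if abs(expected[i][j] - actual[i][j]) > tol:
                    return (seed, iteration, (m, k, n))
    return None


failure = fuzz_matmul(iterations=200, seed=0)
```

Returning the seed, iteration index, and shape on failure keeps a randomly discovered bug reproducible.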

Convolution

More samples: Open-sourced cuDNN's fuzz testing of convolutions (test/python/test_conv_fuzzer.py).

Additional Improvements

Benchmarking 📊

Updated the benchmark results for the SDPA improvements added in cuDNN 9.18.1.


Anerudhan self-assigned this Jan 27, 2026

Anerudhan merged commit b8c0656 into main Jan 27, 2026
1 check passed
