@Anerudhan Anerudhan commented Jan 27, 2026

cuDNN Frontend v1.18.0 Release Notes

cuDNN Frontend v1.18.0 is the recommended version for cuDNN 9.18.1 and later releases.

General Improvements 🚀

  • Moved away from using the v0.x API internally. The cuDNN backend API is now called directly.
  • Reduced execution overhead by caching repeated graph queries.
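To illustrate the idea behind caching repeated graph queries, here is a minimal sketch in plain Python. It is not the cuDNN Frontend implementation: `_query_backend`, `cached_query`, and the tuple-shaped `graph_key` are hypothetical names, and the "query" is a stand-in for any answer that depends only on the graph description.

```python
from functools import lru_cache

# Counts how often the (hypothetical) expensive backend query really runs.
BACKEND_CALLS = {"count": 0}

def _query_backend(graph_key: tuple) -> int:
    """Stand-in for an expensive query; this is not a cuDNN call."""
    BACKEND_CALLS["count"] += 1
    # Pretend the answer (say, a workspace size in bytes) depends only on the key.
    return 1024 * sum(graph_key)

@lru_cache(maxsize=None)
def cached_query(graph_key: tuple) -> int:
    """Memoize the query so repeated lookups for the same graph are free."""
    return _query_backend(graph_key)

def run_iterations(graph_key: tuple, n: int) -> int:
    """Issue the same query n times, as a per-iteration execute path might."""
    result = 0
    for _ in range(n):
        result = cached_query(graph_key)
    return result
```

Calling `run_iterations((2, 3, 4), 100)` reaches the stand-in backend once; the other 99 lookups are served from the cache.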

Open-Source Kernels

New open-source kernel for Grouped GEMM and SwiGLU fusion
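As a rough reference for what a Grouped GEMM followed by SwiGLU computes, the sketch below uses plain Python lists. It describes the math only, not the open-source kernel: the function names are hypothetical, and splitting each output row into a gate half followed by a value half is an assumed layout.

```python
import math

def matmul(a, b):
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def swiglu(row):
    """Assumed layout: first half of the row is the gate, second half the value."""
    half = len(row) // 2
    return [silu(g) * v for g, v in zip(row[:half], row[half:])]

def grouped_gemm_swiglu(inputs, weights):
    """One independent GEMM per group, then SwiGLU on every output row."""
    return [[swiglu(r) for r in matmul(x, w)] for x, w in zip(inputs, weights)]
```

Each group may have a different number of rows, which is what separates a grouped GEMM from a batched GEMM with uniform shapes.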

Enhancements ✨

Scaled Dot-Product Attention (SDPA)

  • New Features: Added dynamic shape support for fprop. This reduces graph rebuilding across different batch sizes and sequence lengths.

  • Support Surface:

    • Now allows deterministic bprop for SDPA
    • Added support for bprop with ragged tensors on A100
  • More samples:

    • Open-sourced our SDPA test harness, which showcases additional testing for determinism and FP8 sizes for MLA.
    • Added samples to showcase chunked prefill.
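For readers unfamiliar with chunked prefill, the following sketch shows the underlying idea in plain Python: the prompt is processed in fixed-size chunks, each chunk attending causally to everything cached so far, and the result matches a single full causal pass. It is not taken from the cuDNN samples; `attend`, `chunked_prefill`, and the single-head list-of-lists layout are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q_rows, q_start, keys, values):
    """Causal attention for queries at absolute positions q_start, q_start + 1, ...

    The query at position p sees keys[0..p] inclusive.
    """
    scale = 1.0 / math.sqrt(len(q_rows[0]))
    out = []
    for i, q in enumerate(q_rows):
        visible = q_start + i + 1
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[:visible]]
        probs = softmax(scores)
        out.append([sum(p * v[d] for p, v in zip(probs, values[:visible]))
                    for d in range(len(values[0]))])
    return out

def chunked_prefill(q, k, v, chunk):
    """Process the prompt chunk by chunk, growing the KV cache as we go."""
    k_cache, v_cache, out = [], [], []
    for start in range(0, len(q), chunk):
        end = start + chunk
        k_cache.extend(k[start:end])
        v_cache.extend(v[start:end])
        out.extend(attend(q[start:end], start, k_cache, v_cache))
    return out
```

For any chunk size, `chunked_prefill(q, k, v, chunk)` should agree with `attend(q, 0, k, v)`; only the peak amount of work per step changes.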

Mixture of Experts (MoE)

  • New API: Added support for moe_grouped_matmul. See the cpp sample and the documentation for the API reference.
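The sketch below gives a plain-Python reference for the kind of computation a grouped matmul performs in an MoE layer: tokens are laid out contiguously per expert and each token row is multiplied by its own expert's weight matrix. This is not the `moe_grouped_matmul` signature; the function name, the `offsets` argument (length `num_experts + 1`), and the pre-sorted token layout are all assumptions. The cpp sample and documentation remain the reference for the real API.

```python
def grouped_matmul_by_expert(tokens, expert_weights, offsets):
    """Multiply each token row by the weight matrix of the expert that owns it.

    tokens:         list of rows of length K, sorted so each expert's tokens are contiguous
    expert_weights: expert_weights[e] is a K x N matrix for expert e
    offsets:        offsets[e] .. offsets[e + 1] is the token range of expert e (hypothetical layout)
    """
    out = []
    for e, weight in enumerate(expert_weights):
        cols = list(zip(*weight))  # columns of this expert's K x N matrix
        for row in tokens[offsets[e]:offsets[e + 1]]:
            out.append([sum(x * w for x, w in zip(row, col)) for col in cols])
    return out
```

With two experts and `offsets = [0, 2, 3]`, the first two tokens use expert 0 and the third uses expert 1; an expert with no tokens simply has an empty range.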

Matmul

  • More samples: Open-sourced cuDNN's fuzz testing of matmuls.

Convolution

  • More samples: Open-sourced cuDNN's fuzz testing of convolutions.

Additional Improvements

Benchmarking 📊

  • Updated the benchmark results for the SDPA improvements added in cuDNN 9.18.1.

…ttps://docs.nvidia.com/deeplearning/cudnn/backend/latest/release-notes.html#cudnn-9-18-1) and later releases.

- Moved away from using the v0.x API internally. The cuDNN backend API is now called directly.
- Reduced execution overhead by caching repeated graph queries.

New open-source kernel for Grouped GEMM and SwiGLU fusion
- [Grouped GEMM + SwiGLU](gemm_fusions/grouped_gemm_swiglu.md)

- **New Features**: Added dynamic shape support for fprop. This reduces graph rebuilding across different batch sizes and sequence lengths.

- **Support Surface**:
    - Now allows deterministic bprop for SDPA
    - Added support for bprop with ragged tensors on A100

- **More samples**:
    - Open-sourced our SDPA [test harness](test/python/test_mhas_v2.py), which showcases additional testing for determinism and FP8 sizes for MLA.
    - Added samples to showcase chunked prefill.

- **New API**: Added support for `moe_grouped_matmul`. See the [cpp sample](samples/cpp/moe_grouped_matmul/moe_grouped_matmul.cpp) and the documentation for the API reference.

- **More samples**: Open-sourced cuDNN's [fuzz testing of matmuls](test/python/test_matmul_fuzzer.py)

- **More samples**: Open-sourced cuDNN's [fuzz testing of convolutions](test/python/test_conv_fuzzer.py)
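
As a rough picture of what fuzz testing an operation like matmul involves, here is a minimal self-contained sketch: draw random shapes and values from a seeded generator, run the implementation under test, and compare against a simple reference within a tolerance. It is not the content of `test_matmul_fuzzer.py` or `test_conv_fuzzer.py`; every name below is hypothetical, and `matmul_under_test` is just a second pure-Python implementation standing in for the real operation.

```python
import random

def reference_matmul(a, b):
    """Straightforward triple-loop reference."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out

def matmul_under_test(a, b):
    """Stand-in for the implementation being fuzzed."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def fuzz_matmul(num_cases, seed, tol=1e-9):
    """Return a list of (m, k, n, worst_error) for every failing random case."""
    rng = random.Random(seed)  # seeded so any failure can be reproduced
    failures = []
    for _ in range(num_cases):
        m, k, n = (rng.randint(1, 8) for _ in range(3))
        a, b = random_matrix(rng, m, k), random_matrix(rng, k, n)
        got, want = matmul_under_test(a, b), reference_matmul(a, b)
        worst = max(abs(g - w) for gr, wr in zip(got, want) for g, w in zip(gr, wr))
        if worst > tol:
            failures.append((m, k, n, worst))
    return failures
```

Recording the seed and the failing shape is the design choice that matters here: it turns a random failure into a reproducible test case.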

- Updated the benchmark results for the SDPA improvements added in cuDNN 9.18.1.
@Anerudhan Anerudhan self-assigned this Jan 27, 2026
@Anerudhan Anerudhan merged commit b8c0656 into main Jan 27, 2026
1 check passed