0.48.0: Intel GPU & Gaudi support, CUDA 13, performance improvements, and more!

@matthewdouglas matthewdouglas released this 30 Sep 21:48
· 48 commits to main since this release

Highlights

🎉 Intel GPU Support

We now officially support Intel GPUs on Linux and Windows! All major features (LLM.int8(), QLoRA, 8-bit optimizers) are supported, except for paged optimizers.

This support includes the following hardware:

  • Intel® Arc™ B-Series Graphics
  • Intel® Arc™ A-Series Graphics
  • Intel® Data Center GPU Max Series

A compatible PyTorch version with Intel XPU support is required. The current minimum is PyTorch 2.6.0. It is recommended to use the latest stable release. See Getting Started on Intel GPU for guidance.
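As a rough illustration of the requirement above, the following sketch checks whether the installed PyTorch meets the 2.6.0 minimum and reports an XPU device. The helper names (`parse_version`, `meets_minimum`, `xpu_ready`) are hypothetical and not part of bitsandbytes; the only PyTorch call assumed is `torch.xpu.is_available()`, and the code falls back to `False` when PyTorch is not installed.

```python
import importlib.util


def parse_version(text):
    """Turn a string such as '2.6.0+xpu' into (2, 6, 0), ignoring build suffixes."""
    core = text.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="2.6.0"):
    """Compare version strings numerically; the default is the minimum stated in these notes."""
    return parse_version(installed) >= parse_version(minimum)


def xpu_ready():
    """Return True only if PyTorch is installed, is new enough, and reports an XPU device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    if not meets_minimum(torch.__version__):
        return False
    return hasattr(torch, "xpu") and bool(torch.xpu.is_available())


if __name__ == "__main__":
    print("Intel XPU usable:", xpu_ready())
```

Comparing parsed tuples rather than raw strings avoids the usual pitfall where `"2.10.0" < "2.6.0"` under string comparison.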

🎉 Intel Gaudi Support

We now officially support Intel Gaudi2 and Gaudi3 accelerators. This support includes LLM.int8() and QLoRA with the NF4 data type. Optimizers are not yet implemented.

A compatible PyTorch version with Intel Gaudi support is required. The current minimum is Gaudi v1.21 with PyTorch 2.6.0. It is recommended to use the latest stable release. See the Gaudi software installation guide for guidance.

NVIDIA CUDA

  • The 4-bit dequantization kernel was improved by @Mhmd-Hisham in #1746. This change brings noticeable speed improvements for prefill, batch token generation, and training. The improvement is most pronounced on A100, H100, and B200.
  • We've added CUDA 13.0 compatibility across Linux x86-64, Linux aarch64, and Windows x86-64 platforms.
    • Hardware support for CUDA 13.0 is limited to Turing generation and newer.
    • Support for Thor (SM110) is available in the Linux aarch64 build.
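The hardware floor for the CUDA 13.0 builds can be expressed as a compute capability comparison. This is a minimal sketch, assuming that "Turing generation" corresponds to compute capability 7.5 and that SM110 corresponds to capability (11, 0); the function name `cuda13_build_supports` is hypothetical, and the check does not model the platform restriction (Thor is only covered by the Linux aarch64 build).

```python
# Assumption: Turing GPUs report compute capability 7.5.
TURING = (7, 5)


def cuda13_build_supports(capability):
    """Return True if a (major, minor) compute capability is Turing or newer."""
    return tuple(capability) >= TURING


# Illustrative capabilities; on a real system the pair could come from
# torch.cuda.get_device_capability().
examples = {
    "Pascal (6.1)": (6, 1),
    "Turing (7.5)": (7, 5),
    "A100 (8.0)": (8, 0),
    "Thor (11.0)": (11, 0),
}

for name, capability in examples.items():
    print(f"{name}: {cuda13_build_supports(capability)}")
```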

🚨 Breaking Changes

  • Dropped support for PyTorch 2.2. The new minimum requirement is 2.3.0.
  • Removed Maxwell GPU support for all CUDA builds.
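Before upgrading, an environment can be screened against the two breaking changes above. The sketch below is illustrative only: `upgrade_blockers` is a hypothetical helper, and it assumes Maxwell GPUs are the ones reporting compute capability 5.x.

```python
MIN_TORCH = (2, 3)   # from the notes: the minimum is now PyTorch 2.3.0
MAXWELL_MAJOR = 5    # assumption: Maxwell GPUs report compute capability 5.x


def upgrade_blockers(torch_version, compute_capability=None):
    """List reasons an environment would not work with this release.

    torch_version is a string such as '2.2.2+cu121'; compute_capability is an
    optional (major, minor) tuple for the CUDA device, if there is one.
    """
    major_minor = tuple(int(part) for part in torch_version.split(".")[:2])
    blockers = []
    if major_minor < MIN_TORCH:
        blockers.append("PyTorch older than 2.3.0")
    if compute_capability is not None and compute_capability[0] == MAXWELL_MAJOR:
        blockers.append("Maxwell GPU (compute capability 5.x)")
    return blockers


print(upgrade_blockers("2.2.2+cu121", (5, 2)))
print(upgrade_blockers("2.8.0", (8, 0)))
```

An empty list means neither breaking change applies to the described environment.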

Full Changelog: 0.47.0...0.48.0