
@stefankoncarevic (Contributor)

⚠️ Do not merge until #2210 is merged - this PR depends on its FP8 LDS transpose load support.

Motivation

Extends LDS transpose load optimization to support INT8 data types for GEMM and Attention kernels on gfx950. This enables hardware-accelerated transposed loads (ds_read_tr8_b64) for all INT8 MFMAs (16x16x32, 16x16x64, 32x32x16, 32x32x32), improving performance for INT8 quantized inference.
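To illustrate what a hardware transposed load buys us, here is a small CPU reference model (not the gfx950 ISA semantics, and not code from this PR): each "lane" performs one 64-bit read and receives a column of an 8x8 int8 tile instead of a row, so no separate in-register shuffle is needed. The `Tile8x8` type and `transposedLoad` name are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// An 8x8 tile of int8 values, stored row-major.
using Tile8x8 = std::array<std::array<int8_t, 8>, 8>;

// Illustrative model of a transposed tile load: lane `i` reads 8 bytes
// (one 64-bit load) and ends up holding column `i` of the row-major tile.
// This is the effect the ds_read_tr8_b64-based path achieves in hardware.
Tile8x8 transposedLoad(const Tile8x8 &rowMajor) {
  Tile8x8 out{};
  for (std::size_t lane = 0; lane < 8; ++lane)   // one 64-bit read per lane
    for (std::size_t k = 0; k < 8; ++k)          // 8 bytes per read
      out[lane][k] = rowMajor[k][lane];          // gather column `lane`
  return out;
}
```

Without such an instruction, the same effect would require either strided byte loads (poor LDS bandwidth) or extra cross-lane shuffles after a contiguous load.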

Technical Details

  • LdsTransposeLoad.cpp: Added INT8 type support, offset formulas for (16,64) and (32,32) geometries, and double-rate K-coverage logic
  • AccelEmitter.cpp: Added K-dimension transformation for INT8 MFMAs with kBase=16 when kpack=1
  • RockDialect.cpp/RockOps.td: Updated validation and type support for INT8 LDS transpose
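As a rough sketch of the arithmetic behind the kBase=16 case (the function names and `waveSize` constant here are illustrative, not rocMLIR's actual API): for an i8 MFMA of shape mDim x kDim on a 64-lane wave, each lane holds mDim * kDim / 64 bytes of the A operand. The double-rate variants (16x16x64, 32x32x32) cover twice the K of their base shapes, which is what yields kBase=16 and motivates the extra K-dimension transform when kpack=1.

```cpp
#include <cassert>

constexpr int waveSize = 64; // lanes per wavefront on gfx950

// Bytes of the A operand held per lane for an i8 MFMA of shape mDim x kDim.
constexpr int kBasePerLane(int mDim, int kDim) {
  return mDim * kDim / waveSize;
}

// Double-rate variants cover 2x the K per instruction, so each lane holds
// 16 i8 values instead of 8; with kpack=1 this is the case that needs the
// K-dimension transformation described above.
constexpr bool isDoubleRate(int mDim, int kDim) {
  return kBasePerLane(mDim, kDim) == 16;
}
```

Applying this to the four supported geometries: 16x16x32 and 32x32x16 give kBase=8 (base rate), while 16x16x64 and 32x32x32 give kBase=16 (double rate).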

Test Plan

  • Added MLIR unit tests
  • Added E2E tests
  • All tests verified on gfx950 hardware with numerical correctness validation

Submission Checklist

- Add INT8 (i8) support in LdsTransposeLoad.cpp for ds_read_tr8_b64
- Support mfma_i32_16x16x32_i8, mfma_i32_16x16x64_i8, mfma_i32_32x32x16_i8, mfma_i32_32x32x32_i8
- Add INT8 16x64 and 32x32 MFMA geometries with double-rate K coverage
- Handle kpack=1 case for INT8 MFMAs with kBase=16 in AccelEmitter.cpp
- Add validation for INT8 MFMA geometries in RockDialect.cpp
- Add e2e tests for INT8 LDS transpose in GEMM and Attention