@stefankoncarevic stefankoncarevic commented Jan 20, 2026

⚠️ Do not merge until #2184 is merged - this PR depends on LDS transpose load attention support

Implement ds_read_tr8_b64 offset formulas for FP8/BF8 MFMA (16x32, 32x16). Enable mixed fp8/bf8 type combinations for GEMM operations on gfx950.

Motivation

Add FP8 and BF8 data type support to the LDS transpose load optimization on gfx950.
This enables efficient matrix loads using the ds_read_tr8_b64 hardware instruction.

Technical Details

  • LdsTransposeLoad.cpp: Implemented FP8/BF8 offset formulas in getBasePanelOffsets()
  • LdsTransposeLoad.cpp: Updated type compatibility check in makeDecision()
  • Added areBothFp8Types() check to allow mixed fp8/bf8 combinations
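
To illustrate the relaxed type compatibility rule described above, here is a minimal standalone sketch. It is not the code from LdsTransposeLoad.cpp: the `ElemType` enum, the `isFp8Like()` helper, the `operandTypesCompatible()` wrapper, and the signature given to `areBothFp8Types()` are all assumptions made for illustration (the real pass presumably works on MLIR types). The sketch assumes the check previously required both operand types to match and now additionally accepts any pairing of fp8 and bf8.

```cpp
// Hypothetical stand-in for the operand element types; the real pass
// operates on MLIR types, not on this enum.
enum class ElemType { F16, BF16, FP8, BF8 };

// Assumed helper: true for either of the two 8-bit float formats.
constexpr bool isFp8Like(ElemType t) {
  return t == ElemType::FP8 || t == ElemType::BF8;
}

// Assumed shape of the check named in the PR: both operands are 8-bit
// floats, in any combination (fp8/fp8, fp8/bf8, bf8/fp8, bf8/bf8).
constexpr bool areBothFp8Types(ElemType a, ElemType b) {
  return isFp8Like(a) && isFp8Like(b);
}

// Hypothetical compatibility rule: identical types are accepted as
// before, and mixed fp8/bf8 pairs are now accepted as well.
constexpr bool operandTypesCompatible(ElemType a, ElemType b) {
  return a == b || areBothFp8Types(a, b);
}
```

Under these assumptions, `operandTypesCompatible(ElemType::FP8, ElemType::BF8)` is accepted, while a pairing such as `ElemType::F16` with `ElemType::FP8` is still rejected.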

Test Plan

  • Add e2e tests for FP8/BF8 GEMM
  • Add e2e tests for mixed fp8/bf8 combinations

Test Result

@stefankoncarevic stefankoncarevic marked this pull request as draft January 20, 2026 12:05
@stefankoncarevic stefankoncarevic marked this pull request as ready for review January 20, 2026 14:35
@stefankoncarevic stefankoncarevic changed the title [WIP] Add FP8/BF8 support for LDS transpose load Add FP8/BF8 support for LDS transpose load Jan 20, 2026