Open
Description
Need to generalize the code generation logic across different directions, precisions, and architectures.
- global load/store:
  - support different precisions: fp32/fp16 (short)/ubyte
  - support 2d/3d loads, with the exec mask derived from different dimensions
  - support `global_load`/`buffer_load`, and accumulate through sgpr/vgpr
- shared memory load/store:
  - support 1d/2d load/store for different precisions
  - support k pack
- coalescing store:
  - support multiple groups for the coalescing store
  - support the pack operation for the final fp16/int8 store
  - support cases that do not need an LDS shuffle
  - support vector write-out
- mfma main loop:
  - support different repeat/step
  - support both with and without inst-schedule
  - support a k pack that suits the instruction requirement and the precision
  - support a shared memory load of multiple k_pack at once, followed by multiple mfma
  - pass through LDS
- fma main loop
- thread mapping
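
For the global load item, the precision-dependent part could be factored out roughly as follows. This is only a sketch: the helper name `select_load`, its parameters, and the two tables are hypothetical and not taken from the existing generator. It assumes the usual GCN/CDNA mnemonic suffixes (`ubyte`, `ushort`, `dword`, `dwordx2`, `dwordx4`) and ignores alignment, d16 variants, and the exec mask.

```python
# Hypothetical helper: names and structure are illustrative only.

# Element size in bytes for each precision listed in the issue.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "ubyte": 1}

# Load width in bytes -> mnemonic suffix.
WIDTH_SUFFIX = {1: "ubyte", 2: "ushort", 4: "dword", 8: "dwordx2", 16: "dwordx4"}


def select_load(kind, precision, vector_size):
    """Return the load mnemonics covering `vector_size` elements per thread.

    kind        -- "global_load" or "buffer_load"
    precision   -- "fp32", "fp16" or "ubyte"
    vector_size -- number of contiguous elements each thread loads
    """
    if kind not in ("global_load", "buffer_load"):
        raise ValueError(f"unsupported load kind: {kind}")
    if precision not in BYTES_PER_ELEMENT:
        raise ValueError(f"unsupported precision: {precision}")
    if vector_size < 1:
        raise ValueError("vector_size must be at least 1")

    remaining = BYTES_PER_ELEMENT[precision] * vector_size
    plan = []
    # Greedily cover the byte count with the widest loads first.
    for width in sorted(WIDTH_SUFFIX, reverse=True):
        while remaining >= width:
            plan.append(f"{kind}_{WIDTH_SUFFIX[width]}")
            remaining -= width
    return plan


if __name__ == "__main__":
    print(select_load("global_load", "fp32", 4))
    print(select_load("buffer_load", "fp16", 2))
    print(select_load("global_load", "ubyte", 1))
```

Keeping the precision table separate from the width table means a new precision only adds one entry, and the same selection logic can be reused for the store path.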