refactor generator code #92

@carlushuang

Need to generalize the code generation logic across different directions, precisions, and architectures.

  • global load/store:

    • support different precisions: fp32, fp16 (short), ubyte
    • support 2d/3d loads, with the exec mask taken from different dimensions
    • support global_load/buffer_load and accumulate through sgpr/vgpr
  • shared memory load/store:

    • support 1d/2d load/store for different precisions
    • support k_pack
  • coalescing store:

    • support multiple groups to do coalescing store
    • support fp16/int8 final store out pack operation
    • support cases that do not need an LDS shuffle
    • vector write out support
  • mfma main loop:

    • different repeat/step
    • support both with and without instruction scheduling
    • support a k_pack chosen to suit the instruction requirement and precision
    • support loading multiple k_pack from shared memory at once, then doing mfma multiple times
    • pass through LDS
  • fma main loop

  • thread mapping
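
As a rough illustration of the first item (global load generalized over precision and load kind), here is a minimal Python sketch. It is not taken from the existing generator: `GlobalLoadSpec`, `select_load_inst`, and both lookup tables are hypothetical names, and the byte-size mapping is a simplification that ignores dwordx3 and the d16 variants.

```python
from dataclasses import dataclass

# Bytes per element for the precisions named in the list above.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "ubyte": 1}

# Bytes loaded per thread -> mnemonic suffix (simplified; assumption of this sketch).
SUFFIX_BY_BYTES = {1: "ubyte", 2: "ushort", 4: "dword", 8: "dwordx2", 16: "dwordx4"}


@dataclass(frozen=True)
class GlobalLoadSpec:
    """Hypothetical descriptor of one global load; not the generator's real type."""
    kind: str         # "buffer" or "global"
    precision: str    # a key of BYTES_PER_ELEMENT
    vector_size: int  # elements loaded per thread by one instruction


def select_load_inst(spec: GlobalLoadSpec) -> str:
    """Pick a load mnemonic from the load kind and the bytes moved per thread."""
    if spec.kind not in ("buffer", "global"):
        raise ValueError(f"unknown load kind: {spec.kind}")
    if spec.precision not in BYTES_PER_ELEMENT:
        raise ValueError(f"unknown precision: {spec.precision}")
    nbytes = BYTES_PER_ELEMENT[spec.precision] * spec.vector_size
    if nbytes not in SUFFIX_BY_BYTES:
        raise ValueError(f"unsupported load width: {nbytes} bytes")
    return f"{spec.kind}_load_{SUFFIX_BY_BYTES[nbytes]}"


if __name__ == "__main__":
    # Print the mnemonic chosen for a few precision/width combinations.
    for spec in (
        GlobalLoadSpec("buffer", "fp32", 4),
        GlobalLoadSpec("global", "fp16", 2),
        GlobalLoadSpec("buffer", "ubyte", 1),
    ):
        print(spec, "->", select_load_inst(spec))
```

Keeping the precision and load kind in one descriptor means the same emitter could serve every direction and precision instead of duplicating per-case code, which is the kind of generalization this issue asks for.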
