Add lit tests; shows initial bfloat16 support for tt.dot #253

abrown merged 2 commits into kernelize-ai:main

Conversation
The "expand to target type" logic added in kernelize-ai#231 works for `bfloat16` types, not just `float16`. This change adds lit tests to show the general form of the lowering that our `convert-triton-cpu-to-llvm` pass performs when run.
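For illustration, a lowering of this shape could be checked with a lit test roughly like the following sketch. The function name, tensor shapes, and exact op spellings are assumptions for the example, not taken from the actual tests in the PR:

```mlir
// RUN: triton-opt %s --convert-triton-cpu-to-llvm | FileCheck %s

// General form of the expected lowering: bf16 operands are extended
// to f32 before the multiply/accumulate.
// CHECK: llvm.fpext {{.*}} : bf16 to f32
// CHECK: llvm.fmul
// CHECK: llvm.fadd
tt.func @dot_bf16(%a: tensor<16x16xbf16>, %b: tensor<16x16xbf16>,
                  %c: tensor<16x16xf32>) {
  %0 = tt.dot %a, %b, %c : tensor<16x16xbf16> * tensor<16x16xbf16>
       -> tensor<16x16xf32>
  tt.return
}
```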
I will note that performance for …
alexbaden left a comment:
These look fine, though once #254 lands I imagine you would have to update the fmul and fadd bits. It might be easier to just check for promotion and assume the promoted value is used (better yet, canonicalize in the RUN command so an unused fp extension gets dropped).
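A sketch of the suggested check style: run `--canonicalize` so an unused fp extension is folded away, then assert only that the promoted f32 value feeds the arithmetic. The pass pipeline and capture names here are illustrative assumptions, not the PR's actual CHECK lines:

```mlir
// RUN: triton-opt %s --convert-triton-cpu-to-llvm --canonicalize | FileCheck %s

// Check only for promotion, then that the promoted value is the one used.
// CHECK: %[[EXT:.*]] = llvm.fpext %{{.*}} : bf16 to f32
// CHECK: llvm.fmul %[[EXT]]
```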
I tried to additionally add: … But I think they have to be sequential? I guess I can figure that out in a follow-on PR.
alexbaden left a comment:
I'm fine with this for now, with the caveats that long term (1) test_core.py::test_dot is preferred for correctness and (2) we should strive to make lit tests as focused as possible (e.g. only test the promotion to fp32 during the LLVM lowering).