Open
Contributor
Author
This is a draft for now until we can discuss what to do about the fastmath flags.
alexbaden
reviewed
Jan 28, 2026

    // Multiply and accumulate.
    auto mul = LLVM::FMulOp::create(builder, loc, tgtTy, aElem, bElem);
    accum = LLVM::FAddOp::create(builder, loc, tgtTy, accum, mul);
    auto flags = LLVM::FastmathFlagsAttr::get(builder.getContext(),

Contributor
`tl.dot_scaled` has a fast math flag, but Triton typically prefers fast math to be off.
This change replaces the `llvm.fmul` and `llvm.fadd` instructions with the fused `llvm.fma` operation. This should have no downstream impact on the emitted machine code which, due to auto-vectorization and other LLVM magic, already ends up using `VFMADD213PS`. What _is_ unclear about this change is that we materialize some fastmath flags from thin air: it seems like we should be able to configure this somewhere at the user level (TODO).
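For reviewers weighing the flags question, here is a minimal standalone sketch (plain C++, not Triton or MLIR code) of why fusing a multiply and an add is a numerical change as well as an instruction-selection one: the unfused form rounds the product before adding, while the fused form rounds only once. The function names `mul_then_add` and `fused` are hypothetical and exist only for this illustration.

```cpp
#include <cmath>

// Unfused multiply-add: the product is rounded to float before the add.
// The volatile temporary is there only to stop the compiler from
// contracting the two operations into a single FMA on its own.
float mul_then_add(float a, float b, float c) {
  volatile float prod = a * b;
  return prod + c;
}

// Fused multiply-add: a * b + c is evaluated with a single rounding.
float fused(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact product is `1 + 2^-11 + 2^-24`. The unfused version rounds that product to `1 + 2^-11` and returns `0`, while the fused version keeps the low bit and returns `2^-24`. This is a sketch of the general effect under IEEE 754 single precision, not a claim about what the kernels in this PR compute.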
Contributor
Author
This has no effect on performance. I still see