Blockwise scaling linear quantization recipe #1559
Merged: timmoon10 merged 68 commits into NVIDIA:main from kwyss-nvidia:kwyss/subchannel_recipe_linear on Apr 10, 2025
Commits
fbcbcb0 Add GEMM logic for blockwise quantized tensors. (kwyss-nvidia)
522ffbe Update NVTE_BLOCK_SCALING for GEMM. (kwyss-nvidia)
d7e1fce Gate feature on CUDA 12.9 (kwyss-nvidia)
f212c81 Gemm typo. (kwyss-nvidia)
48b2d57 Remove unecessary type converter change. (kwyss-nvidia)
5761589 Reflect epilogue availability and test supported epilogues. (kwyss-nvidia)
07b19b7 GEMM simplifications from recipe branch. (kwyss-nvidia)
c4a41b8 Format py code. (kwyss-nvidia)
51ed2fb Update GEMM DGelu tests to match support depending on output dtype. (kwyss-nvidia)
e7af140 Force pow2Scales in GEMM (kwyss-nvidia)
596a009 Add GEMM test to pytorch test suite. (kwyss-nvidia)
4aa6067 Add copyright to GEMM test. (kwyss-nvidia)
758dc4a Update import for GEMM test. (kwyss-nvidia)
7d5b5d9 Add license. (kwyss-nvidia)
efdf8e0 Update test gemm supported predicate. (kwyss-nvidia)
a9f209a Use sgemm like interfaces and naming. (kwyss-nvidia)
861c870 Rewrite GEMM comment. (kwyss-nvidia)
ada6438 MR Feedback. (kwyss-nvidia)
d69585a Recipe setup for Linear modules. (kwyss-nvidia)
754e0bd Use 12.9 feature test. (kwyss-nvidia)
1483996 Run against tensor dumps from internal library. (kwyss-nvidia)
f0dadc5 Update FIXME to TODO with linked issue. (kwyss-nvidia)
e4f2c28 Update full recompute feature to save recipe. (kwyss-nvidia)
ea8f53e MR Feedback. Avoid reusing quantizer objects. (kwyss-nvidia)
dfcb3df Update logic in module. (kwyss-nvidia)
b938c3e Format py. (kwyss-nvidia)
c8f6322 Update for PP bug. (kwyss-nvidia)
4c5f51f Update test numerics. (kwyss-nvidia)
cad09a9 Update force_power_of_2 scales in the recipe. (kwyss-nvidia)
a9e3178 Update usage method to satisfy upstream changes. (kwyss-nvidia)
ac65cee fix subchannel recipe in distributed test with bf16 gather (zhongbozhu)
c64d0e7 Edit and cleanup BF16 gather code. (kwyss-nvidia)
99b5908 Update test import. (kwyss-nvidia)
6daa8df support columnwise only mode to 1D quantize kernel (zhongbozhu)
fb66148 Format and move enum (kwyss-nvidia)
70c5034 Skip alloc. (kwyss-nvidia)
d81946c try async bf16 gather (zhongbozhu)
a577801 Format python code. (kwyss-nvidia)
a6e9d28 Document and type code. (kwyss-nvidia)
52c18a1 Update pytorch lint errors. (kwyss-nvidia)
80057a6 Dont set high precision dtype. (kwyss-nvidia)
77cfef4 Add test for sanity and CG; fix CG for sequential? (ksivaman)
dbcff16 Keep make_quantizers API stable (kwyss-nvidia)
9e50b6d Fix import name. (kwyss-nvidia)
0e23591 Rename recipe method. (kwyss-nvidia)
45519f1 Skip grouped linear sanity test. (kwyss-nvidia)
a21e65b Set usage before BF16 gather. (kwyss-nvidia)
e6ad90e Merge remote-tracking branch 'origin/main' into HEAD (kwyss-nvidia)
0e8d324 refactor for nvte_quantize_v2 (zhongbozhu)
e077601 Format code. (kwyss-nvidia)
07a70b8 Cleanup nvte_quantize_v2 (kwyss-nvidia)
64f2601 Test fp32 scales. (kwyss-nvidia)
3cb712c Disable CUDA graph. (kwyss-nvidia)
6f84d2c Merge remote-tracking branch 'origin/main' into HEAD (kwyss-nvidia)
07a563b Simplify layernorm linear (kwyss-nvidia)
9a3abe2 Cleanup layernorm linear. (kwyss-nvidia)
27d9922 LayerNorm linear bwd gather logic. (kwyss-nvidia)
b62d555 Communication updates. (kwyss-nvidia)
196cd6d Update transformer_engine/pytorch/ops/op.py (kwyss-nvidia)
67e790b Lint fix. (pre-commit-ci[bot])
ea9e46b MR feedback. (kwyss-nvidia)
324792b Enable cuda graph tests. (kwyss-nvidia)
54e7279 Reduce chance of spurious failure and reword. (kwyss-nvidia)
0bf7844 Review suggestions from @timmoon10 (timmoon10)
62662ae Merge branch 'main' into kwyss/subchannel_recipe_linear (timmoon10)
7efac72 Update CPP tests. (kwyss-nvidia)
c3ee3d8 Update common.h (yaox12)
59cb49c Update test_float8blockwisetensor.py (yaox12)