-
Notifications
You must be signed in to change notification settings - Fork 66
using sum/world_size to replace avg #2691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR replaces the compile-time conditional AVG reduction operation with a runtime emulation approach. Previously, the code used #if !defined(XCCL_HAS_AVG) preprocessor directives to conditionally emulate AVG by performing SUM followed by division. The new implementation unconditionally emulates AVG for all oneCCL operations since oneCCL doesn't support AVG on all scheduler paths.
Key changes:
- Added helper functions
shouldEmulateAvg()andtoXcclReduceOp()to centralize AVG emulation logic - Removed all
#if !defined(XCCL_HAS_AVG)preprocessor conditionals - Applied consistent AVG emulation across all reduction operations
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return reduceOp == ReduceOp::AVG; | ||
| } | ||
|
|
||
| ReduceOp toXcclReduceOp(const ReduceOp& reduceOp, const at::Tensor& input) { |
Copilot
AI
Jan 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name 'toXcclReduceOp' is misleading since it doesn't actually convert to an XCCL-specific type but rather conditionally transforms AVG to SUM. Consider renaming to 'getEmulatedReduceOp' or 'transformReduceOpForXccl' to better reflect its purpose.
| ReduceOp toXcclReduceOp(const ReduceOp& reduceOp, const at::Tensor& input) { | |
| ReduceOp getEmulatedReduceOp(const ReduceOp& reduceOp, const at::Tensor& input) { |
| if (reduceOp == ReduceOp::AVG) { | ||
| TORCH_CHECK( | ||
| input.scalar_type() != at::kBool, | ||
| "Cannot use ReduceOp.AVG with boolean inputs"); | ||
| return ReduceOp::SUM; | ||
| } |
Copilot
AI
Jan 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The boolean input validation is performed in 'toXcclReduceOp()' but this check occurs at each reduction call site. Consider moving this validation earlier (e.g., in the parent functions that call the reduction operations) to avoid redundant checks in hot paths.
|
oneCCL supports AVG directly over the XeLink path. Simulating everything with SUM followed by division would incur a performance penalty. Do we really have a strong justification to mandate that all AVG operations be emulated in this way? |
| #if !defined(XCCL_HAS_AVG) | ||
| if (opts.reduceOp == ReduceOp::AVG) { | ||
| input, output, comm, reduceOp, xcclStream, stream); | ||
| if (shouldEmulateAvg(opts.reduceOp)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you evaluated the perf impact? If you find some cases unsupported for avg, you should ask oneCCL for support instead of replacing avg by sum.
fix https://jira.devtools.intel.com/browse/PYTORCHDGQ-6106.
oneccl does not allow avg and throw error while checking, so replace it using sum/world_size. nothing else is changed