feat: pipeline parallelism optimizations - load balancing, 1F1B scheduling, activation checkpointing #845
Open
ooples wants to merge 13 commits into master from feat/pipeline-parallelism-optimizations-463
+3,007 −160
Commits (13, all by ooples)
39a1cc8  feat: add pipeline parallelism optimizations - load balancing, 1f1b s…
1a1dd10  fix: integrate pipeline parallelism options into aimodelbuilder facade
452a45d  feat: add zero bubble and interleaved pipeline schedules with backwar…
44b33ad  fix: implement production-ready backward decomposition, virtual stage…
fc999fa  fix: add source generator exclusions, validation, and tag safety for …
3e33018  fix: clean up all schedule implementations and pipeline model code qu…
14b10fc  fix: split configure methods, fix virtual-stage routing, and fail-fas…
53c1ccc  fix: add schedule bounds validation, checkpoint guards, and cost doc …
3d957c5  fix: address remaining pr review comments for pipeline parallelism
a8fb44a  Merge remote-tracking branch 'origin/master' into feat/pipeline-paral…
c2da96e  Merge branch 'master' into feat/pipeline-parallelism-optimizations-463
14572e0  fix: resolve ci build failure from duplicate generator references and…
a354873  Merge branch 'feat/pipeline-parallelism-optimizations-463' of https:/…
@@ -0,0 +1,140 @@
namespace AiDotNet.DistributedTraining;

/// <summary>
/// Configuration for activation checkpointing in pipeline parallel training.
/// </summary>
/// <remarks>
/// <para>
/// Activation checkpointing (also called gradient checkpointing) trades compute for memory
/// by only storing activations at checkpoint layers during the forward pass. Intermediate
/// activations are recomputed from the nearest checkpoint during the backward pass.
/// </para>
/// <para><b>For Beginners:</b> During training, the forward pass must save intermediate results
/// (activations) so the backward pass can compute gradients. For very deep models, storing all
/// these activations uses enormous amounts of memory.
///
/// Activation checkpointing is like taking notes at chapter boundaries instead of on every page:
/// - Without checkpointing: save every activation (lots of memory, no recomputation)
/// - With checkpointing: save every Nth activation, recompute the rest (less memory, more compute)
///
/// Memory savings: O(L) → O(sqrt(L)), where L is the number of layers.
/// For 100 layers, this reduces memory from 100 stored activations to ~10.
///
/// The trade-off is ~33% more compute time (roughly one extra forward pass, and the forward
/// pass is about a third of total training compute), but this enables training models that
/// otherwise wouldn't fit in memory.
/// </para>
/// <para><b>Reference:</b> Chen et al., "Training Deep Nets with Sublinear Memory Cost", 2016.
/// https://arxiv.org/abs/1604.06174</para>
/// </remarks>
public class ActivationCheckpointConfig
{
    private int _checkpointEveryNLayers = 10;
    private int _maxActivationsInMemory;

    /// <summary>
    /// Gets or sets whether activation checkpointing is enabled.
    /// </summary>
    /// <remarks>
    /// <para><b>For Beginners:</b> Set this to true to enable memory savings. Default is false
    /// (no checkpointing, standard behavior).</para>
    /// </remarks>
    public bool Enabled { get; set; }

    /// <summary>
    /// Gets or sets how often to save a checkpoint (every N layers).
    /// </summary>
    /// <remarks>
    /// <para><b>For Beginners:</b> Lower values save more activations (more memory, less recomputation).
    /// Higher values save fewer (less memory, more recomputation).
    ///
    /// The optimal value is approximately sqrt(total_layers) for minimum total cost.
    /// For a 100-layer model, checkpointing every 10 layers is a good default.
    ///
    /// Default: 10 layers between checkpoints.</para>
    /// </remarks>
    /// <exception cref="ArgumentOutOfRangeException">Thrown when value is less than 1.</exception>
    public int CheckpointEveryNLayers
    {
        get => _checkpointEveryNLayers;
        set
        {
            if (value < 1)
            {
                throw new ArgumentOutOfRangeException(nameof(CheckpointEveryNLayers),
                    $"CheckpointEveryNLayers must be at least 1, but was {value}. " +
                    "A value of 0 would cause division-by-zero in interval-based checkpointing.");
            }
            _checkpointEveryNLayers = value;
        }
    }

    /// <summary>
    /// Gets or sets the recomputation strategy to use during the backward pass.
    /// </summary>
    /// <remarks>
    /// <para><b>For Beginners:</b>
    /// - Selective: only recompute activations that are needed and not checkpointed (recommended)
    /// - Full: recompute all non-checkpointed activations from the previous checkpoint
    /// - None: don't recompute; equivalent to no checkpointing (for testing/debugging)
    /// </para>
    /// </remarks>
    public RecomputeStrategy RecomputeStrategy { get; set; } = RecomputeStrategy.Selective;

    /// <summary>
    /// Gets or sets the maximum number of activations to keep in memory simultaneously.
    /// </summary>
    /// <remarks>
    /// <para><b>For Beginners:</b> This caps how many activations are stored at once.
    /// Set to 0 for no limit (uses CheckpointEveryNLayers to determine storage).
    /// A non-zero value overrides CheckpointEveryNLayers by dynamically adjusting
    /// the checkpoint frequency to stay within the memory budget.</para>
    /// </remarks>
    /// <exception cref="ArgumentOutOfRangeException">Thrown when value is negative.</exception>
    public int MaxActivationsInMemory
    {
        get => _maxActivationsInMemory;
        set
        {
            if (value < 0)
            {
                throw new ArgumentOutOfRangeException(nameof(MaxActivationsInMemory),
                    $"MaxActivationsInMemory must be non-negative, but was {value}. " +
                    "Use 0 for no limit.");
            }
            _maxActivationsInMemory = value;
        }
    }

    /// <summary>
    /// Gets or sets whether to checkpoint the very first layer's input.
    /// </summary>
    /// <remarks>
    /// <para><b>For Beginners:</b> The first layer's input is always needed for the backward pass.
    /// If true, it's saved as a checkpoint. If false, the caller must ensure the input is
    /// available during the backward pass (which is usually the case).</para>
    /// </remarks>
    public bool CheckpointFirstLayer { get; set; } = true;
}

/// <summary>
/// Strategy for recomputing activations during the backward pass.
/// </summary>
public enum RecomputeStrategy
{
    /// <summary>
    /// Only recompute activations that are needed for the current backward step.
    /// This is the most memory-efficient but requires careful bookkeeping.
    /// </summary>
    Selective,

    /// <summary>
    /// Recompute all activations between the two nearest checkpoints during backward.
    /// Simpler implementation but may do slightly more work than necessary.
    /// </summary>
    Full,

    /// <summary>
    /// No recomputation. Equivalent to disabled checkpointing. Useful for debugging.
    /// </summary>
    None
}
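A minimal usage sketch based only on the members shown in this diff (the example class name and the 100-layer figure are illustrative, not part of the PR). It configures checkpointing with the sqrt(L) heuristic from the class remarks and shows the validating setter rejecting an out-of-range value:

    using System;
    using AiDotNet.DistributedTraining;

    class CheckpointConfigExample
    {
        static void Main()
        {
            const int totalLayers = 100;

            var config = new ActivationCheckpointConfig
            {
                Enabled = true,
                // sqrt(L) heuristic from the class remarks: sqrt(100) = 10.
                CheckpointEveryNLayers = (int)Math.Sqrt(totalLayers),
                RecomputeStrategy = RecomputeStrategy.Selective,
                MaxActivationsInMemory = 0 // 0 = no cap; the interval drives storage
            };

            // The setter guard rejects values below 1:
            try
            {
                config.CheckpointEveryNLayers = 0;
            }
            catch (ArgumentOutOfRangeException ex)
            {
                Console.WriteLine(ex.Message); // "CheckpointEveryNLayers must be at least 1..."
            }
        }
    }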
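The sqrt(L) claim in the remarks can also be checked with a back-of-the-envelope sketch. The consumer-side logic below is hypothetical (this diff contains only the config class, not the scheduler that consumes it), but it illustrates the interval scheme the comments describe: with a checkpoint every N layers, roughly L/N activations are stored, plus up to N-1 recomputed within one segment during backward:

    using System;
    using System.Collections.Generic;

    // Hypothetical illustration of interval-based checkpointing; not code from the PR.
    static class CheckpointMath
    {
        // Layer indices stored as checkpoints when saving every N layers,
        // starting at layer 0 (matching CheckpointFirstLayer = true).
        static List<int> CheckpointLayers(int totalLayers, int everyN)
        {
            var stored = new List<int>();
            for (int layer = 0; layer < totalLayers; layer += everyN)
                stored.Add(layer);
            return stored;
        }

        static void Main()
        {
            int totalLayers = 100;
            int everyN = 10; // ~sqrt(100)

            var checkpoints = CheckpointLayers(totalLayers, everyN);

            // Peak memory during backward: stored checkpoints plus the
            // activations recomputed inside one segment between checkpoints.
            int peak = checkpoints.Count + (everyN - 1);

            Console.WriteLine($"stored checkpoints: {checkpoints.Count}"); // 10
            Console.WriteLine($"peak activations:   {peak}");              // 19, vs 100 without checkpointing
        }
    }

Note the distinction: the "~10 activations" in the class remarks counts stored checkpoints; peak usage during backward is closer to 2*sqrt(L) once the recomputed segment is included, which is still a large saving over storing all L.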