
Conversation

@bbkx226 commented Dec 14, 2025

Resolves #9

This pull request adds support for gradient accumulation, enabling training with larger effective batch sizes when GPU memory is limited. The implementation introduces a new configuration option, updates the training logic to account for accumulation, and provides tests and documentation to verify and demonstrate the feature.

The most important changes are:

Gradient Accumulation Feature:

  • Added a new argument gradient_accumulation_steps (default: 1) to the Args class in arguments.py and to the config files, allowing users to specify the number of accumulation steps for training.
  • Updated the training step calculation in run.py so that train_steps now counts optimizer steps (after accumulation) rather than micro-batches, keeping the training duration correct when accumulation is enabled. A sketch of both changes follows this list.
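
A minimal sketch of how these two pieces might fit together, assuming a dataclass-style Args; every field and helper name other than gradient_accumulation_steps is an illustrative assumption, not the actual code in arguments.py or run.py:

```python
from dataclasses import dataclass

@dataclass
class Args:
    per_device_batch_size: int = 8          # assumed existing field
    num_train_samples: int = 10_000         # assumed existing field
    num_epochs: int = 3                     # assumed existing field
    gradient_accumulation_steps: int = 1    # new option; 1 means no accumulation

def compute_train_steps(args: Args) -> int:
    """Return the number of optimizer steps (after accumulation), not micro-batches."""
    micro_batches_per_epoch = args.num_train_samples // args.per_device_batch_size
    optimizer_steps_per_epoch = micro_batches_per_epoch // args.gradient_accumulation_steps
    return optimizer_steps_per_epoch * args.num_epochs
```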

Training Loop Adjustments:

  • Modified the main training loop in utils.py to scale the loss by 1/gradient_accumulation_steps, accumulate gradients, and step the optimizer only after the specified number of micro-batches. Logging, validation, and checkpointing are now triggered only on optimizer steps, as sketched below.
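
A hedged sketch of the accumulation logic described above, assuming a PyTorch-style model and loss; the model, data loader, loss function, and logging interval are placeholders, not the actual code in utils.py:

```python
def train(model, loader, optimizer, loss_fn, gradient_accumulation_steps, log_every=10):
    model.train()
    optimizer.zero_grad()
    optimizer_step = 0
    for micro_step, (inputs, targets) in enumerate(loader, start=1):
        loss = loss_fn(model(inputs), targets)
        # Scale the loss so the accumulated gradient matches one large-batch update.
        (loss / gradient_accumulation_steps).backward()
        if micro_step % gradient_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            optimizer_step += 1
            # Logging, validation, and checkpointing key off optimizer steps only.
            if optimizer_step % log_every == 0:
                print(f"optimizer step {optimizer_step}: loss {loss.item():.4f}")
```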

Testing and Validation:

  • Added test_gradient_accumulation.py to verify that the number of optimizer steps matches expectations for a given number of accumulation steps and micro-batches (see the sketch after this list).
  • Added smoke_test_accumulation.py, a synthetic end-to-end test that runs a minimal pipeline with the accumulation logic to confirm correct integration.
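
An illustrative pytest-style check in the spirit of test_gradient_accumulation.py; the counting helper below is an assumption for the example, not the repository's actual test code:

```python
def count_optimizer_steps(num_micro_batches: int, gradient_accumulation_steps: int) -> int:
    # Each optimizer step consumes gradient_accumulation_steps micro-batches;
    # a trailing partial group does not trigger a step in this sketch.
    return num_micro_batches // gradient_accumulation_steps

def test_optimizer_step_count():
    # 12 micro-batches with accumulation of 4 should yield 3 optimizer steps.
    assert count_optimizer_steps(12, 4) == 3
    # With the default of 1, every micro-batch is an optimizer step.
    assert count_optimizer_steps(12, 1) == 12
```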

Documentation:

  • Updated README.md with a new section explaining gradient accumulation, configuration usage, and instructions for running the new tests.
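
A hypothetical usage example in the spirit of the README section, reusing the Args sketch above; only gradient_accumulation_steps is confirmed by this PR, and the other values and the effective-batch-size arithmetic are for illustration:

```python
args = Args(per_device_batch_size=8, gradient_accumulation_steps=4)
# Four micro-batches are accumulated before each optimizer step,
# so the effective batch size is 8 * 4 = 32.
effective_batch_size = args.per_device_batch_size * args.gradient_accumulation_steps
print(effective_batch_size)  # 32
```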

@bbkx226 (Author) commented Dec 14, 2025

#9


Development

Successfully merging this pull request may close these issues.

[Codefuse Open Source Light Training Camp] Support for gradient accumulation
