Cleanup v3 #59
Conversation
Pull request overview
This pull request refactors optimizer state management by moving most of the functionality from the model-specific LLamaOptimizerStateManager into the base AdamWStateManager class. This cleanup effort introduces a new GenericTensorContainer class for generic tensor storage and consolidates common optimizer state operations.
Key changes:
- Introduced GenericTensorContainer for storing tensors in a vector-based container (sketched below)
- Moved optimizer state allocation and management logic into the base AdamWStateManager class
- Simplified checkpoint save/load operations by delegating to the optimizer's methods
- Added virtual methods to IModel for creating block and non-block tensor containers
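For reference, here is a minimal sketch of what a vector-backed GenericTensorContainer could look like. The real class lives in src/utilities/tensor_container.h; the opaque `Tensor` type here is an assumption standing in for whatever tensor type the project uses.

```cpp
#include <cstddef>
#include <vector>

struct Tensor; // placeholder for the project's actual tensor type (assumption)

class GenericTensorContainer {
public:
    // Pre-size the container so slots can be filled in any order.
    explicit GenericTensorContainer(size_t count) : tensors_(count, nullptr) {}

    void set(size_t index, Tensor* t) { tensors_.at(index) = t; }
    Tensor* get(size_t index) const { return tensors_.at(index); }
    size_t size() const { return tensors_.size(); }

private:
    std::vector<Tensor*> tensors_; // vector-based storage, per the PR description
};
```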
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/utilities/tensor_container.h | Adds GenericTensorContainer class for storing tensors in a vector |
| src/training/model.h | Replaces optimizer accessor methods with single optimizer() method; adds virtual methods for tensor container creation |
| src/training/model.cpp | Implements helper methods for creating block and non-block containers |
| src/training/checkpoint.cpp | Delegates optimizer checkpoint operations to the optimizer's save/load methods |
| src/training/adamw_optimizer.h | Moves optimizer state storage and buffer management to base class; keeps checkpoint methods virtual |
| src/training/adamw_optimizer.cpp | Implements generic state allocation and buffer management in base class |
| src/models/llama_optimizer.h | Removes most implementation details, keeping only checkpoint-specific logic |
| src/models/llama_optimizer.cpp | Implements checkpoint save/load using OptStateWrapper for tensor name mapping |
| src/models/llama_model.h | Updates interface to use new optimizer accessor and adds container shape methods |
| src/models/llama_model.cpp | Implements tensor shape filling and updates optimizer initialization |
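The interface change in src/training/model.h might look roughly like the following. The method names and signatures are assumptions inferred from the summary above, not the actual declarations.

```cpp
#include <memory>

class GenericTensorContainer; // from src/utilities/tensor_container.h

class IModel {
public:
    virtual ~IModel() = default;

    // One container per transformer block, holding that block's tensors.
    virtual std::unique_ptr<GenericTensorContainer>
    create_block_container(int block_index) = 0;

    // A single container for tensors outside the blocks
    // (e.g. embeddings, final norm, output head).
    virtual std::unique_ptr<GenericTensorContainer>
    create_non_block_container() = 0;
};
```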
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Move almost all optimizer state functionality into the base class. Only the mapping from tensor indices to checkpoint names needs to remain in the "llama"-specific implementation.
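A sketch of that remaining llama-specific piece, assuming OptStateWrapper pairs a container index with a checkpoint tensor name (its actual interface, and the example tensor names, are assumptions):

```cpp
#include <string>
#include <vector>

struct OptStateWrapper {
    size_t index;      // slot in the optimizer-state tensor container
    std::string name;  // tensor name used in the checkpoint file
};

// Build the per-block name table; the generic save/load code in the base
// class would walk this list instead of hard-coding llama tensor names.
std::vector<OptStateWrapper> llama_block_names(int block) {
    const std::string prefix = "blk." + std::to_string(block) + ".";
    return {
        {0, prefix + "attn_q.weight"},   // example entries (assumptions)
        {1, prefix + "attn_k.weight"},
        {2, prefix + "ffn_up.weight"},
    };
}
```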