
Conversation

ngc92 (Collaborator) commented Jan 9, 2026

Move almost all optimizer state functionality into the base class. Only the mapping from tensor indices to checkpoint names needs to remain in the "llama"-specific implementation.
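
A minimal sketch of the intended split, using a stand-in Tensor type and hypothetical method names (allocate_state, checkpoint_name) rather than the repository's actual signatures:

```cpp
// Illustrative sketch only: Tensor, allocate_state, and the naming scheme
// are hypothetical stand-ins, not this repository's actual API.
#include <cstddef>
#include <string>
#include <vector>

struct Tensor {};  // stand-in for the project's tensor type

class AdamWStateManager {
public:
    virtual ~AdamWStateManager() = default;

    // State handling shared by all models now lives in the base class,
    // e.g. allocating the first/second moment buffers.
    void allocate_state(std::size_t num_tensors) {
        m_first_moment.resize(num_tensors);
        m_second_moment.resize(num_tensors);
    }

    // The one piece that stays model-specific: mapping a tensor index
    // to its name in the checkpoint file.
    virtual std::string checkpoint_name(std::size_t tensor_index) const = 0;

protected:
    std::vector<Tensor> m_first_moment;   // AdamW m buffers
    std::vector<Tensor> m_second_moment;  // AdamW v buffers
};

class LLamaOptimizerStateManager : public AdamWStateManager {
public:
    std::string checkpoint_name(std::size_t tensor_index) const override {
        // Hypothetical naming scheme, for illustration only.
        return "opt.tensor." + std::to_string(tensor_index);
    }
};
```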

ngc92 marked this pull request as ready for review January 9, 2026 22:01
Copilot AI review requested due to automatic review settings January 9, 2026 22:01

Copilot AI left a comment

Pull request overview

This pull request refactors optimizer state management by moving most of the functionality from the model-specific LLamaOptimizerStateManager into the base AdamWStateManager class. As part of the cleanup, it introduces a new GenericTensorContainer class for generic tensor storage and consolidates common optimizer state operations.

Key changes:

  • Introduced GenericTensorContainer for storing tensors in a vector-based container (sketched after this list)
  • Moved optimizer state allocation and management logic into the base AdamWStateManager class
  • Simplified checkpoint save/load operations by delegating to the optimizer's methods
  • Added virtual methods to IModel for creating block and non-block tensor containers
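
For illustration, a minimal sketch of what the new container and IModel hooks might look like; the Tensor stand-in and all signatures here are assumptions, and the actual declarations live in src/utilities/tensor_container.h and src/training/model.h:

```cpp
// Assumed shapes only; the actual interfaces may differ.
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Tensor {};  // stand-in for the project's tensor type

// Generic, vector-based tensor storage.
class GenericTensorContainer {
public:
    void add(Tensor t) { m_tensors.push_back(std::move(t)); }
    Tensor& operator[](std::size_t i) { return m_tensors.at(i); }
    std::size_t size() const { return m_tensors.size(); }

private:
    std::vector<Tensor> m_tensors;
};

// Hypothetical shape of the new virtual methods on IModel: the model
// decides how containers for per-block and non-block tensors are built.
class IModel {
public:
    virtual ~IModel() = default;
    virtual std::unique_ptr<GenericTensorContainer> create_block_container() const = 0;
    virtual std::unique_ptr<GenericTensorContainer> create_non_block_container() const = 0;
};
```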

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Summary per file:

  • src/utilities/tensor_container.h: Adds the GenericTensorContainer class for storing tensors in a vector
  • src/training/model.h: Replaces the optimizer accessor methods with a single optimizer() method; adds virtual methods for tensor container creation
  • src/training/model.cpp: Implements helper methods for creating block and non-block containers
  • src/training/checkpoint.cpp: Delegates optimizer checkpoint operations to the optimizer's save/load methods (see the sketch after this list)
  • src/training/adamw_optimizer.h: Moves optimizer state storage and buffer management to the base class; keeps the checkpoint methods virtual
  • src/training/adamw_optimizer.cpp: Implements generic state allocation and buffer management in the base class
  • src/models/llama_optimizer.h: Removes most implementation details, keeping only the checkpoint-specific logic
  • src/models/llama_optimizer.cpp: Implements checkpoint save/load using OptStateWrapper for tensor name mapping
  • src/models/llama_model.h: Updates the interface to use the new optimizer accessor and adds container shape methods
  • src/models/llama_model.cpp: Implements tensor shape filling and updates optimizer initialization
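
To illustrate the delegation described for src/training/checkpoint.cpp, here is a hedged sketch; every name except optimizer() is an illustrative assumption rather than the repository's actual API:

```cpp
// Hedged sketch: CheckpointWriter, save_checkpoint, and save_optimizer_state
// are placeholder names, not the repository's actual API.
struct CheckpointWriter {};

class Optimizer {
public:
    // The optimizer serializes and restores its own moment buffers, so
    // checkpoint code no longer needs to know the state layout.
    void save_checkpoint(CheckpointWriter& /*writer*/) {}
    void load_checkpoint(CheckpointWriter& /*reader*/) {}
};

class Model {
public:
    Optimizer& optimizer() { return m_optimizer; }

private:
    Optimizer m_optimizer;
};

void save_optimizer_state(Model& model, CheckpointWriter& writer) {
    // checkpoint.cpp just forwards to the optimizer's own method.
    model.optimizer().save_checkpoint(writer);
}
```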



Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.



ngc92 merged commit 258b9fd into dev Jan 12, 2026
29 checks passed