
Upgrade Transformers to v4.57.x #829

Draft
calpt wants to merge 10 commits into adapter-hub:main from calpt:sync/v4.57.x

Conversation

@calpt
Member

@calpt calpt commented Jan 16, 2026

No description provided.

calpt added 10 commits January 16, 2026 21:08
- Updated T5AttentionWithAdapters to use new DynamicCache API
  - Changed key_cache/value_cache access to layers[idx].keys/values
  - Added EncoderDecoderCache instance check before accessing cache properties

- Fixed class conversion state dict loading for T5/MT5AdapterModel
  - Added custom load_state_dict() to handle key remapping between static
    models (T5ForConditionalGeneration) and flex models (T5AdapterModel)
  - Static models use encoder.*/decoder.* keys, flex models expect
    transformer.encoder.*/transformer.decoder.* keys

- Fixed cls_representation extraction logic
  - Only extract cls_representation for classification heads
  - Prevents IndexError for seq2seq_lm and question_answering heads

All T5 test_methods tests (230/230) and MT5 tests (227/227) now pass.
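The DynamicCache change above can be sketched as follows. This is a minimal illustration, not the actual adapters code: `CacheLayer`, `DynamicCache`, and `EncoderDecoderCache` here are simplified stand-ins for the real `transformers` classes, and `get_cached_kv` is a hypothetical helper showing the access pattern the commit describes.

```python
# Minimal stand-ins for the transformers cache classes (simplified mocks,
# not the real implementations) to illustrate the new access pattern:
# cache.layers[idx].keys / .values instead of key_cache[idx] / value_cache[idx].

class CacheLayer:
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values

class DynamicCache:
    def __init__(self, layers):
        self.layers = layers

class EncoderDecoderCache:
    def __init__(self, self_attention_cache, cross_attention_cache):
        self.self_attention_cache = self_attention_cache
        self.cross_attention_cache = cross_attention_cache

def get_cached_kv(past_key_value, layer_idx, is_cross_attention=False):
    # Unwrap EncoderDecoderCache before touching per-layer storage,
    # mirroring the instance check the commit adds.
    if isinstance(past_key_value, EncoderDecoderCache):
        cache = (past_key_value.cross_attention_cache
                 if is_cross_attention
                 else past_key_value.self_attention_cache)
    else:
        cache = past_key_value
    # New DynamicCache API: per-layer keys/values attributes.
    layer = cache.layers[layer_idx]
    return layer.keys, layer.values
```

With these stand-ins, `get_cached_kv(cache, 0)` returns the layer-0 key/value pair whether it receives a bare `DynamicCache` or an `EncoderDecoderCache` wrapper.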
- Moved load_state_dict() override to ModelWithFlexibleHeadsAdaptersMixin
  - Automatically detects wrapper attribute (model/transformer)
  - Handles key remapping for all encoder-decoder models universally
  - Eliminates code duplication across 6 model implementations

- Fixed class conversion for BART-family models
  - BART, mBART, PLBART, Whisper now properly load from static models
  - Automatic remapping of encoder.*/decoder.*/shared.* keys

- Fixed mBART cls_representation extraction
  - Only extract cls_representation for classification heads
  - Prevents errors for seq2seq_lm and question_answering heads

- Removed duplicate load_state_dict() from individual models:
  - T5AdapterModel, MT5AdapterModel
  - BartAdapterModel, MBartAdapterModel, PLBartAdapterModel
  - WhisperAdapterModel

Test results:
- T5: 4/4 ClassConversion and Bottleneck tests passing
- MT5: 3/3 tests passing
- mBART: 4/4 tests passing
- Whisper: 1/1 ClassConversion test passing
- BART: 2/3 tests passing (ClassConversion working)
- PLBART: 0/1 (pre-existing issue with past_key_values)

Total: 14/16 tests fixed (87.5% success rate)
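The universal remapping described above can be sketched roughly as below. Both function names (`detect_wrapper_attr`, `remap_static_state_dict`) and the exact prefix list are illustrative assumptions based on the commit notes, not the actual mixin implementation.

```python
def detect_wrapper_attr(model):
    # Flex-head models wrap the base model in either `model` (BART family)
    # or `transformer` (T5 family), per the commit notes; return whichever
    # attribute the model exposes.
    for attr in ("model", "transformer"):
        if hasattr(model, attr):
            return attr
    return None

def remap_static_state_dict(state_dict, wrapper_attr):
    # Remap keys from a static checkpoint layout (encoder.*/decoder.*/shared.*)
    # to the flex-head layout (<wrapper_attr>.encoder.* etc.); all other keys
    # pass through unchanged.
    prefixes = ("encoder.", "decoder.", "shared.")
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(prefixes):
            remapped[f"{wrapper_attr}.{key}"] = value
        else:
            remapped[key] = value
    return remapped
```

For example, loading a `T5ForConditionalGeneration` checkpoint into a `T5AdapterModel` would turn `encoder.block.0...` keys into `transformer.encoder.block.0...` keys while leaving head weights untouched.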
- Remove send_example_telemetry import from all example scripts
- Remove telemetry function calls and associated comments
- Telemetry was not functional, as the function no longer exists in transformers.utils
- Tests pass successfully after removal
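The PR removes the telemetry calls outright. As a contrast, a defensive import guard (sketched below; this is an alternative pattern, not what the PR does) would also keep example scripts working when `send_example_telemetry` is absent from `transformers.utils`:

```python
# Hedged sketch of a defensive import guard: if send_example_telemetry is
# missing (as after this upgrade), fall back to a no-op instead of raising
# ImportError at script startup.
try:
    from transformers.utils import send_example_telemetry
except ImportError:
    def send_example_telemetry(*args, **kwargs):
        # No-op fallback when the telemetry helper is unavailable.
        pass
```

Removing the calls entirely, as the commit does, is the cleaner choice here since the helper is gone upstream and the guard would only preserve dead code.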
@calpt calpt marked this pull request as draft January 19, 2026 22:31
