Add Matryoshka Representation Learning (MRL) support and tests #39

bbkx226 · 2025-12-14T07:18:08Z

Resolves #8

This pull request introduces Matryoshka Representation Learning (MRL) to the codebase, enabling models to produce multi-granular embeddings that can be flexibly truncated to various dimensions for downstream tasks. It adds configuration options, utility functions, and a comprehensive test suite for MRL, and integrates the MRL loss into both training and validation workflows.
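To show what truncation means at inference time, here is a minimal pure-Python sketch. It assumes embeddings are plain lists of floats and that similarity is cosine similarity on L2-normalized prefixes. The helpers truncate_and_normalize and rank_documents are hypothetical names for illustration, not functions from this PR.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` coordinates and rescale them to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be in [1, {len(embedding)}], got {dim}")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]


def rank_documents(query, documents, dim):
    """Return document indices ordered by cosine similarity, using only `dim` coordinates."""
    q = truncate_and_normalize(query, dim)
    scores = [
        sum(a * b for a, b in zip(q, truncate_and_normalize(doc, dim)))
        for doc in documents
    ]
    # Stable sort on the negated score, so tied documents keep their original order.
    return sorted(range(len(documents)), key=lambda i: -scores[i])
```

Take the query [1.0, 0.0, 0.0, 0.0] and the candidates [[0.0, 1.0, 0.0, 0.0], [1.0, 0.1, 0.0, 0.0], [0.5, 0.0, 5.0, 0.0]]. The 2-dimensional prefix ranks the candidates [2, 1, 0], while the full 4 dimensions give [1, 2, 0]. Short prefixes of an arbitrary embedding can disagree with the full vector in this way. MRL training aims to make each short prefix informative on its own.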

Matryoshka Representation Learning (MRL) Integration:

Added support for MRL in the training pipeline, allowing a single model to serve multiple embedding dimensions by adding an auxiliary MRL loss over truncated embeddings. This is configurable via the use_mrl, mrl_dimensions, and mrl_temperature parameters in both the config and the arguments.
Implemented the matryoshka_loss function in utils.py to compute the MRL loss, which encourages high-quality embeddings at multiple dimensions.
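Below is a minimal sketch of one common way to formulate such a loss. It assumes an in-batch contrastive (InfoNCE) objective in which document i is the positive for query i. For each configured dimension, both sides are truncated to their first d coordinates and re-normalized, and the loss is computed at temperature mrl_temperature. The per-dimension losses are then averaged with equal weight. The uniform weighting, the name matryoshka_loss_sketch, and its signature are all assumptions, so the actual matryoshka_loss in utils.py may differ.

```python
import math


def _l2_normalize(vec):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _in_batch_info_nce(queries, docs, temperature):
    """Mean cross-entropy where docs[i] is the positive for queries[i] and other docs are negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, d)) / temperature for d in docs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def matryoshka_loss_sketch(query_emb, doc_emb, mrl_dimensions, mrl_temperature):
    """Hypothetical MRL loss: the in-batch loss averaged over each truncation size (equal weights assumed)."""
    if not mrl_dimensions:
        raise ValueError("mrl_dimensions must not be empty")
    per_dim = []
    for dim in mrl_dimensions:
        q = [_l2_normalize(v[:dim]) for v in query_emb]
        d = [_l2_normalize(v[:dim]) for v in doc_emb]
        per_dim.append(_in_batch_info_nce(q, d, mrl_temperature))
    return sum(per_dim) / len(per_dim)
```

A production version would operate on batched tensors rather than Python lists. The sketch only spells out the arithmetic: truncate, re-normalize, apply the contrastive loss per dimension, and average.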

Configuration and Argument Enhancements:

Added MRL-specific parameters (use_mrl, mrl_dimensions, mrl_temperature) to the Args class and provided an example configuration file, config_mrl.json.
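The three options could be modeled roughly as follows. This is a hypothetical sketch: the class name MRLArgs, the from_json and check helpers, and the default values are illustrative assumptions. They are not the contents of the PR's Args class or config_mrl.json.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class MRLArgs:
    """Hypothetical container for the three MRL options named in this PR."""
    use_mrl: bool = False
    mrl_dimensions: List[int] = field(default_factory=list)
    mrl_temperature: float = 1.0  # placeholder default, not taken from the PR

    @classmethod
    def from_json(cls, text: str) -> "MRLArgs":
        """Read the MRL keys from a JSON config string, ignoring any other keys."""
        raw = json.loads(text)
        keys = ("use_mrl", "mrl_dimensions", "mrl_temperature")
        return cls(**{k: raw[k] for k in keys if k in raw})

    def check(self, embedding_dim: int) -> List[int]:
        """Return the truncation sizes sorted ascending, rejecting sizes the model cannot produce."""
        if not self.use_mrl:
            return []
        if not self.mrl_dimensions:
            raise ValueError("use_mrl is set but mrl_dimensions is empty")
        if self.mrl_temperature <= 0:
            raise ValueError("mrl_temperature must be positive")
        dims = sorted(set(self.mrl_dimensions))
        if dims[0] < 1 or dims[-1] > embedding_dim:
            raise ValueError(f"mrl_dimensions must lie in [1, {embedding_dim}], got {dims}")
        return dims
```

For example, MRLArgs.from_json('{"use_mrl": true, "mrl_dimensions": [256, 64, 128]}').check(768) returns [64, 128, 256]. Validating the dimensions once against the model's output width catches a bad config before training starts rather than partway through the first batch.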

Training and Validation Pipeline Updates:

Integrated the MRL loss into the inbatch_loss and hard_loss functions, and updated the training (accelerate_train) and validation (validate) routines to pass the MRL parameters when MRL is enabled.
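For the hard-negative path, one plausible way to add the auxiliary term is sketched below. The sketch computes the usual full-dimension loss against the positive and its hard negatives. When use_mrl is set, it then adds the mean of the same loss over each truncated prefix. Several details are assumptions: the two terms are combined as an unweighted sum, and the base temperature is used whenever mrl_temperature is not given. The name hard_loss_with_mrl is hypothetical, and the PR's inbatch_loss and hard_loss may combine the terms differently.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _contrastive(query, positive, negatives, temperature):
    """Cross-entropy of one query against [positive] + negatives, with the positive at index 0."""
    candidates = [positive] + negatives
    logits = [sum(a * b for a, b in zip(query, c)) / temperature for c in candidates]
    peak = max(logits)  # numerically stable log-sum-exp
    return peak + math.log(sum(math.exp(z - peak) for z in logits)) - logits[0]


def hard_loss_with_mrl(query, positive, hard_negatives, temperature,
                       use_mrl=False, mrl_dimensions=(), mrl_temperature=None):
    """Full-dimension hard-negative loss, plus the mean truncated-prefix loss when MRL is enabled."""
    full = _contrastive(_normalize(query), _normalize(positive),
                        [_normalize(n) for n in hard_negatives], temperature)
    if not use_mrl:
        return full
    if not mrl_dimensions:
        raise ValueError("use_mrl is set but mrl_dimensions is empty")
    t = mrl_temperature if mrl_temperature is not None else temperature
    aux = [
        _contrastive(_normalize(query[:d]), _normalize(positive[:d]),
                     [_normalize(n[:d]) for n in hard_negatives], t)
        for d in mrl_dimensions
    ]
    return full + sum(aux) / len(aux)
```

Because the auxiliary term is only added when use_mrl is true, the function reduces to the plain hard-negative loss when MRL is disabled. The validation path could reuse the same function to report comparable numbers.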

Testing and Documentation:

Added a comprehensive test suite in test_mrl.py to validate MRL loss computation, integration with existing losses, and embedding truncation behavior.
Updated the README.md with an explanation of MRL, configuration steps, and quick validation instructions.
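Here is a small unittest sketch of the kind of truncation check such a suite might contain. The truncate helper and the TruncationTest cases are hypothetical and are not taken from test_mrl.py. They assume only that truncation keeps the leading coordinates and re-normalizes them.

```python
import math
import unittest


def truncate(embeddings, dim):
    """Hypothetical helper: keep the first `dim` coordinates of each embedding and L2-normalize."""
    out = []
    for vec in embeddings:
        prefix = vec[:dim]
        norm = math.sqrt(sum(x * x for x in prefix))
        out.append([x / norm for x in prefix] if norm > 0 else prefix)
    return out


class TruncationTest(unittest.TestCase):
    def setUp(self):
        self.batch = [[3.0, 4.0, 1.0, 2.0], [1.0, 0.0, 0.0, 5.0]]

    def test_output_width_matches_dim(self):
        for dim in (1, 2, 4):
            for vec in truncate(self.batch, dim):
                self.assertEqual(len(vec), dim)

    def test_truncated_vectors_are_unit_norm(self):
        for vec in truncate(self.batch, 2):
            self.assertAlmostEqual(math.sqrt(sum(x * x for x in vec)), 1.0)

    def test_prefix_direction_is_preserved(self):
        # Truncation only rescales the prefix, so ratios between kept coordinates are unchanged.
        first = truncate(self.batch, 2)[0]
        self.assertAlmostEqual(first[0] / first[1], 3.0 / 4.0)
```

Saved as a module, these cases can be run with python -m unittest followed by the module name.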

bbkx226 · 2025-12-14T07:18:22Z

#8

Add Matryoshka Representation Learning (MRL) support and tests

ab3103d
