
Two Bugs in transformer_trainer.py and early_stopping.py #72


Description

@PSWON3045

Hello, and thank you for maintaining the SPLADE repository.
While working with the codebase, I noticed two issues that appear to affect training behavior. I would like to report them here in case they require correction.


1. Typo in SiameseTransformerTrainer.forward() — augment_pairs vs. augment_pair

  • File: splade/tasks/transformer_trainer.py
  • Class: SiameseTransformerTrainer(TransformerTrainer)

Inside the forward function, the following conditional is used:

if "augment_pairs" in self.config:
    if self.config["augment_pairs"] == "in_batch_negatives":

However, the configuration files use the key augment_pair (without the trailing s).
This mismatch prevents the augmentation logic from being triggered as intended.
The conditional should presumably read:

if "augment_pair" in self.config:
    if self.config["augment_pair"] == "in_batch_negatives":

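For reference, a quick way to confirm the mismatch outside the trainer (the config dict below is a hand-written stand-in for the loaded configuration, not taken from the repository):

config = {"augment_pair": "in_batch_negatives"}  # key as spelled in the config files

print("augment_pairs" in config)  # False -> the augmentation branch is silently skipped
print("augment_pair" in config)   # True  -> the key the conditional should test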

2. Incorrect assignment of self.fn in early_stopping.py due to missing parentheses

  • File: splade/tasks/base/early_stopping.py
  • Class: EarlyStopping

When the monitored metric is not "loss" (e.g., "MRR@10"), the comparator always returns a function object (which is truthy) instead of a boolean, causing the check to always pass.

The current implementation is:

self.fn = lambda x, y: x < y if mode == "loss" else lambda a, b: a > b

Due to operator precedence, when mode != "loss", this expression returns the inner lambda function object itself (lambda a, b: a > b) rather than calling it.
Since function objects in Python are truthy, the condition if self.fn(val_perf, self.best): always evaluates to True.

Impact

  1. Early stopping never triggers: the patience counter is reset to 0 at every validation step because the condition is effectively always True.
  2. "Best" model overwrite: every checkpoint is saved as the "best" model regardless of whether performance actually improved, so the final saved model is simply the last checkpoint, not the best one (see the sketch below).

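A short sketch of the resulting training-loop behavior (val_perf, best, and counter are illustrative stand-ins, not the exact attributes used in early_stopping.py):

mode = "MRR@10"
fn = lambda x, y: x < y if mode == "loss" else lambda a, b: a > b  # current (buggy) comparator

best = 0.40
counter = 0
for val_perf in [0.35, 0.30, 0.25]:   # validation performance clearly degrading
    if fn(val_perf, best):            # returns a function object -> always truthy
        best = val_perf               # "best" is overwritten with worse values
        counter = 0                   # patience counter is reset every time
    else:
        counter += 1

print(best, counter)                  # 0.25 0 -> early stopping can never fire
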
A minimal standalone reproduction of the precedence issue itself:

mode = "metric"
fn = lambda x, y: x < y if mode == "loss" else lambda a, b: a > b
result = fn(0.5, 0.4) 
print(f"Type: {type(result)}") 
# Output: <class 'function'> -> This effectively evaluates to True in if-statements

Parentheses should be used to enforce correct precedence:

self.fn = (lambda x, y: x < y) if mode == "loss" else (lambda a, b: a > b)
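
With the parentheses in place, the comparator returns a boolean as intended (a quick standalone check, not copied from the repository):

mode = "metric"
fn = (lambda x, y: x < y) if mode == "loss" else (lambda a, b: a > b)

print(fn(0.5, 0.4))  # True  -> 0.5 > 0.4 counts as an improvement: reset the counter, save the model
print(fn(0.3, 0.4))  # False -> no improvement: the patience counter should now increment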
