Conversation
Pull request overview
This PR adds a new Optuna-driven CLIP training pipeline and introduces a new ultrasound downstream benchmark (MLP-on-embeddings) to evaluate trained models.
Changes:
- Added `train_new_pipeline.py` to train CLIP dual-encoder models and optimize hyperparameters with Optuna using downstream benchmark scores.
- Added an ultrasound anatomical region benchmark pipeline (`ultrasound_new_benchmark.py`) built on cached CLIP embeddings and an MLP classifier (`mlp_eval.py`).
- Introduced a shared `Benchmark` abstract base class to unify benchmark evaluation interfaces.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 26 comments.
| File | Description |
|---|---|
| `src/multimeditron/experts/train_new_pipeline.py` | New training + Optuna objective pipeline intended to train CLIP and evaluate via benchmarks. |
| `src/multimeditron/experts/evaluation_pipeline/ultrasound_new_benchmark.py` | New ultrasound benchmark that builds embeddings from JSONL datasets and evaluates with an MLP head. |
| `src/multimeditron/experts/evaluation_pipeline/mlp_eval.py` | New MLP evaluation utility used by the ultrasound benchmark (k-fold hyperparam sweep + final test eval). |
| `src/multimeditron/experts/evaluation_pipeline/Benchmark.py` | New abstract Benchmark base class for evaluation modules. |
```python
kwargs["dataset_tags"].append(dataset.dataset_name)

trainer.create_model_card(**kwargs)
#returns the training value
```
The try/except sets `train_result = None` on failure, but the function later unconditionally returns `train_result.metrics["train_loss"]`, which will raise an `AttributeError` and mask the original training error. Either re-raise after logging or return a sentinel value when training fails.
```diff
 #returns the training value
+if train_result is None or getattr(train_result, "metrics", None) is None:
+    raise RuntimeError(
+        "Training failed earlier; 'train_result' is not available. "
+        "Check the training logs for the original error."
+    )
```
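The comment's other option, re-raising after logging, would look roughly like the sketch below; the exact `trainer.train(...)` call and its checkpoint argument are assumptions about the surrounding try/except, not code from this PR.

```python
# Hypothetical shape of the surrounding try/except (re-raise variant)
try:
    train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
except Exception:
    logger.exception("Training failed")
    raise  # propagate the original error instead of masking it later
```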
```python
def training(model_args, data_args, training_args, dataset, n_freeze, last_checkpoint):
    # 5. Load pretrained model, tokenizer, and image processor
    if model_args.vision_model_name and model_args.text_model_name:
        # Dual encoder path
        logger.info(f"Loading dual encoder with vision model {model_args.vision_model_name} "
                    f"and text model {model_args.text_model_name}")

        model = VisionTextDualEncoderModel.from_vision_text_pretrained(
            model_args.vision_model_name,
            model_args.text_model_name,
            cache_dir=model_args.cache_dir,
            token=model_args.token,
        ).to(dtype=torch.bfloat16)
```
`n_freeze` is passed into `training(...)` and optimized in Optuna (`freezed_layers`), but it is never used to actually freeze any model layers. This makes the Optuna search misleading; either implement freezing (vision + text encoder layers) or remove the parameter from the objective/search space.
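A minimal sketch of what the freezing could look like. The attribute paths assume a CLIP-style vision encoder (`encoder.layers`) and a BERT-style text encoder (`encoder.layer`); they vary by checkpoint, so verify them against the actual models before relying on this.

```python
def freeze_layers(model, n_freeze):
    """Freeze the first n_freeze transformer layers of both encoders."""
    vision_layers = model.vision_model.encoder.layers  # CLIP-style path (assumed)
    text_layers = model.text_model.encoder.layer       # BERT-style path (assumed)
    for block in (vision_layers[:n_freeze], text_layers[:n_freeze]):
        for layer in block:
            for param in layer.parameters():
                param.requires_grad = False
```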
```python
dataset = dataset[list(dataset.keys())[0]]
dataset = DatasetConfig(**dataset)
if dataset.dataset_name is not None:
    if not hasattr(kwargs, "dataset_tags"):
```
`kwargs` is a dict, so `hasattr(kwargs, "dataset_tags")` is always false and `kwargs["dataset_tags"]` gets reset on every iteration, leaving only the last dataset tag. Use a dict key check (e.g., `"dataset_tags" not in kwargs`) so tags accumulate correctly.
```diff
-if not hasattr(kwargs, "dataset_tags"):
+if "dataset_tags" not in kwargs:
```
```python
    torch.save(self.data, save_path + "/data_emb_test_" + model_name + ".pt")
    torch.save(self.labels, save_path + "/data_lab_test_" + model_name + ".pt")
else:
    self.data = torch.load(save_path + "/data_embl_test_" + model_name + ".pt")
```
Cached test embeddings are saved as `data_emb_test_...` but loaded from `data_embl_test_...` (typo), so `load=True` will fail with `FileNotFoundError`. Make the save/load filenames consistent.
```diff
-self.data = torch.load(save_path + "/data_embl_test_"+ model_name +".pt")
+self.data = torch.load(save_path + "/data_emb_test_"+ model_name +".pt")
```
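To rule out this class of drift entirely, the cache path could be built once and shared by both branches. A sketch, assuming a `load` flag as described above:

```python
# Build each cache path once so save and load can never disagree
emb_path = save_path + "/data_emb_test_" + model_name + ".pt"
lab_path = save_path + "/data_lab_test_" + model_name + ".pt"
if not load:
    torch.save(self.data, emb_path)
    torch.save(self.labels, lab_path)
else:
    self.data = torch.load(emb_path)
```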
```python
def evaluate_pipeline(model, model_name):
    device = "cuda"
    model = model.to(device)
    print("beginnig of the evaluation")
```
`evaluate_pipeline` forces `device = "cuda"` without checking availability; this will crash on CPU-only machines. Use `torch.device("cuda" if torch.cuda.is_available() else "cpu")` (and move tensors/model accordingly).
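A sketch of the suggested fallback; it assumes the rest of the function moves its input batches the same way:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
# Tensors created later in the function need the same treatment, e.g.:
# batch = batch.to(device)
```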
```python
from sklearn.utils.class_weight import compute_class_weight
from Benchmark import Benchmark
import numpy as np
import os
```
Import of 'os' is not used.
```diff
-import os
```
```python
from transformers import (
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
    AutoImageProcessor,
    AutoTokenizer,
    VisionTextDualEncoderConfig
)
```
Import of 'VisionTextDualEncoderConfig' is not used.
Import of 'AutoTokenizer' is not used.
Import of 'AutoImageProcessor' is not used.
Import of 'VisionTextDualEncoderProcessor' is not used.
```diff
-from transformers import (
-    VisionTextDualEncoderModel,
-    VisionTextDualEncoderProcessor,
-    AutoImageProcessor,
-    AutoTokenizer,
-    VisionTextDualEncoderConfig
-)
+from transformers import VisionTextDualEncoderModel
```
```python
    AutoTokenizer,
    VisionTextDualEncoderConfig
)
import optuna
```
Import of 'optuna' is not used.
```diff
-import optuna
```
```python
    VisionTextDualEncoderConfig
)
import optuna
from transformers import CLIPModel, CLIPProcessor
```
Import of 'CLIPProcessor' is not used.
Import of 'CLIPModel' is not used.
```diff
-from transformers import CLIPModel, CLIPProcessor
```
```python
from multiprocessing import Pool
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend
import os
```
This import of module os is redundant, as it was previously imported on line 24.
```diff
-import os
```
addition of the new CLIP training pipeline and ultrasound benchmark