Skip to content

new pipeline and benchmarks#52

Open
aitorDL wants to merge 1 commit intoexperts_newfrom
bench_pipeline
Open

new pipeline and benchmarks#52
aitorDL wants to merge 1 commit intoexperts_newfrom
bench_pipeline

Conversation

@aitorDL
Copy link

@aitorDL aitorDL commented Feb 6, 2026

addition of the new CLIP training pipeline and ultrasound benchmark

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new Optuna-driven CLIP training pipeline and introduces a new ultrasound downstream benchmark (MLP-on-embeddings) to evaluate trained models.

Changes:

  • Added train_new_pipeline.py to train CLIP dual-encoder models and optimize hyperparameters with Optuna using downstream benchmark scores.
  • Added an ultrasound anatomical region benchmark pipeline (ultrasound_new_benchmark.py) built on cached CLIP embeddings and an MLP classifier (mlp_eval.py).
  • Introduced a shared Benchmark abstract base class to unify benchmark evaluation interfaces.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 26 comments.

File Description
src/multimeditron/experts/train_new_pipeline.py New training + Optuna objective pipeline intended to train CLIP and evaluate via benchmarks.
src/multimeditron/experts/evaluation_pipeline/ultrasound_new_benchmark.py New ultrasound benchmark that builds embeddings from JSONL datasets and evaluates with an MLP head.
src/multimeditron/experts/evaluation_pipeline/mlp_eval.py New MLP evaluation utility used by the ultrasound benchmark (k-fold hyperparam sweep + final test eval).
src/multimeditron/experts/evaluation_pipeline/Benchmark.py New abstract Benchmark base class for evaluation modules.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

kwargs["dataset_tags"].append(dataset.dataset_name)

trainer.create_model_card(**kwargs)
#returns the training value
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/except sets train_result = None on failure, but the function later unconditionally returns train_result.metrics["train_loss"], which will raise an AttributeError and mask the original training error. Either re-raise after logging or return a sentinel value when training fails.

Suggested change
#returns the training value
#returns the training value
if train_result is None or getattr(train_result, "metrics", None) is None:
raise RuntimeError(
"Training failed earlier; 'train_result' is not available. "
"Check the training logs for the original error."
)

Copilot uses AI. Check for mistakes.
Comment on lines +342 to +355
def training(model_args, data_args, training_args, dataset, n_freeze, last_checkpoint):
# 5. Load pretrained model, tokenizer, and image processor
if model_args.vision_model_name and model_args.text_model_name:
# Dual encoder path
logger.info(f"Loading dual encoder with vision model {model_args.vision_model_name} "
f"and text model {model_args.text_model_name}")

model = VisionTextDualEncoderModel.from_vision_text_pretrained(
model_args.vision_model_name,
model_args.text_model_name,
cache_dir=model_args.cache_dir,
token=model_args.token,
).to(dtype=torch.bfloat16)

Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_freeze is passed into training(...) and optimized in Optuna (freezed_layers), but it is never used to actually freeze any model layers. This makes the Optuna search misleading; either implement freezing (vision + text encoder layers) or remove the parameter from the objective/search space.

Copilot uses AI. Check for mistakes.
dataset = dataset[list(dataset.keys())[0]]
dataset = DatasetConfig(**dataset)
if dataset.dataset_name is not None:
if not hasattr(kwargs, "dataset_tags"):
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kwargs is a dict, so hasattr(kwargs, "dataset_tags") is always false and kwargs["dataset_tags"] gets reset on every iteration, leaving only the last dataset tag. Use a dict key check (e.g., "dataset_tags" not in kwargs) so tags accumulate correctly.

Suggested change
if not hasattr(kwargs, "dataset_tags"):
if "dataset_tags" not in kwargs:

Copilot uses AI. Check for mistakes.
torch.save(self.data, save_path + "/data_emb_test_"+ model_name + ".pt")
torch.save(self.labels, save_path + "/data_lab_test_" + model_name + ".pt")
else:
self.data = torch.load(save_path + "/data_embl_test_"+ model_name +".pt")
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cached test embeddings are saved as data_emb_test_... but loaded from data_embl_test_... (typo), so load=True will fail with FileNotFoundError. Make the save/load filenames consistent.

Suggested change
self.data = torch.load(save_path + "/data_embl_test_"+ model_name +".pt")
self.data = torch.load(save_path + "/data_emb_test_"+ model_name +".pt")

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +107
def evaluate_pipeline(model, model_name):
device = "cuda"
model = model.to(device)
print("beginnig of the evaluation")
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluate_pipeline forces device = "cuda" without checking availability; this will crash on CPU-only machines. Use torch.device("cuda" if torch.cuda.is_available() else "cpu") (and move tensors/model accordingly).

Copilot uses AI. Check for mistakes.
from sklearn.utils.class_weight import compute_class_weight
from Benchmark import Benchmark
import numpy as np
import os
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +16
from transformers import (
VisionTextDualEncoderModel,
VisionTextDualEncoderProcessor,
AutoImageProcessor,
AutoTokenizer,
VisionTextDualEncoderConfig
)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'VisionTextDualEncoderConfig' is not used.
Import of 'AutoTokenizer' is not used.
Import of 'AutoImageProcessor' is not used.
Import of 'VisionTextDualEncoderProcessor' is not used.

Suggested change
from transformers import (
VisionTextDualEncoderModel,
VisionTextDualEncoderProcessor,
AutoImageProcessor,
AutoTokenizer,
VisionTextDualEncoderConfig
)
from transformers import VisionTextDualEncoderModel

Copilot uses AI. Check for mistakes.
AutoTokenizer,
VisionTextDualEncoderConfig
)
import optuna
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'optuna' is not used.

Suggested change
import optuna

Copilot uses AI. Check for mistakes.
VisionTextDualEncoderConfig
)
import optuna
from transformers import CLIPModel, CLIPProcessor
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'CLIPProcessor' is not used.
Import of 'CLIPModel' is not used.

Suggested change
from transformers import CLIPModel, CLIPProcessor

Copilot uses AI. Check for mistakes.
from multiprocessing import Pool
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend
import os
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import of module os is redundant, as it was previously imported on line 24.

Suggested change
import os

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant