Conversation
Pull request overview
This PR adds a new Optuna-driven CLIP training pipeline and introduces a new ultrasound downstream benchmark (MLP-on-embeddings) to evaluate trained models.
Changes:
- Added `train_new_pipeline.py` to train CLIP dual-encoder models and optimize hyperparameters with Optuna using downstream benchmark scores.
- Added an ultrasound anatomical region benchmark pipeline (`ultrasound_new_benchmark.py`) built on cached CLIP embeddings and an MLP classifier (`mlp_eval.py`).
- Introduced a shared `Benchmark` abstract base class to unify benchmark evaluation interfaces.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 26 comments.
| File | Description |
|---|---|
| `src/multimeditron/experts/train_new_pipeline.py` | New training + Optuna objective pipeline intended to train CLIP and evaluate via benchmarks. |
| `src/multimeditron/experts/evaluation_pipeline/ultrasound_new_benchmark.py` | New ultrasound benchmark that builds embeddings from JSONL datasets and evaluates with an MLP head. |
| `src/multimeditron/experts/evaluation_pipeline/mlp_eval.py` | New MLP evaluation utility used by the ultrasound benchmark (k-fold hyperparam sweep + final test eval). |
| `src/multimeditron/experts/evaluation_pipeline/Benchmark.py` | New abstract Benchmark base class for evaluation modules. |
```python
kwargs["dataset_tags"].append(dataset.dataset_name)

trainer.create_model_card(**kwargs)
#returns the training value
```
The try/except sets `train_result = None` on failure, but the function later unconditionally returns `train_result.metrics["train_loss"]`, which will raise an `AttributeError` and mask the original training error. Either re-raise after logging or return a sentinel value when training fails.
```diff
 #returns the training value
+if train_result is None or getattr(train_result, "metrics", None) is None:
+    raise RuntimeError(
+        "Training failed earlier; 'train_result' is not available. "
+        "Check the training logs for the original error."
+    )
```
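The comment's other option, re-raising after logging, would look roughly like the sketch below; the exact `trainer.train(...)` call and its checkpoint argument are assumptions about the surrounding try/except, not code from this PR.

```python
# Hypothetical shape of the surrounding try/except (re-raise variant)
try:
    train_result = trainer.train(resume_from_checkpoint=last_checkpoint)
except Exception:
    logger.exception("Training failed")
    raise  # propagate the original error instead of masking it later
```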
```python
def training(model_args, data_args, training_args, dataset, n_freeze, last_checkpoint):
    # 5. Load pretrained model, tokenizer, and image processor
    if model_args.vision_model_name and model_args.text_model_name:
        # Dual encoder path
        logger.info(f"Loading dual encoder with vision model {model_args.vision_model_name} "
                    f"and text model {model_args.text_model_name}")

        model = VisionTextDualEncoderModel.from_vision_text_pretrained(
            model_args.vision_model_name,
            model_args.text_model_name,
            cache_dir=model_args.cache_dir,
            token=model_args.token,
        ).to(dtype=torch.bfloat16)
```
`n_freeze` is passed into `training(...)` and optimized in Optuna (`freezed_layers`), but it is never used to actually freeze any model layers. This makes the Optuna search misleading; either implement freezing (vision + text encoder layers) or remove the parameter from the objective/search space.
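A minimal sketch of what the freezing could look like. The attribute paths assume a CLIP-style vision encoder (`encoder.layers`) and a BERT-style text encoder (`encoder.layer`); they vary by checkpoint, so verify them against the actual models before relying on this.

```python
def freeze_layers(model, n_freeze):
    """Freeze the first n_freeze transformer layers of both encoders."""
    vision_layers = model.vision_model.encoder.layers  # CLIP-style path (assumed)
    text_layers = model.text_model.encoder.layer       # BERT-style path (assumed)
    for block in (vision_layers[:n_freeze], text_layers[:n_freeze]):
        for layer in block:
            for param in layer.parameters():
                param.requires_grad = False
```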
```python
dataset = dataset[list(dataset.keys())[0]]
dataset = DatasetConfig(**dataset)
if dataset.dataset_name is not None:
    if not hasattr(kwargs, "dataset_tags"):
```
`kwargs` is a dict, so `hasattr(kwargs, "dataset_tags")` is always false and `kwargs["dataset_tags"]` gets reset on every iteration, leaving only the last dataset tag. Use a dict key check (e.g., `"dataset_tags" not in kwargs`) so tags accumulate correctly.
```diff
-if not hasattr(kwargs, "dataset_tags"):
+if "dataset_tags" not in kwargs:
```
```python
    torch.save(self.data, save_path + "/data_emb_test_" + model_name + ".pt")
    torch.save(self.labels, save_path + "/data_lab_test_" + model_name + ".pt")
else:
    self.data = torch.load(save_path + "/data_embl_test_" + model_name + ".pt")
```
Cached test embeddings are saved as `data_emb_test_...` but loaded from `data_embl_test_...` (typo), so `load=True` will fail with `FileNotFoundError`. Make the save/load filenames consistent.
```diff
-self.data = torch.load(save_path + "/data_embl_test_"+ model_name +".pt")
+self.data = torch.load(save_path + "/data_emb_test_"+ model_name +".pt")
```
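To rule out this class of drift entirely, the cache path could be built once and shared by both branches. A sketch, assuming a `load` flag as described above:

```python
# Build each cache path once so save and load can never disagree
emb_path = save_path + "/data_emb_test_" + model_name + ".pt"
lab_path = save_path + "/data_lab_test_" + model_name + ".pt"
if not load:
    torch.save(self.data, emb_path)
    torch.save(self.labels, lab_path)
else:
    self.data = torch.load(emb_path)
```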
```python
def evaluate_pipeline(model, model_name):
    device = "cuda"
    model = model.to(device)
    print("beginnig of the evaluation")
```
`evaluate_pipeline` forces `device = "cuda"` without checking availability; this will crash on CPU-only machines. Use `torch.device("cuda" if torch.cuda.is_available() else "cpu")` (and move tensors/model accordingly).
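A sketch of the suggested fallback; it assumes the rest of the function moves its input batches the same way:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
# Tensors created later in the function need the same treatment, e.g.:
# batch = batch.to(device)
```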
```python
from sklearn.utils.class_weight import compute_class_weight
from Benchmark import Benchmark
import numpy as np
import os
```
Import of 'os' is not used.
```diff
-import os
```
```python
from transformers import (
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
    AutoImageProcessor,
    AutoTokenizer,
    VisionTextDualEncoderConfig
)
```
Import of 'VisionTextDualEncoderConfig' is not used.
Import of 'AutoTokenizer' is not used.
Import of 'AutoImageProcessor' is not used.
Import of 'VisionTextDualEncoderProcessor' is not used.
```diff
-from transformers import (
-    VisionTextDualEncoderModel,
-    VisionTextDualEncoderProcessor,
-    AutoImageProcessor,
-    AutoTokenizer,
-    VisionTextDualEncoderConfig
-)
+from transformers import VisionTextDualEncoderModel
```
```python
    AutoTokenizer,
    VisionTextDualEncoderConfig
)
import optuna
```
Import of 'optuna' is not used.
```diff
-import optuna
```
```python
    VisionTextDualEncoderConfig
)
import optuna
from transformers import CLIPModel, CLIPProcessor
```
Import of 'CLIPProcessor' is not used.
Import of 'CLIPModel' is not used.
```diff
-from transformers import CLIPModel, CLIPProcessor
```
```python
from multiprocessing import Pool
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend
import os
```
This import of module os is redundant, as it was previously imported on line 24.
```diff
-import os
```
addition of the new CLIP training pipeline and ultrasound benchmark