This repo contains clones of MiniGPT-4 and MiniGPT-Med (see the respective folders).
Some of the training files may be outdated: they were uploaded to a cluster and may have been modified there.
I would suggest looking at more recent multimodal frameworks and LLMs, or building your own Vision Transformer -> LLM pipeline. If you do use these cloned repos, make sure to read their READMEs.
The dataset used is VinDr-CXR. I can't redistribute it because it's a restricted-access resource; if you have access, feel free to contact me for the post-processed dataset, annotations, and results. However, you should be able to recreate the post-processed dataset with these files.
Although not explicitly stated in the MiniGPT-4 and MiniGPT-Med repos, training requires a Linux environment and at least 16GB of VRAM (12GB is not enough).
Llama 2 needs to be downloaded somewhere on your computer.
Values that need to be changed in the files (e.g. paths) are marked with "CHANGE ME" wherever a value must be edited.
Where the post-processed train and test annotations live.
Where the post-processed train and test images live.
Where the unprocessed VinDr-CXR dataset lives.
These Python files process the dataset into correctly sized images, annotations, and viewable PNGs/JPEGs. The files below are listed alphabetically.
Given a test DICOM ID, takes the post-processed image and draws rectangles on it from either model output or ground-truth labels.
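The core of that drawing step can be sketched in a few lines of numpy. This is a minimal illustration only (the function name and single-channel canvas are assumptions, not the script's actual code), drawing a 1-pixel rectangle outline directly into an image array:

```python
import numpy as np

def draw_box(img, x_min, y_min, x_max, y_max, value=255):
    """Draw a 1-pixel rectangle outline on a 2-D image array, in place."""
    img[y_min, x_min:x_max + 1] = value   # top edge
    img[y_max, x_min:x_max + 1] = value   # bottom edge
    img[y_min:y_max + 1, x_min] = value   # left edge
    img[y_min:y_max + 1, x_max] = value   # right edge
    return img

canvas = np.zeros((448, 448), dtype=np.uint8)  # 448x448 matches the post-processed image size
draw_box(canvas, 100, 120, 200, 260)
```

A drawing library such as Pillow's `ImageDraw.rectangle` does the same job with less code; the array version just makes the pixel arithmetic explicit.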
Crops the original DICOM images (train or test, selected via data_type) to 448x448 and saves them to the data directory. Also saves the scaling factor used to crop each image, for later use by another Python file.
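The resize-plus-scale-factor idea can be sketched with numpy alone. This is a hedged illustration, not the script's actual code: the real script would read DICOM pixel data (e.g. via pydicom), and the nearest-neighbour resize and JSON record below are assumptions:

```python
import json
import numpy as np

TARGET = 448

def resize_with_scale(pixels: np.ndarray):
    """Nearest-neighbour resize to TARGET x TARGET; returns image + (sx, sy)."""
    h, w = pixels.shape
    sy, sx = TARGET / h, TARGET / w              # per-axis scale factors
    rows = (np.arange(TARGET) / sy).astype(int).clip(0, h - 1)
    cols = (np.arange(TARGET) / sx).astype(int).clip(0, w - 1)
    return pixels[rows][:, cols], (sx, sy)

# e.g. a 1000x800 chest X-ray array stands in for real DICOM pixel data
img = np.arange(1000 * 800).reshape(1000, 800).astype(np.uint16)
resized, (sx, sy) = resize_with_scale(img)

# the scale factors would be saved per DICOM ID for the annotation-scaling step
record = json.dumps({"example_dicom_id": [sx, sy]})
```

Saving `(sx, sy)` alongside the image is what lets the later scripts map the original pixel-space bounding boxes into the 448x448 frame.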
Generates labels from the scaled annotations produced by scale_annotations.py and the annotations from the VinDr dataset. Saves to annotations.
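The label-generation step amounts to combining one VinDr-style annotation row with the saved scale factors. A minimal sketch, assuming field names (`class_name`, `x_min`, …) like those in the VinDr-CXR CSVs and a label layout invented for illustration:

```python
# one raw VinDr-style annotation row: class name + pixel-space box
raw = {"image_id": "abc123", "class_name": "Cardiomegaly",
       "x_min": 400.0, "y_min": 500.0, "x_max": 700.0, "y_max": 900.0}

# scale factors saved by the cropping step for a 800x1000 original
sx, sy = 448 / 800, 448 / 1000

label = {
    "image_id": raw["image_id"],
    "disease": raw["class_name"],
    "bbox": [round(raw["x_min"] * sx), round(raw["y_min"] * sy),
             round(raw["x_max"] * sx), round(raw["y_max"] * sy)],
}
```

The resulting boxes are in the 448x448 frame of the post-processed images, so they line up with what the model actually sees.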
Translates the labels generated by label_gen.py into prompts to feed the LLM. Prompts are saved to annotations.
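The label-to-prompt translation can be sketched as simple string formatting. The bracketed grounding tags below are an assumption for illustration; the real prompt format should match whatever the MiniGPT repos expect:

```python
def label_to_prompt(label):
    # Bracket-style coordinate tags are a hypothetical format,
    # not necessarily what the training code consumes.
    x1, y1, x2, y2 = label["bbox"]
    return f"{label['disease']} <{x1}><{y1}><{x2}><{y2}>"

prompt = label_to_prompt({"disease": "Cardiomegaly", "bbox": [224, 224, 392, 403]})
```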
Converts the prompt output (a list of dictionaries) into a single dictionary and saves it to a file in annotations.
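That conversion is a one-liner if each entry carries a unique key. A sketch, assuming (hypothetically) that each dictionary holds an `image_id` and a `prompt`:

```python
prompts = [
    {"image_id": "a", "prompt": "Cardiomegaly <10><20><30><40>"},
    {"image_id": "b", "prompt": "No finding"},
]

# collapse the list of dicts into one dict keyed by image id
merged = {p["image_id"]: p["prompt"] for p in prompts}
```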
Evaluates the model in terms of IoU (bounding boxes) and ROUGE and BLEU (text). Requires parsed results from parse_vindr_results.py.
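The bounding-box half of that evaluation rests on the standard intersection-over-union formula, which can be written in a few lines (boxes as `[x1, y1, x2, y2]`; ROUGE/BLEU would come from a text-metrics library and are not shown):

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

Identical boxes score 1.0, disjoint boxes 0.0, and partial overlaps fall in between.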
Evaluates the model's ability to predict localized diseases using various metrics. Needs parsed results from parse_vindr_results.py.
Evaluates the model's ability to predict global diseases using various metrics. Needs parsed results from parse_vindr_results.py.
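The "various metrics" in both disease-prediction evaluations typically boil down to per-class precision, recall, and F1 over binary labels. A self-contained sketch of those formulas (the exact metric set used by the scripts may differ):

```python
def prf1(y_true, y_pred):
    """Precision, recall and F1 for one class from binary label lists."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = prf1([1, 1, 0, 0], [1, 0, 1, 0])
```

In practice a library such as scikit-learn computes these (and averages them across disease classes) in one call.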
Parses the results file produced by running the eval script in the respective repo. To be used for metric evaluation.
Adjusts the bounding boxes in the VinDr-CXR dataset so they are scaled correctly after the image post-processing. Used by label_gen.py.