This repository contains starter code for COMP597: Sustainability in Systems Design - Energy Efficiency analysis using CodeCarbon.
The starter code provides the basics to train a machine learning model using PyTorch. More precisely, it is a command line tool designed to be easily extended with new features: you can add models, add command line arguments for configuration, add data collection methods, or modify the training loop.
The repository also provides the means to run the code both locally and on Slurm. The expectation is that the Slurm nodes managed by the school will be used for the final experiments, but if you have a CUDA-capable GPU and wish to test your code locally, you will find everything you need to do so as well (assuming a Linux host).
As mentioned above, the provided code is a command line tool. The entry point is the `launch.py` file, located at the root of this repository. Running `python3 launch.py --help` (locally) or `srun.sh --help` (Slurm) will print a basic help message listing all the flags that can be used to configure the execution of a training loop.
Before digging straight into the code, visit the documentation. It covers the structure of the starter code, the required Python environment, how to use Slurm in the context of this project, and how to extend the code.
| Milabench Benchmark Name | Milabench Source Code | Model Name | Type | Architecture | Size | Documentation | Dataset | Pretrained Weights | Notes |
|---|---|---|---|---|---|---|---|---|---|
| bert-tf32-fp16 | milabench/benchmarks/huggingface | BERT | NLP | Transformer | 116M | HuggingFace Documentation | Synthetic Dataset from MilaBench | No pre-trained weights; Milabench uses the default HuggingFace config. See how Milabench creates the model here | N/A |
| N/A | milabench/benchmarks/huggingface | DistilBERT | NLP | Transformer | 67M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace Model Card. See how Milabench loads the model here | N/A |
| t5 | milabench/benchmarks/huggingface | T5 | NLP | Transformer | 220M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace T5 Base Model Card. See how Milabench loads the model here | N/A |
| N/A | milabench/benchmarks/huggingface | OPT | NLP | Transformer | 350M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace OPT-350M Model Card. See how Milabench loads the model here | N/A |
| N/A | milabench/benchmarks/torchvision | ResNet152 | CV | CNN | 60M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| convnext_large-tf32-fp16 | milabench/benchmarks/torchvision | ConvNeXt Large | CV | CNN | 200M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| regnet_y_128gf | milabench/benchmarks/torchvision | RegNet Y 128GF | CV | CNN | 693M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| vjepa-single | milabench/benchmarks/vjepa | V-JEPA 2 | CV | Transformer | 632M | Source library | MilaBench FakeVideo Dataset Generation | No pre-trained weights; see how they load the model here using `init_video_model` | It would be best to create a git submodule to import the library they use under src/models/vjepa2/, or similar. |
| pna | milabench/benchmarks/geo_gnn | PNA | Graphs | GNN | 4M | Torch Geometric Documentation | Milabench uses a subset of PCQM4Mv2, which they obtain with this code | No pretrained weights, as Milabench trains a model from scratch using a subset of PCQM4Mv2. See the model configuration here | N/A |
| recursiongfn | milabench/benchmarks/recursiongfn | GFlowNet | Graphs | GFlowNet, T. | 600M | Paper introducing the model; library used by Milabench | No dataset, as it is a generative task | Unfortunately, there is no documentation, but the weights are from here | It would be best to create a git submodule to import the library they use under src/models/gflownet/, or similar. |
| whisper | milabench/benchmarks/huggingface | Whisper | ASR | Transformer | 37.8M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace Whisper Tiny Model Card. See how Milabench creates the model here | N/A |
Some of the datasets you will be using are larger than the storage provided for this course. The reality is that you do not need a full dataset to do energy measurements: a subset allowing for a few hundred or a few thousand iterations is sufficient to get meaningful results. Remember, we are not measuring the performance of the models, so there is no need to train to a certain accuracy or other metric.
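One simple way to cap a run at a fixed number of iterations is `itertools.islice`. The sketch below is illustrative only: `fake_loader` stands in for whatever DataLoader your model uses, and `MAX_ITERATIONS` is a name chosen here, not a flag from the starter code.

```python
from itertools import islice

# Stand-in for a real DataLoader: any iterable of batches works the same way.
def fake_loader():
    batch = 0
    while True:  # a real loader over a huge dataset is effectively endless too
        yield batch
        batch += 1

MAX_ITERATIONS = 500  # a few hundred iterations is enough for energy measurements

steps = 0
for batch in islice(fake_loader(), MAX_ITERATIONS):
    # your training_step(batch) would go here
    steps += 1

print(steps)  # 500
```

Because `islice` wraps the iterator rather than the dataset, this works even when the underlying data is streamed and its total size is unknown.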
If you decide to store your dataset on the shared partition (see the provided Slurm documentation), please reduce the size of the dataset to around 5GB (at most 10GB). For example, the dataset used in the provided GPT2 example only contains one file of the C4 dataset. There are various ways to achieve this, and the right approach depends on your dataset. Explore the options available!
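For datasets shipped as many shard files, one stdlib-only way to build such a subset is to copy files until a size budget is reached. The `copy_subset` helper and the 5GB default below are illustrative assumptions, not part of the starter code; adapt the approach to however your dataset is laid out.

```python
import os
import shutil

SIZE_BUDGET = 5 * 1024**3  # ~5 GB, per the guideline above

def copy_subset(src_dir, dst_dir, budget=SIZE_BUDGET):
    """Copy files from src_dir into dst_dir until adding the next file
    would exceed the size budget. Returns the number of bytes copied."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        size = os.path.getsize(path)
        if copied + size > budget:
            break
        shutil.copy2(path, os.path.join(dst_dir, name))
        copied += size
    return copied
```

A usage such as `copy_subset("/datasets/c4/full", "/shared/c4-subset")` would keep only as many shards as fit in the budget; point your data-loading code at the subset directory afterwards.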