
COMP597 Starter Code

This repository contains starter code for COMP597: Sustainability in Systems Design - Energy Efficiency analysis using CodeCarbon.

The starter code provides the basics to train a machine learning model using PyTorch. More precisely, it is a command-line tool designed to be easily extended with new features: you can add models, add command-line arguments for configuration, add data-collection methods, or modify the training loop.
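As a sketch of how a configurable command-line training tool is typically wired up, the snippet below builds an `argparse` interface. The flag names (`--model`, `--epochs`, `--batch-size`) are illustrative assumptions, not the actual flags of `launch.py`; run `--help` to see the real ones.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a command-line interface for a training tool (illustrative flags)."""
    parser = argparse.ArgumentParser(
        description="Train a model and record energy usage."
    )
    parser.add_argument("--model", default="bert",
                        help="name of the registered model to train")
    parser.add_argument("--epochs", type=int, default=1,
                        help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="mini-batch size")
    return parser

# Parse an example argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--model", "t5", "--epochs", "3"])
print(args.model, args.epochs, args.batch_size)
```

New flags added this way show up automatically in the `--help` output, which is the pattern the starter code's configuration flags follow.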

The repository also provides you with the means to run the code both locally and on Slurm. The expectation is that the Slurm nodes managed by the school will be used for the final experiments, but if you have a GPU with CUDA and wish to test your code locally, you will find everything you need to do so as well (assuming a Linux host).

Getting Started

As mentioned above, the provided code is a command line tool. The entry point is the launch.py file, located at the root of this repository. Running python3 launch.py --help (locally) or srun.sh --help (Slurm) prints a basic help message listing all the flags that can be used to configure the execution of a training loop.

Before digging straight into the code, visit the documentation. It covers the code base, the required Python environment, how to use Slurm in the context of this project, and how to extend the code.

Models

| Milabench Benchmark Name | Milabench Source Code | Model Name | Type | Architecture | Size | Documentation | Dataset | Pretrained Weights | Notes |
|---|---|---|---|---|---|---|---|---|---|
| bert-tf32-fp16 | milabench/benchmarks/huggingface | BERT | NLP | Transformer | 116M | HuggingFace Documentation | Synthetic Dataset from MilaBench | No pre-trained weights; Milabench uses the default HuggingFace config. See how Milabench creates the model here | N/A |
| N/A | milabench/benchmarks/huggingface | DistilBERT | NLP | Transformer | 67M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace Model Card. See how Milabench loads the model here | N/A |
| t5 | milabench/benchmarks/huggingface | T5 | NLP | Transformer | 220M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace T5 Base Model Card. See how Milabench loads the model here | N/A |
| N/A | milabench/benchmarks/huggingface | OPT | NLP | Transformer | 350M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace Opt-350M Model Card. See how Milabench loads the model here | N/A |
| N/A | milabench/benchmarks/torchvision | ResNet152 | CV | CNN | 60M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| convnext_large-tf32-fp16 | milabench/benchmarks/torchvision | ConvNext Large | CV | CNN | 200M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| regnet_y_128gf | milabench/benchmarks/torchvision | RegNet Y 128GF | CV | CNN,RNN | 693M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| vjepa-single | milabench/benchmarks/vjepa | V-JEPA 2 | CV | Transformer | 632M | Source library | MilaBench FakeVideo Dataset Generation | No pre-trained weights; see how they load the model here using init_video_model | It would be best to create a git submodule to import the library they use under src/models/vjepa2/, or similar. |
| pna | milabench/benchmarks/geo_gnn | PNA | Graphs | GNN | 4M | Torch Geometric Documentation | Milabench uses a subset of PCQM4Mv2, which they obtain with this code | No pretrained weights, as Milabench trains a model from scratch using a subset of PCQM4Mv2. See the model configuration here | N/A |
| recursiongfn | milabench/benchmarks/recursiongfn | GFlowNet | Graphs | GFlowNet, T. | 600M | Paper introducing model, library used by Milabench | No dataset, as it is a generative task | Unfortunately, there is no documentation, but the weights are from here | It would be best to create a git submodule to import the library they use under src/models/gflownet/, or similar. |
| whisper | milabench/benchmarks/huggingface | Whisper | ASR | Transformer | 37.8M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace Whisper Tiny Model Card. See how Milabench creates the model here | N/A |

Datasets

Some of the datasets you will be using are larger than the storage provided for this course. The reality is that you do not need a full dataset to do energy measurements. A subset allowing for a few hundred or a few thousand iterations is sufficient to get meaningful results. Remember, we are not measuring the performance of the models, so there is no need to train to a certain accuracy or other metric.

If you decide to store your dataset on the shared partition (see the provided Slurm documentation), please reduce the size of the dataset to around 5GB (at most 10GB). For example, the dataset used in the provided GPT2 example only contains one file of the C4 dataset. There are various ways to achieve this, depending on your dataset. Explore the options available!
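One simple way to stay under the size limit is to copy dataset shards until a byte budget is reached. This is a minimal stdlib sketch, not part of the starter code; the directory names and the 5GB figure are placeholders for your own dataset layout.

```python
import os
import shutil

def copy_subset(src_dir: str, dst_dir: str, budget_bytes: int) -> list[str]:
    """Copy dataset shards from src_dir to dst_dir until the size budget is reached."""
    os.makedirs(dst_dir, exist_ok=True)
    copied, used = [], 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        size = os.path.getsize(path)
        if used + size > budget_bytes:
            break  # the next shard would exceed the budget
        shutil.copy(path, os.path.join(dst_dir, name))
        copied.append(name)
        used += size
    return copied

# Example: keep roughly 5GB worth of shards.
# copy_subset("c4/shards", "/shared/my_c4_subset", 5 * 1024**3)
```

For archive-of-shards datasets like C4 this is often all you need; for single-file datasets you would instead truncate or sample rows with whatever tooling fits the format.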

CodeCarbon Resources
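The typical measurement pattern with CodeCarbon is to start an `EmissionsTracker` before the training loop and stop it after. The `EmissionsTracker`, `start()`, `stop()`, and `output_dir` names below are CodeCarbon's documented interface; the wrapper function itself is only an illustrative sketch with a guarded import, so it also runs in an environment where codecarbon is not installed.

```python
def train_with_tracking(train_fn, output_dir="."):
    """Run train_fn while recording energy use with CodeCarbon, if available.

    Falls back to running train_fn untracked when codecarbon is not installed,
    so this sketch stays runnable anywhere. Returns (result, emissions).
    """
    try:
        from codecarbon import EmissionsTracker  # pip install codecarbon
    except ImportError:
        return train_fn(), None  # no tracking available

    tracker = EmissionsTracker(output_dir=output_dir)  # writes emissions.csv
    tracker.start()
    try:
        result = train_fn()
    finally:
        emissions = tracker.stop()  # estimated kg CO2-equivalent
    return result, emissions

# Stand-in for a real training loop.
result, emissions = train_with_tracking(lambda: "done")
print(result)
```

Besides the total reported by `stop()`, the generated emissions.csv contains the per-run energy breakdown (CPU, GPU, RAM) that is most useful for the course's efficiency analysis.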

