This repository contains starter code for COMP597: Sustainability in Systems Design - Energy Efficiency analysis using CodeCarbon.
The starter code provides the basics to train a machine learning model using PyTorch. More precisely, it is a command line tool designed to be easily extended with new features: you can add models, add command line arguments for configuration, add data collection methods, or modify the training loop.
The repository also provides the means to run the code both locally and on Slurm. The expectation is that the Slurm nodes managed by the school will be used for the final experiments, but if you have a CUDA-capable GPU and wish to test your code locally, you will find everything you need to do so as well (assuming a Linux host).
As mentioned above, the provided code is a command line tool. The entry point is the `launch.py` file, located at the root of this repository. Running `python3 launch.py --help` (locally) or `srun.sh --help` (Slurm) will print a basic help message listing all the flags that can be used to configure the execution of a training loop.
Before digging straight into the code, visit the documentation. It covers the structure of the starter code, the required Python environment, how to use Slurm in the context of this project, and how to extend the code.
| Milabench Benchmark Name | Milabench Source Code | Model Name | Type | Architecture | Size | Documentation | Dataset | Pretrained Weights | Notes |
|---|---|---|---|---|---|---|---|---|---|
| bert-tf32-fp16 | milabench/benchmarks/huggingface | BERT | NLP | Transformer | 116M | HuggingFace Documentation | Synthetic Dataset from MilaBench | No pre-trained weights; Milabench uses the default HuggingFace config. See how Milabench creates the model here | N/A |
| N/A | milabench/benchmarks/huggingface | DistilBERT | NLP | Transformer | 67M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace Model Card. See how Milabench loads the model here | N/A |
| t5 | milabench/benchmarks/huggingface | T5 | NLP | Transformer | 220M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace T5 Base Model Card. See how Milabench loads the model here | N/A |
| N/A | milabench/benchmarks/huggingface | OPT | NLP | Transformer | 350M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace OPT-350M Model Card. See how Milabench loads the model here | N/A |
| N/A | milabench/benchmarks/torchvision | ResNet152 | CV | CNN | 60M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| convnext_large-tf32-fp16 | milabench/benchmarks/torchvision | ConvNeXt Large | CV | CNN | 200M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| regnet_y_128gf | milabench/benchmarks/torchvision | RegNet Y 128GF | CV | CNN | 693M | PyTorch Model Documentation | FakeImageNet | TorchVision pretrained weights | N/A |
| vjepa-single | milabench/benchmarks/vjepa | V-JEPA 2 | CV | Transformer | 632M | Source library | MilaBench FakeVideo Dataset Generation | No pre-trained weights; see how they load the model here using `init_video_model` | It would be best to create a git submodule to import the library they use under src/models/vjepa2/, or similar. |
| pna | milabench/benchmarks/geo_gnn | PNA | Graphs | GNN | 4M | Torch Geometric Documentation | Milabench uses a subset of PCQM4Mv2, which they obtain with this code | No pretrained weights, as Milabench trains a model from scratch using a subset of PCQM4Mv2. See the model configuration here | N/A |
| recursiongfn | milabench/benchmarks/recursiongfn | GFlowNet | Graphs | GFlowNet, T. | 600M | Paper introducing the model; library used by Milabench | No dataset, as it is a generative task | Unfortunately, there is no documentation, but the weights are from here | It would be best to create a git submodule to import the library they use under src/models/gflownet/, or similar. |
| whisper | milabench/benchmarks/huggingface | Whisper | ASR | Transformer | 37.8M | HuggingFace Documentation | Synthetic Dataset from MilaBench | HuggingFace Whisper Tiny Model Card. See how Milabench creates the model here | N/A |
Some of the datasets you will be using are larger than the storage provided for this course. The reality is that you do not need a full dataset to do energy measurements: a subset allowing for a few hundred or a few thousand iterations is sufficient to get meaningful results. Remember, we are not measuring the performance of the models, so there is no need to train to a certain accuracy or other metric.
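One simple way to cap a run at a fixed number of iterations is `itertools.islice`. The sketch below is illustrative only: `fake_loader` stands in for whatever DataLoader your model uses, and `MAX_ITERATIONS` is a name chosen here, not a flag from the starter code.

```python
from itertools import islice

# Stand-in for a real DataLoader: any iterable of batches works the same way.
def fake_loader():
    batch = 0
    while True:  # a real loader over a huge dataset is effectively endless too
        yield batch
        batch += 1

MAX_ITERATIONS = 500  # a few hundred iterations is enough for energy measurements

steps = 0
for batch in islice(fake_loader(), MAX_ITERATIONS):
    # your training_step(batch) would go here
    steps += 1

print(steps)  # 500
```

Because `islice` wraps the iterator rather than the dataset, this works even when the underlying data is streamed and its total size is unknown.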
If you decide to store your dataset on the shared partition (see the provided Slurm documentation), please reduce the size of the dataset to around 5GB (at most 10GB). For example, the dataset used in the provided GPT2 example only contains one file of the C4 dataset. There are various ways to achieve this, and the right approach depends on your dataset. Explore the options available!
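For datasets shipped as many shard files, one stdlib-only way to build such a subset is to copy files until a size budget is reached. The `copy_subset` helper and the 5GB default below are illustrative assumptions, not part of the starter code; adapt the approach to however your dataset is laid out.

```python
import os
import shutil

SIZE_BUDGET = 5 * 1024**3  # ~5 GB, per the guideline above

def copy_subset(src_dir, dst_dir, budget=SIZE_BUDGET):
    """Copy files from src_dir into dst_dir until adding the next file
    would exceed the size budget. Returns the number of bytes copied."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        size = os.path.getsize(path)
        if copied + size > budget:
            break
        shutil.copy2(path, os.path.join(dst_dir, name))
        copied += size
    return copied
```

A usage such as `copy_subset("/datasets/c4/full", "/shared/c4-subset")` would keep only as many shards as fit in the budget; point your data-loading code at the subset directory afterwards.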