This repo presents a carbon- and cost-aware scheduling framework.
├── LICENSE
├── README.md
├── config-GAIA-spot.yaml   # Sample configuration for spot
├── config-GAIA.yaml        # Sample configuration for cluster
├── jobs
│   ├── nbody               # Sample MPI job
│   └── profiles            # Sample job profile
├── notebooks
│   └── evaluation_plot.ipynb
├── requirements.txt
└── src
    ├── carbon.py
    ├── cluster
    ├── cluster_traces
    ├── figure10.sh
    ├── figure6-7.sh
    ├── figure8.sh
    ├── figure9.sh
    ├── run.py
    ├── scheduling
    ├── task.py
    └── traces
The code does not have any hardware requirements. The AWS tests were executed on c7gn.medium machines.
The simulation only requires pandas and numpy, while plotting requires seaborn and matplotlib. To install the requirements:

pip3 install -r requirements.txt

AWS ParallelCluster recommends using a virtual environment.
To start a cluster for the first time, follow the instructions here and use the following command:
pcluster configure --config config-file.yaml
Creating Our Custom Cluster

A sample of the full configuration used is in config-GAIA.yaml and config-GAIA-spot.yaml.
An Executable MPI Job

To create real executable MPI jobs, we built an N-body simulation MPI job. Setup and execution details are available here.
Installing PySlurm (needed by GAIA) to communicate with AWS ParallelCluster Scheduler
export SLURM_INCLUDE_DIR=/opt/slurm/include
export SLURM_LIB_DIR=/opt/slurm/lib
git clone https://github.com/PySlurm/pyslurm.git && cd pyslurm
pip install .

For more details, check the official website.
The program takes the configuration options listed below and simulates the execution of the jobs.
python3 src/run.py -h
usage: run.py [-h] [-c CARBON_TRACE] [--cluster-type {simulation,slurm}] [-t TASK_TRACE] [-r RESERVED_INSTANCES]
[-w WAITING_TIMES_STR] [--scheduling-policy {carbon,carbon-spot,carbon-cost,carbon-cost-spot,cost,suspend-resume}]
[-i START_INDEX] [--carbon-policy {waiting,lowest,oracle,cst_oracle,cst_average}] [-p CLUSTER_PARTITION]
GAIA: Carbon Aware Scheduling Policies
optional arguments:
-h, --help show this help message and exit
-c CARBON_TRACE, --carbon-trace CARBON_TRACE
Carbon Trace
--cluster-type {simulation,slurm}
Cluster Type Interface
-t TASK_TRACE, --task-trace TASK_TRACE
Task Trace
-r RESERVED_INSTANCES, --reserved-instances RESERVED_INSTANCES
Reserved Instances
-w WAITING_TIMES_STR, --waiting-times WAITING_TIMES_STR
Waiting times per queue `x` separated
--scheduling-policy {carbon,carbon-spot,carbon-cost,carbon-cost-spot,cost,suspend-resume}
-i START_INDEX, --start-index START_INDEX
carbon start index
--carbon-policy {waiting,lowest,oracle,cst_oracle,cst_average}
-p CLUSTER_PARTITION, --cluster-partition CLUSTER_PARTITION

To reproduce Figures 8-12, we provide four bash scripts that run the experiments with the needed configuration.
We also provide a Jupyter notebook, notebooks/evaluation_plot.ipynb, to plot the figures.
Figure 8: Normalized carbon emissions and waiting times across policies.
Figure 9: CDF of the normalized total carbon reductions.

./src/figure8-9.sh

Figure 10: Normalized carbon, cost, and waiting time across policies when using reserved instances.

./src/figure10.sh

Figure 11: Effect of reserved instances on the carbon savings and cost using a work-conserving and carbon-aware scheduling policy.

./src/figure11.sh

Figure 12: Effect of both spot and reserved instances on the carbon savings and cost using multiple policies and configurations.

./src/figure12.sh

Follow the same scripts, but add the --cluster-type slurm flag to each command and execute GAIA inside the cluster master.
The following table maps the policy names and acronyms used in the paper to the options needed to run them, i.e., the --scheduling-policy and --carbon-policy flags.
| Policy | Scheduling Policy | Carbon Policy |
|---|---|---|
| No Jobs Wait (NJW) | carbon | oracle (-w 0) |
| All Jobs Wait Threshold (AJW-T) | cost | oracle |
| Lowest Carbon Slot (Lowest-Slot) | carbon | lowest |
| Lowest Carbon Window (Lowest Window) | carbon | waiting |
| Carbon Savings per Waiting Time (Carbon-Time) | carbon | cst_average |
| Ecovisor | suspend-resume-threshold | oracle |
| Wait Awhile | suspend-resume | oracle |
| Reserved First + Carbon-Time (Res-First-Carbon-Time) | carbon-cost | cst_average |
| Spot First + Carbon-Time (Spot-First-Carbon-Time) | carbon-spot | cst_average |
| Spot First + Ecovisor (Spot-First-Ecovisor) | suspend-resume-spot-threshold | oracle |
| Spot and Reserved Aware + Carbon Time (SPOT-RES-Carbon-Time) | carbon-cost-spot | cst_average |
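
As a rough illustration of how the table translates into command lines, the sketch below builds a `run.py` invocation for a given policy. The `POLICIES` dictionary keys, the `build_command` helper, and the example waiting times (6 and 24) are hypothetical and not part of the repository; only the flag names and values are taken from the help output and the table above. The table values are copied as-is, including `suspend-resume-threshold` and `suspend-resume-spot-threshold`, which do not appear among the choices printed by `run.py -h`, so those two entries may need adjusting.

```python
import shlex

# Policy label -> (--scheduling-policy, --carbon-policy, fixed extra args).
# Values are copied from the table above; the labels are illustrative only.
POLICIES = {
    "NJW": ("carbon", "oracle", ["-w", "0"]),
    "AJW-T": ("cost", "oracle", []),
    "Lowest-Slot": ("carbon", "lowest", []),
    "Lowest Window": ("carbon", "waiting", []),
    "Carbon-Time": ("carbon", "cst_average", []),
    "Ecovisor": ("suspend-resume-threshold", "oracle", []),
    "Wait Awhile": ("suspend-resume", "oracle", []),
    "Res-First-Carbon-Time": ("carbon-cost", "cst_average", []),
    "Spot-First-Carbon-Time": ("carbon-spot", "cst_average", []),
    "Spot-First-Ecovisor": ("suspend-resume-spot-threshold", "oracle", []),
    "SPOT-RES-Carbon-Time": ("carbon-cost-spot", "cst_average", []),
}


def build_command(policy, waiting_times=None, cluster_type="simulation"):
    """Return the argv list for one run.py invocation (hypothetical helper)."""
    scheduling, carbon, extra = POLICIES[policy]
    cmd = [
        "python3", "src/run.py",
        "--scheduling-policy", scheduling,
        "--carbon-policy", carbon,
        "--cluster-type", cluster_type,
    ]
    if extra:
        # The table fixes the waiting time for this policy (NJW uses -w 0).
        cmd += extra
    elif waiting_times:
        # The help text says waiting times per queue are `x` separated.
        cmd += ["-w", "x".join(str(w) for w in waiting_times)]
    return cmd


if __name__ == "__main__":
    # The waiting times 6 and 24 are made-up example values.
    print(shlex.join(build_command("Carbon-Time", [6, 24], cluster_type="slurm")))
    # python3 src/run.py --scheduling-policy carbon --carbon-policy cst_average --cluster-type slurm -w 6x24
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues. Other options such as `-c`, `-t`, and `-r` are left out here because the section does not give example values for them.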