Spring 2019 Cloud Computing Class Project
Workflow Management System: Pegasus
Job Scheduler (Cloud): HTCondor Annex
We integrated the automated workflow management system Pegasus with the high-throughput distributed job scheduler HTCondor to create an AI-optimized Platform-as-a-Service for running user workflows, reducing the need to manually compare and select allocations of cloud and/or local computing resources.
Assume your home directory is /home/USERNAME on your local machine.
Clone this repo to your home directory.
Change USERNAME and REPONAME to yours in the following files: rc.txt and sites.xml.
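The substitution step above can be done with sed. A minimal sketch, self-contained in a scratch directory; the sample rc.txt line and the values alice/cloud-project are examples only, substitute your own:

```shell
# Demo of the placeholder substitution; runs in a scratch directory.
mkdir -p /tmp/pegasus-demo && cd /tmp/pegasus-demo
# Example replica-catalog line with the placeholders (illustrative content).
printf '%s\n' 'input.png file:///home/USERNAME/REPONAME/input.png site="local"' > rc.txt
# Replace USERNAME and REPONAME with your own values (GNU sed assumed).
sed -i 's/USERNAME/alice/g; s/REPONAME/cloud-project/g' rc.txt
cat rc.txt
```

Run the same sed command against both rc.txt and sites.xml in your cloned repo.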
In our project, we used a simple pipeline that greyscales an image.
Or, supply your own daxgen.py to generate the DAX file of your own workflow (how to write a DAX file?)
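For reference, a DAX is just an XML description of the abstract workflow (jobs, arguments, and input/output files); daxgen.py normally produces it programmatically. As an illustration only, here is a minimal hand-written one-job DAX for the greyscale pipeline; the job name, namespace, and file names are hypothetical placeholders:

```shell
# Write a minimal single-job DAX by hand (names are illustrative).
cat > greyscale.dax <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<adag xmlns="http://pegasus.isi.edu/schema/DAX" version="3.6" name="greyscale">
  <job id="ID0000001" namespace="example" name="greyscale" version="1.0">
    <argument>-i <file name="input.png"/> -o <file name="output.png"/></argument>
    <uses name="input.png" link="input"/>
    <uses name="output.png" link="output"/>
  </job>
</adag>
EOF
```

Pegasus then plans this abstract workflow into a concrete HTCondor DAG using the sites and replica catalogs.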
We feed the AI optimizer the general resource limitations of the platforms available to the user (AWS EC2, local), and it outputs the optimized platform(s) along with the type and number of instances. We use these results to dynamically allocate resources.
Platforms: AWS EC2 (aws), Local Condor Pool (local)
Run run.sh:
$ run.sh PLATFORMS [INSTANCES]
where PLATFORMS can be aws, local, or aws|local, and INSTANCES is a comma-delimited string: instance-type1:count1,instance-type2:count2.
For example: $ run.sh aws t2.micro:2,t2.nano:3
The example above executes the workflow only remotely on AWS EC2, on 2 t2.micro instances and 3 t2.nano instances.
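A script like run.sh has to split the INSTANCES argument into (type, count) pairs before requesting instances. A bash sketch of that parsing, with assumed variable names (not taken from the actual run.sh):

```shell
#!/usr/bin/env bash
# Parse an INSTANCES string of the form type1:count1,type2:count2.
INSTANCES="t2.micro:2,t2.nano:3"   # example value

IFS=',' read -r -a pairs <<< "$INSTANCES"
for pair in "${pairs[@]}"; do
  itype="${pair%%:*}"   # text before the first ':'
  count="${pair##*:}"   # text after the last ':'
  echo "request ${count} x ${itype}"
done
```

For the example value this prints "request 2 x t2.micro" and "request 3 x t2.nano"; each pair would then drive a corresponding instance request (e.g. via condor_annex for the AWS case).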
Check the output for the greyscaled image.
Run ./metrics/metrics.sh to collect metrics for the run.
Here is a comprehensive report about this project.