Description
- Inspired by Adding ray as a distributor rom1504/img2dataset#272
- Depends on Expose img2dataset distributor #58
- Depends on Add AWS S3 dependencies to environment.yml #60
Usage
Cluster creation
```
ray up --yes cluster.yml
ray dashboard cluster.yml
```
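Before submitting anything, the cluster can optionally be inspected with the standard Ray cluster-launcher commands against the same `cluster.yml`; the `ray dashboard` command above also forwards the dashboard to `http://localhost:8265`, which is the address used for job submission below.

```bash
# Optional sanity checks against the same cluster config.
ray exec cluster.yml 'ray status'   # autoscaler / node status on the head node
ray monitor cluster.yml             # tail the autoscaler logs
ray attach cluster.yml              # open an SSH session on the head node
```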
Job submission
```
git clone https://github.com/mlfoundations/datacomp
ray job submit \
  --address=http://localhost:8265 \
  --working-dir=datacomp \
  --runtime-env-json="$(
    jq --null-input '
      {
        conda: "datacomp/environment.yml",
        env_vars: {
          AWS_ACCESS_KEY_ID: env.AWS_ACCESS_KEY_ID,
          AWS_SECRET_ACCESS_KEY: env.AWS_SECRET_ACCESS_KEY,
          AWS_SESSION_TOKEN: env.AWS_SESSION_TOKEN
        }
      }
    '
  )" \
  -- \
  python download_upstream.py \
    --subjob_size=11520 \
    --thread_count=128 \
    --processes_count=1 \
    --distributor=ray \
    --metadata_dir=/tmp/metadata \
    --data_dir=s3://datacomp-small \
    --scale=small
```
Note
Image shards will be saved to the `datacomp-small` AWS S3 bucket specified with the `--data_dir` option.
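The `AWS_*` variables are read from the local shell environment by the `jq` snippet above, so they must be exported before submitting. Once submitted, the job can be tracked with the Ray job CLI against the same address; the submission ID below is a placeholder for the one printed by `ray job submit`.

```bash
# Placeholder ID: use the submission ID printed by `ray job submit` (raysubmit_...).
ray job status raysubmit_XXXXXXXX --address http://localhost:8265
ray job logs raysubmit_XXXXXXXX --follow --address http://localhost:8265
ray job stop raysubmit_XXXXXXXX --address http://localhost:8265   # cancel if needed
```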
Cluster deletion
```
ray down --yes cluster.yml
```
Configuration
Sample cluster.yml
```yaml
cluster_name: datacomp-downloader

min_workers: 0
max_workers: 10
upscaling_speed: 1.0

docker:
  run_options: [--dns=127.0.0.1]
  image: rayproject/ray:2.6.1-py310
  container_name: ray

provider:
  type: aws
  region: us-east-1
  cache_stopped_nodes: false

available_node_types:
  ray.head.default:
    resources: {}
    node_config:
      InstanceType: m5.12xlarge
      ImageId: ami-068d304eca3399469
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            DeleteOnTermination: true
            VolumeSize: 200
            VolumeType: gp2
  ray.worker.default:
    resources: {}
    node_config:
      InstanceType: m5.12xlarge
      ImageId: ami-068d304eca3399469
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            DeleteOnTermination: true
            VolumeSize: 200
            VolumeType: gp2

initialization_commands:
  - wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
  - sudo dpkg --install knot-resolver-release.deb
  - sudo apt-get update
  - sudo apt-get install --yes knot-resolver
  - echo $(hostname --all-ip-addresses) $(hostname) | sudo tee --append /etc/hosts
  - sudo systemctl start kresd@{1..48}.service
  - echo nameserver 127.0.0.1 | sudo tee /etc/resolv.conf
  - sudo systemctl stop systemd-resolved

setup_commands:
  - sudo apt-get update
  - sudo apt-get install --yes build-essential ffmpeg
```
Obscure details
- When `--data_dir` points to cloud storage such as S3, we also have to specify a local `--metadata_dir`, because the downloader script doesn't support saving metadata to cloud storage.
- The last `pip install` in the `setup_commands` section is needed for compatibility with AWS S3, because the required libraries aren't included in the conda environment file.
- In principle there is no need to provide additional AWS credentials when the destination bucket is in the same account as the cluster, because the nodes already have full S3 access through an instance profile. In practice, though, this doesn't seem to work as intended (probably due to rate limiting of the IMDS endpoint), and I ended up passing my local AWS credentials as environment variables.
- The Python version in `environment.yml` must match the Python version of the Ray cluster: make sure that `docker.image` in `cluster.yml` uses exactly the same version as the `environment.yml` from this project (see the quick check below).
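A quick way to verify that last point: the image tag already encodes the cluster's Python version, so it just needs to be compared against the version pinned in `environment.yml`. This is only a sketch; it assumes the tag from the sample `cluster.yml` above and, for the second command, that Docker is available locally.

```bash
# The cluster image tag encodes its Python version (py310 -> Python 3.10);
# check that environment.yml pins the same minor version.
grep -i 'python' environment.yml    # expect something like: - python=3.10

# If Docker is available locally, the image's Python can also be checked directly.
docker run --rm rayproject/ray:2.6.1-py310 python --version
```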