anandk1999/Latency-Aware-Distributed-Join

CS 511 Research Project: Latency-Aware Distributed Join (R)

To get started, download the tables and spin up the cluster:

pip install gdown
gdown https://drive.google.com/drive/u/1/folders/1SfJhSPCvUfHI2Vc_toVZIIzz1_50LpOx -O . --folder
unzip data/tables.zip
bash start-all.sh

To induce latency between the nodes, first open a shell on the main node:

docker-compose -f spark-cluster-compose.yaml exec main bash

Then, from the main node terminal, run the following command to induce latency:

bash /opt/spark/scripts/induce_latency.sh
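The contents of induce_latency.sh are not reproduced in this README. For readers who want a sense of how such a script might work, here is a minimal sketch using `tc` with the `netem` queueing discipline, a common way to add artificial delay on Linux. The interface name `eth0`, the 100 ms delay, and the helper names `netem_cmd` and `induce_delay` are all assumptions for illustration, not taken from the project. The sketch prints the command by default and only runs it when `APPLY=1` is set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: this is NOT the project's induce_latency.sh.
# Assumes iproute2 (tc) is installed and the container has NET_ADMIN capability.

# Build the tc command that attaches a fixed netem delay to an interface.
netem_cmd() {
  local iface="$1" delay_ms="$2"
  echo "tc qdisc replace dev ${iface} root netem delay ${delay_ms}ms"
}

# Print the command by default; set APPLY=1 to actually execute it.
induce_delay() {
  local cmd
  cmd="$(netem_cmd "$1" "$2")"
  if [ "${APPLY:-0}" = "1" ]; then
    $cmd
  else
    echo "$cmd"
  fi
}

# Assumed values: interface eth0, 100 ms of added delay.
induce_delay eth0 100
```

If a delay was applied this way, `tc qdisc del dev eth0 root` would remove it again. Running `ping` against another node before and after is a simple way to confirm the added delay.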

Submit the Spark job (in a new terminal):

docker-compose -f spark-cluster-compose.yaml exec main spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 /opt/spark/scripts/join_optimizer.py

To restart and rebuild the cluster, shut it down with the command below, then run bash start-all.sh again.

docker-compose -f spark-cluster-compose.yaml down
