To get started, download the data tables and spin up the cluster:
pip install gdown
gdown https://drive.google.com/drive/u/1/folders/1SfJhSPCvUfHI2Vc_toVZIIzz1_50LpOx -O . --folder
unzip data/tables.zip
bash start-all.sh

To induce latency between the nodes, first enter the main node's terminal:
docker-compose -f spark-cluster-compose.yaml exec main bash

Then, from the main node terminal, run the following command to induce latency:
bash /opt/spark/scripts/induce_latency.sh

Submit the Spark job (in a new terminal):
docker-compose -f spark-cluster-compose.yaml exec main spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 /opt/spark/scripts/join_optimizer.py

To restart the cluster and rebuild:
docker-compose -f spark-cluster-compose.yaml down
bash start-all.sh
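The contents of `induce_latency.sh` are not shown here. A minimal sketch of what such a script could look like, assuming it uses the kernel's `netem` queueing discipline via `tc` on the container's `eth0` interface (the interface name and delay value are assumptions, not taken from the repository):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of induce_latency.sh -- the real script may differ.
# Adds artificial egress delay on a network interface using tc netem.
# Requires NET_ADMIN capability inside the container.
set -euo pipefail

IFACE="${1:-eth0}"   # network interface inside the container (assumed)
DELAY="${2:-100ms}"  # artificial one-way delay to add (assumed)

# Remove any existing root qdisc so repeated runs don't stack delays.
tc qdisc del dev "$IFACE" root 2>/dev/null || true

# Attach a netem qdisc that delays every outgoing packet by $DELAY.
tc qdisc add dev "$IFACE" root netem delay "$DELAY"

echo "Added $DELAY egress delay on $IFACE"
```

The delay can be removed again with `tc qdisc del dev eth0 root`, and verified from another node with `ping`, whose round-trip times should rise by roughly the configured delay.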