This repository contains practice Exploratory Data Analysis (EDA) and machine learning analyses of Kaggle datasets.
It is updated periodically with new practice files.
Currently contains:
- EDA - VideoGame Dataset --> https://www.kaggle.com/datasets/asaniczka/video-game-sales-2024/data
- EDA/ML classification - Kinematics Dataset --> https://www.kaggle.com/datasets/yasserh/kinematics-motion-data/data
All work is performed in Apache Spark on datasets with more than 10,000 entries, using the Python Spark libraries (PySpark) and Spark SQL for the data analysis.
Each project includes documentation outlining the thought process behind it.