Skip to content

🥇 A project for analyzing Olympic medal data with R, combining TidyTuesday records and World Bank indicators to assess raw medal counts, efficiency metrics, and economic context. It generates diverse visualizations, performs regression and clustering, and reveals patterns in national Olympic performance.

License

Notifications You must be signed in to change notification settings

synth001/Olympic-Medal-Data-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Olympic Medal Data Analysis 🥇

Welcome to the Olympic Medal Data Analysis project! This repository focuses on analyzing Olympic medal data using R. By combining TidyTuesday records with World Bank indicators, we assess raw medal counts, efficiency metrics, and the economic context surrounding Olympic performance.

Download Releases

Table of Contents

Introduction

The Olympic Games serve as a global stage for athletic excellence. This project aims to delve into the data behind these events, providing insights into how countries perform and what factors influence their success.

Project Overview

This project integrates data from TidyTuesday and World Bank indicators to explore several key areas:

  • Raw medal counts for different countries.
  • Efficiency metrics that evaluate how effectively countries convert resources into medals.
  • Economic context that provides background on the countries' performances.

By leveraging R's powerful data analysis capabilities, we create visualizations that reveal patterns and trends in Olympic performance over the years.

Data Sources

The primary data sources for this project include:

  • TidyTuesday: A weekly data project that provides datasets for analysis.
  • World Bank Indicators: Economic data that gives context to Olympic performance.

We clean and preprocess this data to ensure it is ready for analysis.

Features

This project includes several key features:

  • Data Cleaning: Scripts to clean and preprocess raw data.
  • Visualizations: A variety of charts and graphs to illustrate findings.
  • Statistical Analysis: Regression and clustering methods to identify patterns.
  • Economic Context: Analysis of how economic factors relate to Olympic success.

Getting Started

To get started with this project, you need to clone the repository and install the required packages.

Prerequisites

Make sure you have R and RStudio installed on your machine. You will also need the following R packages:

  • tidyverse
  • ggplot2
  • dplyr
  • readr
  • tidyr

Installation

Clone the repository:

git clone https://github.com/synth001/Olympic-Medal-Data-Analysis.git

Navigate to the project directory:

cd Olympic-Medal-Data-Analysis

Install the required packages:

install.packages(c("tidyverse", "ggplot2", "dplyr", "readr", "tidyr"))

Usage

Once you have installed the required packages, you can run the analysis scripts. The main script to start with is analysis.R. This script will load the data, perform the necessary analysis, and generate visualizations.

Run the script in RStudio:

source("analysis.R")

This will execute the analysis and produce the visualizations in your RStudio environment.

Visualizations

Visualizations are a critical part of this project. They help convey complex data in an understandable way. We create various types of charts, including:

  • Bar Charts: To display the total number of medals won by each country.
  • Line Graphs: To show trends in Olympic performance over time.
  • Heat Maps: To visualize the efficiency of countries in converting resources to medals.

Here’s an example of a bar chart that displays the total medal counts:

library(ggplot2)

ggplot(data, aes(x = country, y = total_medals)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Total Olympic Medals by Country", x = "Country", y = "Total Medals")

Regression and Clustering

We apply regression analysis to understand the relationship between economic indicators and medal counts. This helps identify which factors most significantly influence Olympic success.

Regression Analysis

We use linear regression to model the relationship between GDP and the number of medals won. The following code snippet illustrates this:

model <- lm(total_medals ~ gdp_per_capita, data = data)
summary(model)

Clustering Analysis

Clustering helps group countries based on similar performance metrics. We utilize k-means clustering for this purpose. Here’s how to implement it:

set.seed(123)
clusters <- kmeans(data[, c("total_medals", "gdp_per_capita")], centers = 3)
data$cluster <- clusters$cluster

Contributing

We welcome contributions to this project. If you have suggestions or improvements, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push your branch to your fork.
  5. Create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Acknowledgments

We thank the TidyTuesday community for providing valuable datasets. Their efforts help promote data analysis and visualization skills across various domains.

For further details and updates, visit our Releases section.

Download Releases

Happy analyzing!

About

🥇 A project for analyzing Olympic medal data with R, combining TidyTuesday records and World Bank indicators to assess raw medal counts, efficiency metrics, and economic context. It generates diverse visualizations, performs regression and clustering, and reveals patterns in national Olympic performance.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •