dmorton714/pokemon_analysis_project

Python Data Analysis Pipeline: Pokémon Edition

This project was made to show new coders how a notebook can be split into separate Python files, giving a more production-ready code base than presentation-oriented notebook code.

This project demonstrates a complete, modular data analysis pipeline built in Python. It fetches data from the public PokéAPI, cleans it, engineers new features, generates visualizations, and provides tools for analysis. The entire process is structured to mimic a real-world production environment, emphasizing code separation, maintainability, and logging.

Project Structure

The project is organized into a clean and logical file tree to separate concerns.

pokemon_analysis_project/
├── production-notebook.ipynb      # A notebook that breaks everything down
├── pokemon_analyzer/
│   ├── __init__.py                # Makes the directory a Python package
│   ├── get_data.py                # Fetches data from the API
│   ├── clean_data.py              # Cleans and preprocesses raw data
│   ├── feature_engineering.py     # Creates new data features
│   ├── create_plots.py            # Generates visualizations
│   └── main.py                    # Main pipeline orchestrator
│
├── data/                          # Output directory for CSV files (auto-generated)
├── logs/                          # Output directory for log files (auto-generated)
├── plots/                         # Output directory for HTML plots (auto-generated)
│
└── requirements.txt               # Project dependencies
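
As a rough illustration of how an orchestrator such as main.py might chain the other modules, here is a minimal sketch. It is not the project's actual code: `run_pipeline`, `fetch_stub`, `clean`, and `add_weight_kg` are hypothetical names, the sample records are made up, and plain dictionaries stand in for the pandas DataFrames the project presumably passes between stages.

```python
import logging
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

Records = List[dict]


def run_pipeline(fetch: Callable[[], Records],
                 stages: List[Callable[[Records], Records]]) -> Records:
    """Fetch raw records, then pass them through each stage in order."""
    records = fetch()
    logger.info("fetched %d records", len(records))
    for stage in stages:
        records = stage(records)
        logger.info("%s produced %d records", stage.__name__, len(records))
    return records


# Hypothetical stand-ins for get_data.py, clean_data.py and
# feature_engineering.py; the records are illustrative only.
def fetch_stub() -> Records:
    return [{"name": "bulbasaur", "weight": 69},
            {"name": "unknown", "weight": None}]


def clean(records: Records) -> Records:
    # Drop rows whose weight is missing.
    return [r for r in records if r["weight"] is not None]


def add_weight_kg(records: Records) -> Records:
    # PokéAPI reports weight in hectograms; 10 hg = 1 kg.
    return [{**r, "weight_kg": r["weight"] / 10} for r in records]


result = run_pipeline(fetch_stub, [clean, add_weight_kg])
print(result)
```

Keeping each stage a plain function that takes and returns the same kind of object is what makes the steps easy to test and reorder on their own.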

Features

  • Modular Pipeline: Each step of the data process (fetch, clean, feature engineer, plot) is in its own script.

  • API Data Fetching: Connects to the PokéAPI to get data for the first 151 Pokémon.

  • Robust Logging: Creates a data_fetch.log file that records the outcome of every API call, which helps when debugging failed requests.

  • User-Friendly CLI: A tqdm progress bar shows the status of the data fetching process in the terminal.

  • Data Cleaning: Handles missing values and converts data into standard units (e.g., kg, m).

  • Feature Engineering: Creates derived columns such as combat_total, bmi, and speed_category.

  • Interactive Visualizations: Generates interactive plots using Plotly and saves them as HTML files.
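
The cleaning and feature engineering steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: PokéAPI reports weight in hectograms and height in decimetres, but the definition of `combat_total` (here, the sum of all base stats), the `speed_category` cut-offs, and the flattened `stats` dictionary are all assumptions, and plain dictionaries are used instead of pandas so the sketch runs without dependencies.

```python
from typing import Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Convert PokéAPI units to kg and m; return None if a value is missing."""
    if raw.get("weight") is None or raw.get("height") is None:
        return None
    return {
        "name": raw["name"],
        "weight_kg": raw["weight"] / 10,  # hectograms -> kilograms
        "height_m": raw["height"] / 10,   # decimetres -> metres
        "stats": dict(raw["stats"]),
    }


def speed_category(speed: int) -> str:
    # Hypothetical cut-offs, chosen only for illustration.
    if speed < 50:
        return "slow"
    if speed < 90:
        return "medium"
    return "fast"


def add_features(row: dict) -> dict:
    stats = row["stats"]
    return {
        **row,
        # Assumption: combat_total is the sum of all base stats.
        "combat_total": sum(stats.values()),
        # Body mass index: weight in kg divided by height in m squared.
        "bmi": round(row["weight_kg"] / row["height_m"] ** 2, 2),
        "speed_category": speed_category(stats["speed"]),
    }


# Bulbasaur's values, with stats flattened into a simple dict (assumed shape;
# the raw API response nests them differently).
raw = {
    "name": "bulbasaur", "weight": 69, "height": 7,
    "stats": {"hp": 45, "attack": 49, "defense": 49,
              "special-attack": 65, "special-defense": 65, "speed": 45},
}
row = add_features(clean_record(raw))
print(row["combat_total"], row["bmi"], row["speed_category"])
```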

Setup and Installation

To get this project running on your local machine, follow these steps.

Clone the repository:

git clone <your-repository-url>
cd pokemon_analysis_project

Create a virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the required dependencies: The project dependencies are listed in requirements.txt.

#requirements.txt
pandas
requests
tqdm
plotly
ipython

Install them using pip:

pip install -r requirements.txt

How to Run

There are two main ways to use this project: run the full pipeline or perform a standalone analysis in a notebook.

1. Run the Full Data Pipeline

This is the primary way to run the project. It will execute all steps from data fetching to plot generation.

From the root directory (pokemon_analysis_project/), run the main.py script:

python pokemon_analyzer/main.py

After the pipeline finishes, you will find:

  • .csv files for each data stage in the data/ directory.
  • A detailed log file in the logs/ directory.
  • Interactive .html plots in the plots/ directory.
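
For readers curious how a log file like data_fetch.log could be produced, here is a minimal sketch of a fetch loop that records the outcome of each API call. It assumes the standard PokéAPI endpoint `https://pokeapi.co/api/v2/pokemon/{id}`. The names `http_get_json`, `make_logger`, and `fetch_all` are hypothetical, and the sketch uses the standard library's `urllib` where the project lists `requests`, so treat it as one possible shape rather than the code in get_data.py.

```python
import json
import logging
from pathlib import Path
from typing import Callable, Iterable, List, Optional
from urllib.request import Request, urlopen

API_URL = "https://pokeapi.co/api/v2/pokemon/{id}"


def http_get_json(url: str) -> dict:
    # Standard-library fetch; swapped in for `requests` to stay dependency-free.
    request = Request(url, headers={"User-Agent": "pokemon-analysis-sketch"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


def make_logger(log_dir: str = "logs") -> logging.Logger:
    """Create the log directory if needed and log to data_fetch.log inside it."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("data_fetch")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.FileHandler(Path(log_dir) / "data_fetch.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def fetch_all(ids: Iterable[int],
              get_json: Callable[[str], dict] = http_get_json,
              logger: Optional[logging.Logger] = None) -> List[dict]:
    """Fetch each id, logging success or failure, and keep going on errors."""
    logger = logger or logging.getLogger("data_fetch")
    results = []
    for pokemon_id in ids:
        url = API_URL.format(id=pokemon_id)
        try:
            results.append(get_json(url))
            logger.info("fetched id=%s", pokemon_id)
        except Exception as exc:  # log the failure and continue with the next id
            logger.error("failed id=%s: %s", pokemon_id, exc)
    return results
```

Calling `fetch_all(range(1, 152), logger=make_logger())` would request the first 151 Pokémon. Wrapping the range in `tqdm(...)` would add the progress bar described under Features.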
