This project was made to show new coders how a notebook can be split up into separate Python files, resulting in a more production-ready code base rather than presentation-oriented code.
This project demonstrates a complete, modular data analysis pipeline built in Python. It fetches data from the public PokéAPI, cleans it, engineers new features, generates visualizations, and provides tools for analysis. The entire process is structured to mimic a real-world production environment, emphasizing code separation, maintainability, and logging.
The project is organized into a clean and logical file tree to separate concerns.
pokemon_analysis_project/
├── production-notebook.ipynb # A notebook that breaks everything down
├── pokemon_analyzer/
│ ├── __init__.py # Makes the directory a Python package
│ ├── get_data.py # Fetches data from the API
│ ├── clean_data.py # Cleans and preprocesses raw data
│ ├── feature_engineering.py # Creates new data features
│ ├── create_plots.py # Generates visualizations
│ └── main.py # Main pipeline orchestrator
│
├── data/ # Output directory for CSV files (auto-generated)
├── logs/ # Output directory for log files (auto-generated)
├── plots/ # Output directory for HTML plots (auto-generated)
│
└── requirements.txt # Project dependencies
- Modular Pipeline: Each step of the data process (fetch, clean, feature engineer, plot) is in its own script.
- API Data Fetching: Connects to the PokéAPI to get data for the first 151 Pokémon.
- Robust Logging: Creates a data_fetch.log file to record the outcome of every API call, perfect for debugging.
- User-Friendly CLI: A tqdm progress bar shows the status of the data fetching process in the terminal.
- Data Cleaning: Handles missing values and converts data into standard units (e.g., kg, m).
- Feature Engineering: Creates new insightful columns like combat_total, bmi, and speed_category.
- Interactive Visualizations: Generates interactive plots using Plotly and saves them as HTML files.
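As a rough illustration of the cleaning and feature engineering steps listed above, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the flattened `stats` dictionary, the choice of stats summed into `combat_total`, and the `speed_category` thresholds are all assumptions made for this example. The unit conversion relies on the PokéAPI reporting height in decimetres and weight in hectograms.

```python
def clean_record(raw: dict) -> dict:
    """Convert PokéAPI units (decimetres, hectograms) to metres and kilograms."""
    return {
        "name": raw["name"],
        "height_m": raw["height"] / 10,   # decimetres -> metres
        "weight_kg": raw["weight"] / 10,  # hectograms -> kilograms
        "stats": dict(raw["stats"]),      # assumed: stats already flattened to name -> value
    }


def add_features(rec: dict) -> dict:
    """Add derived columns; the definitions below are illustrative assumptions."""
    stats = rec["stats"]
    out = dict(rec)
    # Assumption: combat_total sums attack, defense and speed.
    out["combat_total"] = stats["attack"] + stats["defense"] + stats["speed"]
    # Body mass index: weight in kg divided by height in metres squared.
    out["bmi"] = round(rec["weight_kg"] / rec["height_m"] ** 2, 2)
    # Assumption: hypothetical thresholds for the speed buckets.
    speed = stats["speed"]
    out["speed_category"] = "slow" if speed < 50 else "medium" if speed < 90 else "fast"
    return out


# Sample record shaped like a simplified API response (hand-written for the example).
raw_pikachu = {
    "name": "pikachu",
    "height": 4,
    "weight": 60,
    "stats": {"attack": 55, "defense": 40, "speed": 90},
}
pikachu = add_features(clean_record(raw_pikachu))
print(pikachu["height_m"], pikachu["weight_kg"], pikachu["speed_category"])
```

In the project itself these steps presumably operate on pandas DataFrames in clean_data.py and feature_engineering.py; the per-record version here only keeps the example dependency-free.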
To get this project running on your local machine, follow these steps.
Clone the repository:
git clone <your-repository-url>
cd pokemon_analysis_project
Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
Install the required dependencies: The project dependencies are listed in requirements.txt.
# requirements.txt
pandas
requests
tqdm
plotly
ipython
Install them using pip:
pip install -r requirements.txt
There are two main ways to use this project: run the full pipeline or perform a standalone analysis in a notebook.
Running the full pipeline is the primary way to use the project. It executes all steps from data fetching to plot generation.
From the root directory (pokemon_analysis_project/), run the main.py script:
python pokemon_analyzer/main.py
After the pipeline finishes, you will find:
- .csv files for each data stage in the data/ directory.
- A detailed log file in the logs/ directory.
- Interactive .html plots in the plots/ directory.
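To make the outputs above more concrete, here is a hedged sketch of how a pipeline runner might create the output directories, log the outcome of each fetch to data_fetch.log, and write a CSV. It is not the project's main.py: `run_pipeline`, the `fetch` callable, `fake_fetch`, and the `raw.csv` file name are hypothetical, and the real fetcher would call the PokéAPI rather than a stub.

```python
import csv
import logging
import tempfile
from pathlib import Path


def run_pipeline(root: Path, fetch, ids=range(1, 152)) -> list:
    """Fetch each id, log the outcome, and write the rows to data/raw.csv."""
    # Create the output directories if they do not exist yet.
    for name in ("data", "logs", "plots"):
        (root / name).mkdir(parents=True, exist_ok=True)

    # Log to logs/data_fetch.log (the log file name comes from the feature list).
    logger = logging.getLogger("pokemon_pipeline_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(root / "logs" / "data_fetch.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    rows = []
    try:
        for pokemon_id in ids:
            try:
                rows.append(fetch(pokemon_id))
                logger.info("fetched id=%s", pokemon_id)
            except Exception as exc:
                # One failed call should not stop the whole run.
                logger.error("failed id=%s: %s", pokemon_id, exc)
    finally:
        logger.removeHandler(handler)
        handler.close()

    if rows:
        # Hypothetical file name for the raw data stage.
        with open(root / "data" / "raw.csv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return rows


def fake_fetch(pokemon_id: int) -> dict:
    """Stand-in for a real API call; fails for id 2 to exercise the error path."""
    if pokemon_id == 2:
        raise ValueError("simulated API error")
    return {"id": pokemon_id, "name": f"pokemon-{pokemon_id}"}


with tempfile.TemporaryDirectory() as tmp:
    rows = run_pipeline(Path(tmp), fake_fetch, ids=range(1, 4))
    log_text = (Path(tmp) / "logs" / "data_fetch.log").read_text()
    csv_exists = (Path(tmp) / "data" / "raw.csv").exists()
print(len(rows), csv_exists)
```

Passing the fetcher in as an argument keeps the sketch testable without network access; a real run would swap in a function that requests each Pokémon from the API.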