
mitrapinaki/Practical-Machine-Learning-Cookbook


ML Fundamentals Lab

📚 Overview

This repository contains a collection of beginner‑friendly, intuitive machine learning labs based on real‑life examples, built with NumPy, pandas, and scikit‑learn. Each .py file focuses on one core ML concept with simple examples and clean code.

These labs help you build a strong foundation in:

Train/test splitting

Linear regression

Logistic regression

Feature scaling

Pipelines

Model evaluation

Real‑life ML workflows (housing prices, student pass/fail, pizza delivery time, fraud detection)

🛠️ Installation

  1. Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

  2. Install the required packages:

```bash
pip install numpy pandas scikit-learn
```

(Optional but recommended) Install Jupyter and Matplotlib:

```bash
pip install jupyter matplotlib
```

▶️ Running the Files

Each file is standalone. Run any script with:

```bash
python filename.py
```

For example:

```bash
python 01_train_test_split.py
```

📂 File‑by‑File Breakdown

Below is an explanation of every .py file in this lab.

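01_train_test_split.py
Concept: How to split data into training and testing sets.
Why it matters: Evaluating on held‑out data the model never saw during training gives a fair estimate of real‑world performance and reveals overfitting (high training score, much lower test score).
Dataset: Iris
Key skills: train_test_split, ML workflow basics.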
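A minimal sketch of the split on Iris, assuming an 80/20 split. The `test_size`, `random_state`, and `stratify` values are illustrative choices, not necessarily what 01_train_test_split.py uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn: 150 samples, 4 features, 3 classes (50 each)
X, y = load_iris(return_X_y=True)

# Hold out 20% for testing; stratify=y keeps class proportions equal in both sets,
# and a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```

The model is then fit only on `X_train`/`y_train`; `X_test`/`y_test` are touched once, for the final evaluation.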

02_housing_prices_linear_regression.py
Concept: Linear regression using a real‑life example (predicting house prices).
Why it matters: Shows how a model learns a linear relationship between input features and a numeric target.
Key skills: slope, intercept, prediction, regression intuition.
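To make slope and intercept concrete, here is a sketch on hypothetical toy data (not the data used in the lab file). The prices are generated from the made‑up rule price = 50,000 + 200 × size with no noise, so the model should recover exactly that line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: house size in sq ft (2-D: n_samples x n_features)
sizes = np.array([[800], [1000], [1200], [1500], [2000]])
# Made-up, noise-free prices: 50,000 base + 200 per sq ft
prices = 50_000 + 200 * sizes.ravel()

model = LinearRegression()
model.fit(sizes, prices)

print(model.coef_[0])            # slope: price increase per extra sq ft (about 200)
print(model.intercept_)          # intercept: predicted price at size 0 (about 50,000)
print(model.predict([[1700]]))   # 50,000 + 200 * 1700 = about 390,000
```

With real data the points do not lie exactly on a line, and the fitted slope and intercept are the ones that minimize the squared prediction error.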

02_linear_regression.py
Concept: Linear regression using scikit‑learn's diabetes dataset.
Why it matters: Introduces coefficients, intercept, and R² score.
Key skills: model fitting, evaluation, regression fundamentals.
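A sketch of the same workflow, assuming a plain `LinearRegression` on all 10 diabetes features and an 80/20 split (the lab file may choose features or split differently). It prints the learned coefficients and intercept and scores the model with R².

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 442 patients, 10 numeric features; target = disease progression one year after baseline
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

print("Coefficients:", model.coef_)   # one weight per feature
print("Intercept:", model.intercept_)

# R² = 1 - (residual sum of squares / total sum of squares); 1.0 is a perfect fit,
# 0.0 is no better than always predicting the mean
r2 = r2_score(y_test, model.predict(X_test))
print("Test R²:", r2)
```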

03_01_student_pass_fail.py
Concept: Logistic regression with a real‑life example (predicting whether a student passes).
Why it matters: Shows the difference between classification (predicting a category) and regression (predicting a number).
Key skills: probabilities, decision boundaries, classification intuition.
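A sketch of pass/fail prediction on hypothetical toy data (hours studied vs. outcome; the lab file's features may differ). It shows the predicted probabilities and computes the decision boundary, the input value where P(pass) = 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied vs. pass (1) / fail (0)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

# predict_proba returns [P(fail), P(pass)] per row; keep the P(pass) column
probs = clf.predict_proba([[2], [4.5], [7]])[:, 1]
print(probs)                     # P(pass) rises with hours studied
print(clf.predict([[2], [7]]))   # [0 1]

# Decision boundary: where coef * x + intercept = 0, i.e. P(pass) = 0.5
boundary = -clf.intercept_[0] / clf.coef_[0, 0]
print(boundary)                  # about 4.5 for this symmetric toy data
```

Unlike linear regression, the output is a probability between 0 and 1, and `predict` turns it into a class by thresholding at 0.5.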

03_logistic_regression.py
Concept: Logistic regression on the Iris dataset.
Why it matters: Introduces multi‑class classification.
Key skills: accuracy, training vs testing performance.
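A sketch of multi‑class logistic regression on Iris that compares training and test accuracy. The split settings and `max_iter=1000` are assumptions for this example, not necessarily the lab's values.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# max_iter raised from the default (100) to avoid a possible convergence warning
# on the unscaled features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print("Classes:", clf.classes_)  # [0 1 2] -> setosa, versicolor, virginica

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")
# A large gap (high train, much lower test) would suggest overfitting
```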

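04_standard_scaler.py
Concept: Feature scaling using StandardScaler.
Why it matters: Many models (for example, distance‑based models, regularized linear models, and models trained with gradient descent) perform better or converge faster when features are on comparable scales.
Key skills: standardization (rescaling each feature to mean 0 and standard deviation 1), preprocessing.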
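A sketch of standardization on hypothetical features with very different scales. It checks that `StandardScaler` matches the formula z = (x − mean) / std (population standard deviation) and shows that new data is transformed with the statistics learned from the training data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: [house size in sq ft, number of bedrooms]
X_train = np.array([[800.0, 1], [1200.0, 2], [1500.0, 3], [2000.0, 4]])
X_new = np.array([[1000.0, 2]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training data
X_new_scaled = scaler.transform(X_new)          # reuses the training statistics, no refitting

# Each training column now has mean 0 and standard deviation 1
print(X_train_scaled.mean(axis=0))
print(X_train_scaled.std(axis=0))

# Same result by hand: z = (x - mean) / std, with std computed using ddof=0
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(manual, X_train_scaled))  # True
```

Fitting the scaler on training data only, then calling `transform` on test or new data, keeps information from the test set out of preprocessing.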

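05_pipeline_basics.py
Concept: Building a simple ML pipeline (scaling + model).
Why it matters: A pipeline chains preprocessing and the model into one object, so the same steps run at training and prediction time and preprocessing is fit only on training data, which helps prevent data leakage.
Key skills: Pipeline, preprocessing, clean workflows.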
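A sketch of a scaler + logistic regression pipeline on Iris, assuming these step names and this split (the lab file may use `make_pipeline` or different names). It also runs cross‑validation, where the scaler is refit inside each fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Each step is a (name, estimator) pair; every step except the last must be a transformer
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# fit() fits the scaler on X_train, transforms X_train, then fits the model on the result;
# score()/predict() reuse the fitted scaler, so test data never influences the scaling
pipe.fit(X_train, y_train)
print("Test accuracy:", pipe.score(X_test, y_test))

# During cross-validation the whole pipeline (scaler included) is refit on each training fold
scores = cross_val_score(pipe, X_train, y_train, cv=5)
print("CV accuracy:", scores.mean())
```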

05_pizza_delivery_time_pipeline.py
Concept: Real‑life pipeline example (predicting pizza delivery time).
Why it matters: Shows how to preprocess numeric and categorical features together in one pipeline.
Key skills: ColumnTransformer, OneHotEncoder, regression pipeline.
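A sketch of a mixed numeric/categorical regression pipeline. The orders, column names (`distance_km`, `num_items`, `weather`, `delivery_minutes`), and values are hypothetical, not taken from the lab file. Numeric columns are standardized and the categorical column is one‑hot encoded before a linear regression.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy orders
orders = pd.DataFrame({
    "distance_km": [1.0, 2.5, 4.0, 3.0, 5.5, 2.0],
    "num_items": [1, 3, 2, 4, 5, 2],
    "weather": ["sunny", "rainy", "sunny", "snowy", "rainy", "snowy"],
    "delivery_minutes": [15, 30, 28, 35, 50, 25],
})
X = orders.drop(columns="delivery_minutes")
y = orders["delivery_minutes"]

# Route columns to different transformers by name
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["distance_km", "num_items"]),
    # handle_unknown="ignore" encodes an unseen category as all zeros instead of erroring
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["weather"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])
model.fit(X, y)

new_order = pd.DataFrame({"distance_km": [3.5], "num_items": [2], "weather": ["rainy"]})
pred = model.predict(new_order)
print(pred)  # estimated delivery time in minutes
```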

06_model_evaluation.py
Concept: Evaluating classification models using precision, recall, and F1.
Why it matters: Accuracy alone can be misleading, especially when classes are imbalanced; precision, recall, and F1 show which kinds of errors a model makes.
Key skills: classification_report, model evaluation.
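A worked example on hand‑picked labels (hypothetical, chosen so the counts are easy to verify). The comments show each metric's formula alongside the value scikit‑learn computes.

```python
from sklearn.metrics import (
    accuracy_score, classification_report, f1_score, precision_score, recall_score,
)

# 1 = positive class. Counts: TP = 2 (indices 0, 1), FN = 1 (index 2),
# FP = 1 (index 3), TN = 4 (indices 4-7)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6/8 = 0.75
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2/3
f1 = f1_score(y_true, y_pred)                # 2PR / (P + R) = 2/3

print(accuracy, precision, recall, f1)
# Per-class precision/recall/F1 plus support (number of true samples per class)
print(classification_report(y_true, y_pred))
```

Precision answers "of the items flagged positive, how many were right?", while recall answers "of the truly positive items, how many were found?".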

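07_fraud_detection_pipeline.py
Concept: Full ML pipeline for fraud detection.
Why it matters: Fraud data is highly imbalanced (fraudulent transactions are rare), so a model can reach high accuracy while missing most fraud; evaluation must look at precision and recall for the fraud class.
Key skills: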

numeric + categorical preprocessing

logistic regression

precision/recall/F1

real‑life ML workflow
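A sketch of why accuracy misleads on imbalanced data. It uses synthetic numeric data from `make_classification` (about 5% positives) as a stand‑in for fraud data, so unlike the lab file it has no categorical columns. `class_weight="balanced"` is one common option for imbalanced classes and is an assumption here, not necessarily what 07_fraud_detection_pipeline.py uses.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for fraud data: about 5% of rows are class 1 ("fraud")
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95], flip_y=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline that always predicts "not fraud": high accuracy, but it catches no fraud
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
base_pred = baseline.predict(X_test)
print("Baseline accuracy:", accuracy_score(y_test, base_pred))
print("Baseline recall:  ", recall_score(y_test, base_pred))

# class_weight="balanced" up-weights errors on the rare class during training
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), digits=3))
```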

🧠 Recommended Learning Order

01_train_test_split

02_linear_regression

02_housing_prices_linear_regression

03_logistic_regression

03_01_student_pass_fail

04_standard_scaler

05_pipeline_basics

05_pizza_delivery_time_pipeline

06_model_evaluation

07_fraud_detection_pipeline

This order builds intuition step‑by‑step.

About

Practical Machine Learning Cookbook - From Basics to Real‑World Pipelines
