ML Fundamentals Lab
📚 Overview
This repository contains a collection of beginner‑friendly, intuitive machine learning labs based on real‑life examples and built with NumPy, Pandas, and scikit‑learn. Each .py file focuses on one core ML concept with simple examples and clean code.
These labs help you build a strong foundation in:
Train/test splitting
Linear regression
Logistic regression
Feature scaling
Pipelines
Model evaluation
Real‑life ML workflows (housing prices, student pass/fail, pizza delivery time, fraud detection)
🛠️ Installation
- Create and activate a virtual environment
```bash
python3 -m venv venv
source venv/bin/activate
```
- Install required packages
```bash
pip install numpy pandas scikit-learn
```
- (Optional but recommended) Install Jupyter and Matplotlib
```bash
pip install jupyter matplotlib
```
- Run any lab
```bash
python filename.py
```
Example:
```bash
python 01_train_test_split.py
```
📂 File‑by‑File Breakdown
Below is an explanation of every .py file in this lab.
01_train_test_split.py
Concept: How to split data into training and testing sets.
Why it matters: Evaluating on held‑out data exposes overfitting and gives a fair estimate of performance on unseen data.
Dataset: Iris
Key skills: train_test_split, ML workflow basics.
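A minimal sketch of the splitting step on Iris. The `test_size`, `random_state`, and `stratify` values are assumptions chosen for illustration and may differ from what the lab file uses.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# Hold out 20% of the rows for testing. stratify=y keeps the class
# proportions the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

The model is fitted only on `X_train`/`y_train`; `X_test`/`y_test` stay untouched until evaluation.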
02_housing_prices_linear_regression.py
Concept: Linear regression using a real‑life example (predicting house prices).
Why it matters: Teaches how models learn numeric relationships.
Key skills: slope, intercept, prediction, regression intuition.
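To make slope and intercept concrete, here is a small sketch on made-up numbers (not the lab's data). The prices are generated from price = 3 × size + 50, so the fitted line should recover roughly those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size in square metres vs. price in thousands.
# Generated from price = 3 * size + 50 with no noise.
sizes = np.array([[50], [80], [100], [120], [150]])
prices = np.array([200, 290, 350, 410, 500])

model = LinearRegression().fit(sizes, prices)

slope = model.coef_[0]         # price change per extra square metre
intercept = model.intercept_   # predicted price at size 0
predicted = model.predict(np.array([[110]]))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, price(110)={predicted:.2f}")
```

Real housing data is noisy, so the fitted line will only approximate the points rather than pass through them.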
02_linear_regression.py
Concept: Linear regression using scikit‑learn’s diabetes dataset.
Why it matters: Introduces coefficients, intercept, and R² score.
Key skills: model fitting, evaluation, regression fundamentals.
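One way this could look in code is sketched below. The split parameters are assumptions, and the exact numbers printed depend on them.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)  # 442 samples, 10 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# One coefficient per feature, plus a single intercept.
print("coefficients:", model.coef_.round(1))
print("intercept:", round(float(model.intercept_), 1))

# R² compares the model against always predicting the mean of y:
# 1.0 is a perfect fit, 0.0 is no better than the mean.
r2 = r2_score(y_test, model.predict(X_test))
print("R² on test set:", round(r2, 3))
```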
03_01_student_pass_fail.py
Concept: Logistic regression with a real‑life example (predicting if a student passes).
Why it matters: Shows classification vs regression clearly.
Key skills: probabilities, decision boundaries, classification intuition.
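The sketch below shows the difference from regression: the model outputs a probability, and the class label comes from thresholding it at 0.5. The hours-studied data is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(hours, passed)

new_students = np.array([[2], [7]])
# Column 1 of predict_proba is the estimated probability of class 1 (pass).
pass_probability = clf.predict_proba(new_students)[:, 1]
predictions = clf.predict(new_students)

for h, p, label in zip(new_students.ravel(), pass_probability, predictions):
    print(f"{h} hours -> P(pass)={p:.2f}, predicted class={label}")
```

The decision boundary is the number of hours at which P(pass) crosses 0.5.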
03_logistic_regression.py
Concept: Logistic regression on the Iris dataset.
Why it matters: Introduces multi‑class classification.
Key skills: accuracy, training vs testing performance.
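A possible sketch of comparing training and test accuracy on Iris follows. `max_iter=1000` and the split settings are assumptions, not taken from the lab file.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # three flower species -> three classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# A generous max_iter avoids convergence warnings on unscaled features.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = clf.score(X_test, y_test)     # accuracy on held-out data

print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")
```

A training accuracy far above the test accuracy is a typical sign of overfitting.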
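04_standard_scaler.py
Concept: Feature scaling using StandardScaler.
Why it matters: Many ML models (for example, regularized linear models, SVMs, and k‑nearest neighbors) are sensitive to feature scale and work better on standardized data.
Key skills: standardization, preprocessing, mean=0, std=1.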
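Standardization replaces each value x with (x − mean) / std, computed per column. A small sketch on a made-up array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales.
X = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns per-column mean/std, then applies them

print("learned means:", scaler.mean_)
print("scaled column means:", X_scaled.mean(axis=0))
print("scaled column stds:", X_scaled.std(axis=0))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates purely because of its units.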
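05_pipeline_basics.py
Concept: Building a simple ML pipeline (scaling + model).
Why it matters: A pipeline bundles preprocessing and the model into one object, so the same steps run at training and prediction time and the scaler is never fitted on test data.
Key skills: Pipeline, preprocessing, clean workflows.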
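A scaling + model pipeline might be sketched as follows. The wine dataset, the step names, and the choice of logistic regression are assumptions for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = Pipeline([
    ("scaler", StandardScaler()),                 # step 1: standardize features
    ("model", LogisticRegression(max_iter=1000)),  # step 2: classify
])

# fit() fits the scaler on the training data only, then trains the model.
pipe.fit(X_train, y_train)

# score() applies the already-fitted scaler to X_test before predicting.
test_acc = pipe.score(X_test, y_test)
print(f"test accuracy={test_acc:.3f}")
```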
05_pizza_delivery_time_pipeline.py
Concept: Real‑life pipeline example (predicting pizza delivery time).
Why it matters: Shows how numeric + categorical preprocessing works.
Key skills: ColumnTransformer, OneHotEncoder, regression pipeline.
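The sketch below combines numeric and categorical preprocessing in one regression pipeline. The column names (`distance_km`, `weather`, `minutes`) and all values are hypothetical. The data is generated from minutes = 10 + 4 × distance, plus 10 when it rains.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical orders: one numeric column, one categorical column.
orders = pd.DataFrame({
    "distance_km": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "weather": ["clear", "rainy", "clear", "rainy", "clear", "rainy"],
    "minutes": [14.0, 28.0, 22.0, 36.0, 30.0, 44.0],
})
X = orders[["distance_km", "weather"]]
y = orders["minutes"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["distance_km"]),                 # scale numbers
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["weather"]),  # encode categories
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", LinearRegression()),
])
pipe.fit(X, y)

new_order = pd.DataFrame({"distance_km": [2.5], "weather": ["clear"]})
predicted_minutes = pipe.predict(new_order)[0]
print(f"predicted delivery time: {predicted_minutes:.1f} minutes")
```

`handle_unknown="ignore"` makes the encoder output all zeros for a category it did not see during fitting instead of raising an error.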
06_model_evaluation.py
Concept: Evaluating classification models using precision, recall, F1.
Why it matters: Accuracy alone is not enough; it can look high even when the model misses most of a rare class.
Key skills: classification_report, model evaluation.
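A tiny hand-made example shows why accuracy can mislead. The labels below are invented: 8 negatives and 2 positives, with one false positive and one false negative.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_score,
    recall_score,
)

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # 8 of 10 correct -> 0.8
precision = precision_score(y_true, y_pred)  # 1 of 2 predicted positives correct -> 0.5
recall = recall_score(y_true, y_pred)        # 1 of 2 actual positives found -> 0.5

print(f"accuracy={accuracy}, precision={precision}, recall={recall}")

# classification_report prints precision, recall, and F1 for every class.
print(classification_report(y_true, y_pred))
```

Accuracy is 0.8, yet the model finds only half of the positive cases.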
07_fraud_detection_pipeline.py
Concept: Full ML pipeline for fraud detection.
Why it matters: Fraud data is heavily imbalanced (fraud cases are rare), so it requires careful evaluation.
Key skills:
numeric + categorical preprocessing
logistic regression
precision/recall/F1
real‑life ML workflow
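The pieces listed above could fit together roughly as in this sketch. Everything here is an assumption for illustration: the synthetic data, the column names `amount` and `channel`, and the use of `class_weight="balanced"` to counter the imbalance. The lab file may take a different approach.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic, imbalanced data: fraud is rare and more likely for
# large amounts and online transactions.
rng = np.random.default_rng(0)
n = 2000
amount = rng.exponential(scale=100.0, size=n)
channel = rng.choice(["online", "in_store"], size=n)
logit = -5.0 + 0.01 * amount + 1.5 * (channel == "online")
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
X = pd.DataFrame({"amount": amount, "channel": channel})

# stratify=y keeps the rare fraud class present in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

pipe = Pipeline([
    ("preprocess", ColumnTransformer([
        ("num", StandardScaler(), ["amount"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
    ])),
    # class_weight="balanced" up-weights the rare class during training.
    ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipe.fit(X_train, y_train)

y_pred = pipe.predict(X_test)
report = classification_report(y_test, y_pred, zero_division=0)
print(f"fraud rate: {y.mean():.3f}")
print(report)
```

For the fraud class, read precision and recall from the report rather than overall accuracy, since a model that never predicts fraud would still score high accuracy on data like this.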
🧠 Recommended Learning Order
01_train_test_split
02_linear_regression
02_housing_prices_linear_regression
03_logistic_regression
03_01_student_pass_fail
04_standard_scaler
05_pipeline_basics
05_pizza_delivery_time_pipeline
06_model_evaluation
07_fraud_detection_pipeline
This order builds intuition step by step.