JamesANZ/machine-learning-exercises

Machine Learning Exercises: A Comprehensive Learning Journey

Welcome to a collection of machine learning exercises designed to take you from basic concepts to advanced techniques. This repository contains hands-on examples, detailed explanations, and practical insights to help you build a solid foundation in machine learning.

🎯 Learning Objectives

By working through these exercises, you will:

  • Master fundamental ML concepts through hands-on examples
  • Understand the complete ML pipeline from data preparation to model evaluation
  • Learn feature engineering principles and why they matter
  • Compare different algorithms and understand their strengths/weaknesses
  • Develop best practices for model evaluation and validation
  • Gain practical experience with real-world datasets and problems

📚 Exercise Overview

MLR1.py - Basic Decision Tree Classification

Learning Focus: Introduction to supervised learning and decision trees

  • What you'll learn: Basic ML workflow, feature engineering, training vs. prediction
  • Dataset: Simple fruit classification (apples vs. oranges)
  • Key concepts: Supervised learning, decision trees, feature selection
  • Real-world applications: Product classification, quality control, medical diagnosis
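The workflow MLR1.py introduces can be sketched in a few lines; the feature values and labels below are illustrative placeholders, not taken from the script itself:

```python
from sklearn import tree

# Toy fruit dataset: [weight in grams, texture (0 = bumpy, 1 = smooth)].
# Labels: 0 = apple, 1 = orange. Values are made up for illustration.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]

# Train (fit) the tree on labeled examples, then predict on unseen data
clf = tree.DecisionTreeClassifier()
clf.fit(features, labels)

# A heavy, bumpy fruit should be classified as an orange (label 1)
print(clf.predict([[160, 0]]))
```

The entire supervised-learning loop is here: choose discriminative features, fit on labeled data, predict on new inputs.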

MLR2.py - Iris Flower Classification with Complete Workflow

Learning Focus: Complete ML pipeline with real-world dataset

  • What you'll learn: Data loading, train/test splitting, model evaluation, visualization
  • Dataset: Famous Iris flower dataset (150 samples, 4 features, 3 classes)
  • Key concepts: Train/test methodology, accuracy metrics, model interpretation
  • Real-world applications: Species identification, medical diagnosis, quality assessment
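A minimal version of the train/test methodology covered here might look like the following (the split ratio and random seed are arbitrary choices, not necessarily those used in MLR2.py):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()

# Hold out 30% of the data for testing; fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Evaluate only on data the model has never seen
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Measuring accuracy on held-out data, rather than the training set, is what makes the score an estimate of generalization.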

MLR3.py - Feature Engineering and Data Visualization

Learning Focus: The importance of feature selection and data visualization

  • What you'll learn: Feature overlap analysis, statistical distributions, visualization techniques
  • Dataset: Simulated dog height data (Greyhounds vs. Labrador Retrievers)
  • Key concepts: Feature discriminability, overlap analysis, visualization best practices
  • Real-world applications: Medical diagnosis, fraud detection, image recognition
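The overlap analysis can be reproduced with a short simulation; the means and spread below are assumed values chosen to make the overlap visible, not the exact parameters in MLR3.py:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 500

# Simulated heights (inches): greyhounds are taller on average,
# but the two distributions overlap, so height alone is a weak feature
grey_height = 28 + 4 * rng.standard_normal(n)
lab_height = 24 + 4 * rng.standard_normal(n)

plt.hist([grey_height, lab_height], stacked=True, color=["r", "b"],
         label=["greyhound", "labrador"])
plt.xlabel("height (inches)")
plt.legend()
plt.savefig("dog_heights.png")
```

The region where the histograms overlap is exactly where a classifier using only height must guess, which is the visual argument for seeking more discriminative features.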

MLR4.py - Advanced ML Techniques and Algorithm Comparison

Learning Focus: Advanced evaluation methods and algorithm comparison

  • What you'll learn: Cross-validation, multiple algorithms, feature scaling, comprehensive evaluation
  • Dataset: Wine classification dataset (178 samples, 13 features, 3 classes)
  • Key concepts: Cross-validation, algorithm comparison, hyperparameter tuning, model interpretation
  • Real-world applications: Quality control, recommendation systems, predictive analytics
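A compact sketch of cross-validated algorithm comparison on the wine dataset follows; the particular models and the 5-fold setting are one reasonable configuration, not necessarily the one in MLR4.py:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Wine dataset: 178 samples, 13 features, 3 classes
X, y = load_wine(return_X_y=True)

# Scale-sensitive models (SVM, KNN) get a StandardScaler step;
# decision trees are scale-invariant and need none
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation: every sample is tested exactly once
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting the scaler inside the pipeline matters: it is refit on each training fold, so no information from the test fold leaks into the scaling statistics.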

🧠 Key Learning Concepts Covered

1. Supervised Learning Fundamentals

  • Classification vs. regression
  • Training and prediction phases
  • Labeled data and ground truth
  • Model generalization

2. Feature Engineering

  • Feature selection principles
  • Discriminative vs. redundant features
  • Feature scaling and normalization
  • Feature importance analysis
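The scaling point above is easy to demonstrate: features on wildly different scales (here, a hypothetical income and age column) end up comparable after standardization:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: income (tens of thousands) vs. age
X = np.array([[30000.0, 25.0],
              [60000.0, 40.0],
              [90000.0, 55.0]])

# Standardization rescales each column to mean 0 and unit variance,
# so distance-based models no longer favor the large-magnitude feature
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```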

3. Algorithm Understanding

  • Decision Trees: Interpretable, no scaling needed
  • Random Forest: Ensemble method, robust performance
  • Support Vector Machines: Powerful, requires scaling
  • Naive Bayes: Simple, surprisingly effective
  • K-Nearest Neighbors: Instance-based, distance-sensitive

4. Model Evaluation

  • Train/test splitting
  • Cross-validation for robust evaluation
  • Accuracy, precision, recall, F1-score
  • Confusion matrices and classification reports
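The metrics listed above all derive from the confusion matrix; a small binary example (with made-up labels) shows how they connect:

```python
from sklearn.metrics import confusion_matrix, classification_report, f1_score

# Hypothetical true labels and predictions for a binary task
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Precision, recall, and F1 per class, derived from the matrix counts
print(classification_report(y_true, y_pred))
```

Here class 1 has 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 3/4 and the F1-score is 0.75.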

5. Data Visualization

  • Histograms and distribution analysis
  • Feature overlap visualization
  • Performance comparison charts
  • Model interpretation plots

🚀 Getting Started

Prerequisites

pip install numpy matplotlib scikit-learn pandas

Optional Dependencies (for visualization)

pip install pydot graphviz

Running the Exercises

# Basic decision tree example
python MLR1.py

# Complete ML workflow
python MLR2.py

# Feature engineering and visualization
python MLR3.py

# Advanced techniques and algorithm comparison
python MLR4.py

📖 Educational Notes

This repository includes comprehensive notes from the Holehouse Machine Learning course, covering:

  • Linear Regression: Cost functions, gradient descent, feature scaling
  • Classification: Logistic regression, decision boundaries, regularization
  • Neural Networks: Forward/backward propagation, activation functions
  • Unsupervised Learning: Clustering (K-means), dimensionality reduction (PCA)
  • Advanced Topics: SVM, recommender systems, anomaly detection
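The linear-regression material in the notes (cost function plus gradient descent) reduces to a few lines of NumPy; this is a generic illustration of the idea, not code from the notes:

```python
import numpy as np

# Fit y = w*x + b by batch gradient descent on the mean-squared-error cost
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0  # ground truth: w = 2, b = 1

w, b, lr = 0.0, 0.0, 0.05
for _ in range(5000):
    err = w * x + b - y          # prediction error for every sample
    w -= lr * (err * x).mean()   # partial derivative of cost w.r.t. w
    b -= lr * err.mean()         # partial derivative of cost w.r.t. b

print(round(w, 2), round(b, 2))
```

Each iteration nudges the parameters against the gradient of the cost; with a small enough learning rate the estimates converge to the true slope and intercept.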

📚 Additional Resources

Original Course Material

🤝 Contributing

This repository is designed for learning and education. Feel free to:

  • Add new exercises and examples
  • Improve documentation and comments
  • Share your insights and discoveries
  • Suggest additional learning resources

📄 License

This educational content is provided under an open license for learning purposes. Please attribute appropriately when sharing or building upon this work.


Happy Learning! 🎉

Remember: The best way to learn machine learning is by doing. Start with the basic exercises and gradually work your way up to the advanced techniques. Each exercise builds upon the previous ones, creating a comprehensive learning experience.
