This repository contains a collection of 8 Machine Learning assignments completed as part of coursework. Each assignment includes hands-on implementation of ML algorithms, reinforcing both theoretical knowledge and practical programming skills using Python.
### Assignment 1 – NumPy Basics
Objective: Introduction to numerical computing using NumPy
Key Tasks:
- Array creation, slicing, reshaping, flattening
- Matrix operations (addition, multiplication, inverse, determinant, eigenvalues)
- Statistical measures (mean, median, SD, covariance, percentiles)
- Image-to-array conversion & file handling
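The core NumPy operations listed above can be sketched in a few lines (the sample matrix and data values are illustrative, not taken from the assignment):

```python
import numpy as np

# Array creation, reshaping, flattening
a = np.arange(12).reshape(3, 4)
flat = a.flatten()

# Matrix operations on a small square matrix
m = np.array([[2.0, 1.0], [1.0, 3.0]])
inv = np.linalg.inv(m)
det = np.linalg.det(m)            # 2*3 - 1*1 = 5
eigvals = np.linalg.eigvals(m)

# Statistical measures
data = np.array([4.0, 8.0, 6.0, 5.0, 3.0])
mean, median = data.mean(), np.median(data)
sd = data.std()                   # population SD by default (ddof=0)
p90 = np.percentile(data, 90)
```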
### Assignment 2 – Data Preprocessing
Objective: Clean and transform raw data for ML models
Key Tasks:
- Handling missing values & noise removal
- Normalization & standardization
- Binning & discretization
- One-hot encoding, ordinal encoding
- Similarity & correlation metrics (Jaccard, Cosine, Pearson, Simple Matching)

Dataset: Bike Buyers Dataset (synthetic equivalent)
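A few of the preprocessing and similarity computations can be sketched directly in NumPy (the vectors here are illustrative placeholders, not the Bike Buyers data):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])

# Min-max normalization to [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Z-score standardization (zero mean, unit variance)
x_std = (x - x.mean()) / x.std()

# Cosine similarity between two feature vectors
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])
cosine = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Jaccard similarity on binary attribute vectors
p = np.array([1, 1, 0, 1, 0])
q = np.array([1, 0, 0, 1, 1])
jaccard = np.sum(p & q) / np.sum(p | q)   # |intersection| / |union|
```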
### Assignment 3 – Linear Regression & PCA
Objective: Compare analytical & iterative regression training
Key Tasks:
- Linear Regression using Normal Equation + Gradient Descent
- 5-Fold Cross Validation evaluation
- Model performance comparison via R² score
- PCA for dimensionality reduction (before vs after comparison)
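The two training routes can be compared on synthetic data. This is a minimal sketch assuming a single-feature linear target; the coefficients and noise level are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# Add a bias column of ones
Xb = np.c_[np.ones(len(X)), X]

# Normal Equation: theta = (X^T X)^{-1} X^T y
theta_ne = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y

# Batch gradient descent on the same MSE objective
theta_gd = np.zeros(2)
lr = 0.01
for _ in range(5000):
    grad = (2 / len(y)) * Xb.T @ (Xb @ theta_gd - y)
    theta_gd -= lr * grad

# R^2 score of the closed-form fit
pred = Xb @ theta_ne
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```

With enough iterations both routes converge to essentially the same parameters, which is the point of the comparison.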
### Assignment 4 – Web Scraping
Objective: Collect real-world structured data
Key Tasks:
- Static scraping using BeautifulSoup
- Dynamic scraping using Selenium
- Extracted data from:
  - BooksToScrape
  - IMDb Top 250 Movies
  - TimeAndDate global weather reports
- Export to CSV for analysis
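Static extraction can be illustrated without network access. The assignment uses BeautifulSoup, but this dependency-free sketch uses Python's stdlib `html.parser` instead, and assumes product titles sit in `<h3><a title="...">` tags as on BooksToScrape:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the title attribute of <a> tags nested inside <h3>."""
    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            for name, value in attrs:
                if name == "title":
                    self.titles.append(value)

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

# In the real assignment this HTML would be fetched with requests
# (static pages) or rendered via Selenium (dynamic pages).
html = '<h3><a title="A Light in the Attic">A Light...</a></h3>'
parser = TitleExtractor()
parser.feed(html)
```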
### Assignment 5 – Regularized Regression
Objective: Regularized regression & model selection
Key Tasks:
- Ridge Regression using Gradient Descent (tuning the regularization strength α and the learning rate)
- Linear vs Ridge vs Lasso comparison
- RidgeCV & LassoCV on Boston Dataset
- Hitters Dataset regression evaluation and best model justification
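A minimal sketch of ridge regression trained by gradient descent, on synthetic data. The penalty convention here (L2 on the weights, bias unpenalized) is one common choice, not necessarily the assignment's exact setup:

```python
import numpy as np

def ridge_gd(X, y, alpha=1.0, lr=0.01, epochs=2000):
    """Ridge regression via batch gradient descent.
    Minimizes mean squared error + alpha * ||w||^2, bias unpenalized."""
    n, d = X.shape
    Xb = np.c_[np.ones(n), X]          # prepend bias column
    w = np.zeros(d + 1)
    for _ in range(epochs):
        grad = (2 / n) * Xb.T @ (Xb @ w - y)
        grad[1:] += 2 * alpha * w[1:]  # L2 penalty on weights only
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5 + rng.normal(0, 0.1, 200)
w = ridge_gd(X, y, alpha=0.01)
```

Increasing `alpha` shrinks the weights toward zero, which is the lever compared against plain Linear and Lasso regression in the assignment.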
### Assignment 6 – Naive Bayes & Hyperparameter Tuning
Objective: Bayesian modeling and hyperparameter tuning
Key Tasks:
- Gaussian Naive Bayes – manual and built-in (scikit-learn) implementations
- GridSearchCV for best K in KNN
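The manual Gaussian Naive Bayes part can be sketched as a small NumPy class (the two-blob dataset is an illustrative stand-in; the GridSearchCV/KNN part, being a direct scikit-learn call, is not shown):

```python
import numpy as np

class ManualGaussianNB:
    """Gaussian Naive Bayes fit/predict written from scratch in NumPy."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # Per-class log likelihood: sum_j log N(x_j | mu_cj, var_cj)
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None, :, :]
                          + (X[:, None, :] - self.mu[None, :, :]) ** 2
                          / self.var[None, :, :]).sum(axis=2)
        # Pick the class maximizing log prior + log likelihood
        return self.classes[np.argmax(np.log(self.prior) + log_lik, axis=1)]

# Two well-separated Gaussian blobs as a sanity check
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = ManualGaussianNB().fit(X, y)
acc = np.mean(model.predict(X) == y)
```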
### Assignment 7 – Support Vector Machines
Objective: Classification using different SVM kernels
Key Tasks:
- SVC with Linear / Polynomial / RBF kernels
- Metrics: Accuracy, Precision, Recall, F1-score
- Confusion Matrix visualization
- Effect of feature scaling on SVM performance
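The evaluation metrics above can be computed from raw prediction counts. A minimal pure-Python sketch (the label vectors are illustrative, not actual SVM output):

```python
def classification_metrics(y_true, y_pred):
    """Binary accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
```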
### Assignment 8 – AdaBoost Ensembles
Objective: Boosting for stronger ensemble models
Parts Implemented:
- SMS Spam Classification
  - TF-IDF vectorization + manual AdaBoost (T=15) + sklearn AdaBoost
- Heart Disease Prediction
  - UCI Heart dataset with hyperparameter tuning of the number of estimators & learning rate
- WISDM Smartphone & Watch Motion Sensor Dataset
  - Accelerometer windowing, feature extraction, manual AdaBoost vs sklearn AdaBoost
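A manual AdaBoost round with decision stumps, in the spirit of the T=15 implementation above, might look like this (the 2-D synthetic data and exhaustive stump search are illustrative simplifications):

```python
import numpy as np

def adaboost_stumps(X, y, T=15):
    """Manual AdaBoost with one-feature threshold stumps; labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1 / n)                       # uniform sample weights
    learners = []
    for _ in range(T):
        best = None
        # Exhaustive search for the lowest weighted-error stump
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(X[:, j] >= thr, pol, -pol)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = max(err, 1e-10)                   # avoid log(1/0)
        alpha = 0.5 * np.log((1 - err) / err)   # learner weight
        pred = np.where(X[:, j] >= thr, pol, -pol)
        w *= np.exp(-alpha * y * pred)          # upweight mistakes
        w /= w.sum()
        learners.append((alpha, j, thr, pol))
    return learners

def adaboost_predict(learners, X):
    score = np.zeros(len(X))
    for alpha, j, thr, pol in learners:
        score += alpha * np.where(X[:, j] >= thr, pol, -pol)
    return np.sign(score)

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)      # diagonal decision boundary
model = adaboost_stumps(X, y, T=15)
train_acc = np.mean(adaboost_predict(model, X) == y)
```

No single axis-aligned stump can fit the diagonal boundary, but the weighted ensemble of 15 can approximate it, which is the behavior the manual-vs-sklearn comparison examines.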
### Key Learnings
- Hands-on implementation of ML algorithms (regression, boosting, SVM, Naive Bayes)
- Understanding preprocessing, regularization & model evaluation
- Experience with sensor, medical & text datasets
- Practical ML pipeline skills (EDA → preprocessing → model → metrics)
- Data scraping & automation using Python
### Tools & Libraries
- Python 3.x
- NumPy, Pandas
- Scikit-learn
- Matplotlib & Seaborn
- BeautifulSoup, Selenium
- Jupyter / Spyder IDE
### Author
Mehak
B.Tech – Computer Engineering (3rd Year)
Thapar Institute of Engineering and Technology
📧 mmehak2_be23@thapar.edu
This repository is part of the Machine Learning (UML501) coursework under the guidance of faculty at TIET, Patiala.