This repository contains a machine learning pipeline to detect user activity based on accelerometer and gyroscope data using multiple models and techniques.
The goal of this project is to classify user activities from mobile health sensor data. Activities are predicted using various machine learning models, and the performance of each model is evaluated for accuracy, precision, recall, and other metrics.
The dataset used in this project is sourced from Kaggle: Mobile Health Dataset.
- Clone the repository and install dependencies:

      git clone https://github.com/marksamfd/mhealth-classification.git
      cd mhealth-classification

- Set up your Kaggle credentials to download the dataset:

      export KAGGLE_USERNAME=your_kaggle_username
      export KAGGLE_KEY=your_kaggle_key

- Extract the dataset and ensure it is in the same directory as the script.
- Data balancing ensures equal representation of all activities.
- Visualizations for sensor distributions are generated using Seaborn.
- One-hot encoding is applied for categorical features.
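
The balancing and one-hot encoding steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not the notebook's actual code: the toy data, the column names (`alx`, `subject`, `Activity`), the helper `balance_by_downsampling`, and the choice of downsampling to the rarest class are all assumptions made for the example.

```python
import pandas as pd


def balance_by_downsampling(df, label_col, seed=0):
    """Downsample every class to the size of the rarest class (assumed strategy)."""
    min_count = df[label_col].value_counts().min()
    return (
        df.groupby(label_col)
        .sample(n=min_count, random_state=seed)
        .reset_index(drop=True)
    )


# Hypothetical toy data standing in for the sensor readings.
raw = pd.DataFrame(
    {
        "alx": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        "subject": ["subject1", "subject2", "subject1",
                    "subject2", "subject1", "subject2"],
        "Activity": [0, 0, 0, 0, 1, 2],
    }
)

# After balancing, each activity has the same number of rows.
balanced = balance_by_downsampling(raw, "Activity")

# One-hot encode the categorical column; numeric columns are left untouched.
encoded = pd.get_dummies(balanced, columns=["subject"])

print(balanced["Activity"].value_counts().to_dict())
print(list(encoded.columns))
```

Downsampling discards data from the larger classes; oversampling the smaller classes is an alternative if the dataset is small.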
- Linear Regression
  - Evaluated with Mean Squared Error (MSE).
- Logistic Regression
  - Simple classification model, evaluated with accuracy and a confusion matrix.
- K-Nearest Neighbors (KNN)
  - Hyperparameters tuned with GridSearchCV.
- Support Vector Machine (SVM)
  - Radial basis function (RBF) kernel with hyperparameter tuning.
- Neural Networks
  - Built with TensorFlow and trained using sparse categorical cross-entropy.
  - TensorBoard used for logging and visualization.
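
As a rough illustration of the hyperparameter tuning mentioned for KNN and the RBF-kernel SVM, the sketch below runs scikit-learn's `GridSearchCV` over both models. The synthetic data from `make_classification`, the parameter grids, the feature scaling step, and the 3-fold cross-validation are assumptions chosen for the example, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the sensor features and activity labels.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Both models are distance-based, so features are scaled inside the pipeline.
searches = {
    "knn": GridSearchCV(
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 7]},
        cv=3,
    ),
    "svm_rbf": GridSearchCV(
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]},
        cv=3,
    ),
}

# Fit each search and record the best parameters and held-out accuracy.
results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)
    results[name] = (search.best_params_, search.score(X_test, y_test))
    print(name, results[name])
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, which avoids leaking statistics from the validation split into training.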
- Confusion matrices are generated for both train and test datasets.
- Metrics like accuracy, precision, and recall are calculated for comparison.
- Linear Regression: Poor performance (MSE = 10).
- Logistic Regression: Moderate accuracy (54%).
- KNN: High accuracy (86%).
- SVM: High accuracy (85%).
- Neural Networks: Best performing model with training accuracy of 90%.
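
The evaluation metrics listed above (confusion matrix, accuracy, precision, recall) can be computed with scikit-learn as in the sketch below. The labels are made-up toy values, and macro averaging is an assumption here; the project may average per-class scores differently.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted activity labels for six samples.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)
# Macro averaging: unweighted mean of the per-class scores.
precision = precision_score(y_true, y_pred, average="macro")
recall = recall_score(y_true, y_pred, average="macro")

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Running the same calls on both the training and test predictions gives the train/test comparison described above.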
Run the code cells in sequence from the provided notebook or script. Ensure all dependencies are installed and the dataset is correctly placed.
The project includes visualization of sensor data distribution and confusion matrices for detailed analysis.
- Enhance model performance with feature engineering.
- Implement additional deep learning architectures.
- Test models on real-time data streams.
Contributions are welcome! Please open issues or submit pull requests with improvements.
This project is licensed under the MIT License. See the LICENSE file for more details.