This project implements a machine learning model that predicts whether an individual has diabetes from a set of health metrics. The model is an ensemble that combines Logistic Regression and k-Nearest Neighbors (KNN) to improve prediction accuracy.
- Assist in early detection of diabetes
- Generate meaningful predictions from health data
- Compare performance of different machine learning models
Our dataset includes the following features:
- Number of Pregnancies
- Glucose Level
- Blood Pressure
- Skin Thickness
- Insulin
- BMI (Body Mass Index)
- Diabetes Pedigree Function
- Age
- Outcome (0: Non-Diabetic, 1: Diabetic)
The dataset used in this project is the Pima Indians Diabetes Database, originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset is included in the repository as diabetes.csv and is also available on Kaggle.
- Source: Kaggle
- Format: CSV
- Size: 768 instances
- Target Population: Females of Pima Indian heritage, age 21+
- Features: 8 numeric predictive attributes and 1 target variable
- File: diabetes.csv (included in repository)
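As a minimal sketch of loading the data described above: the column names below are those used in the Kaggle copy of the dataset and are an assumption about the `diabetes.csv` shipped here, and the helper `load_diabetes_csv` is a hypothetical name, not something taken from the notebook.

```python
import pandas as pd

# Column names as they appear in the Kaggle copy of the dataset (an assumption
# about diabetes.csv; adjust if the file in this repository differs).
FEATURE_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET_COLUMN = "Outcome"


def load_diabetes_csv(source):
    """Read the CSV and split it into features X and target y.

    `source` is a file path or a file-like object. Raises ValueError if an
    expected column is missing, so schema problems surface early.
    """
    df = pd.read_csv(source)
    missing = [c for c in FEATURE_COLUMNS + [TARGET_COLUMN] if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df[FEATURE_COLUMNS], df[TARGET_COLUMN]
```

Calling `X, y = load_diabetes_csv("diabetes.csv")` should then give 768 rows and 8 feature columns if the file matches the description above.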
- Python 3
- Pandas
- NumPy
- Scikit-learn
- Seaborn
- Matplotlib
- imbalanced-learn (SMOTE)
Our ensemble model (Logistic Regression + KNN) achieved the following results:
- Accuracy: 0.76
- Weighted F1-Score: 0.77
- Precision for Non-Diabetic cases: 0.86
- Precision for Diabetic cases: 0.63
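For readers who want to see how these four numbers are defined, here is a small sketch using scikit-learn's metric functions. The labels are toy values for illustration only (they are not the project's predictions), and the `summarize` helper is a hypothetical name.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score


def summarize(y_true, y_pred):
    """Return the four metrics reported above for binary labels (0/1)."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Per-class F1 averaged with weights proportional to class support.
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
        # Precision computed separately with each class treated as positive.
        "precision_non_diabetic": precision_score(y_true, y_pred, pos_label=0),
        "precision_diabetic": precision_score(y_true, y_pred, pos_label=1),
    }


# Toy labels for illustration only; these are not the project's predictions.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
metrics = summarize(y_true, y_pred)
```

Reporting precision per class matters here because a single accuracy figure can hide weaker performance on the minority (diabetic) class.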
- SMOTE technique for handling data imbalance
- Combination of two different algorithms
- Detailed performance metrics
- Visual analysis tools
- Install required libraries:
  pip install pandas numpy scikit-learn seaborn matplotlib imbalanced-learn
- Launch Jupyter Notebook
- Open main.ipynb
- Run all cells sequentially
- The ensemble model outperformed both standalone KNN (0.72) and Logistic Regression (0.75)
- The model shows high precision for non-diabetic cases (0.86)
- Precision for diabetic cases (0.63) leaves room for improvement
- Successfully handles imbalanced dataset through SMOTE
- Combines strengths of both Logistic Regression and KNN
- Provides predictions that can support healthcare screening
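The comparison between the ensemble and the standalone models noted above can be reproduced along these lines. This is a hedged sketch: `compare_models` is a hypothetical helper, the soft-voting ensemble and hyperparameters are assumptions, and the synthetic data means the scores will not match the figures reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X_train, X_test, y_train, y_test):
    """Fit each candidate on the same split and return test accuracy by name."""
    lr = LogisticRegression(max_iter=1000)
    knn = KNeighborsClassifier(n_neighbors=5)
    candidates = {
        "knn": knn,
        "logistic_regression": lr,
        # VotingClassifier clones its estimators on fit, so reusing the
        # instances above does not share fitted state.
        "ensemble": VotingClassifier([("lr", lr), ("knn", knn)], voting="soft"),
    }
    scores = {}
    for name, model in candidates.items():
        clf = make_pipeline(StandardScaler(), model)
        clf.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, clf.predict(X_test))
    return scores


# Synthetic stand-in data; replace with the features and target from diabetes.csv.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
scores = compare_models(X_train, X_test, y_train, y_test)
```

Evaluating all three models on the same held-out split keeps the comparison fair; differences of a point or two in accuracy may still fall within split-to-split variation.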
We welcome your contributions to improve the project! Feel free to:
- Fork the repository
- Create a feature branch
- Submit a pull request
For questions and suggestions:
- Open an issue
- Submit a pull request
- Connect through project discussions
This project is licensed under the MIT License.
⭐️ If you found this project helpful, don't forget to give it a star!