This project aims to develop a predictive model for diagnosing diabetes based on health metrics. Leveraging machine learning techniques, the project uses the Pima Indians Diabetes dataset to identify patterns that correlate with the presence of diabetes.
📌 Objective
The primary goal is to accurately classify whether a patient is likely to have diabetes using features such as glucose levels, insulin, BMI, age, and more. This model can support early intervention and data-driven healthcare decisions.
📊 Dataset
The dataset contains several medical predictor variables and one target variable (Outcome) that indicates whether or not a patient has diabetes. Key features include:
• Pregnancies
• Glucose
• BloodPressure
• SkinThickness
• Insulin
• BMI
• DiabetesPedigreeFunction
• Age
🚀 Key Features
• Data Preprocessing: Handled missing values and implausible zero values with domain-specific techniques. Scaled features to ensure consistent model input.
• Exploratory Data Analysis (EDA): Visualized correlations, distributions, and outliers using Seaborn and Matplotlib. Investigated feature importance and interaction patterns.
• Modeling Techniques: Logistic Regression and Random Forest. Evaluated performance with accuracy, precision, recall, F1-score, and a confusion matrix.
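The preprocessing, modeling, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, pandas, and NumPy are installed, that zeros in the listed measurement columns stand for missing values, and that median imputation is an acceptable stand-in for the "domain-specific techniques". The helper names (`make_synthetic_data`, `train_and_evaluate`) and the synthetic data are hypothetical, used only so the example runs without the real file.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a zero in these columns is physiologically implausible,
# so it is treated as a missing measurement.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def make_synthetic_data(n=400, seed=0):
    """Hypothetical stand-in that reuses the dataset's column names (not real data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Pregnancies": rng.integers(0, 10, n),
        "Glucose": rng.normal(120, 30, n).clip(50, 200),
        "BloodPressure": rng.normal(70, 12, n).clip(40, 120),
        "SkinThickness": rng.normal(25, 8, n).clip(5, 60),
        "Insulin": rng.normal(100, 50, n).clip(15, 400),
        "BMI": rng.normal(32, 6, n).clip(18, 55),
        "DiabetesPedigreeFunction": rng.uniform(0.1, 1.5, n),
        "Age": rng.integers(21, 70, n),
    })
    # Give the label some dependence on glucose and BMI so there is signal to learn.
    score = 0.04 * (df["Glucose"] - 120) + 0.08 * (df["BMI"] - 32)
    df["Outcome"] = (score + rng.normal(0, 1, n) > 0).astype(int)
    # Inject zeros into one column to mimic missing measurements.
    df.loc[rng.random(n) < 0.1, "Insulin"] = 0
    return df


def train_and_evaluate(df, seed=0):
    """Impute, scale, fit logistic regression, and report the README's metrics."""
    X = df.drop(columns="Outcome").copy()
    y = df["Outcome"]
    # Mark implausible zeros as missing so the imputer can fill them.
    X[ZERO_AS_MISSING] = X[ZERO_AS_MISSING].replace(0, np.nan)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # A pipeline fits the imputer and scaler on the training split only,
    # which avoids leaking test-set statistics into preprocessing.
    model = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    pred = model.predict(X_test)

    metrics = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }
    return model, metrics


if __name__ == "__main__":
    _, results = train_and_evaluate(make_synthetic_data())
    for name, value in results.items():
        print(name, value, sep=": ")
```

To try this on the real data, the synthetic frame would be replaced by a DataFrame loaded from the dataset file, provided it uses the column names listed in the Dataset section.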
✅ Results
Achieved high classification accuracy with Random Forest and Logistic Regression models. Identified glucose and BMI as key predictive features.
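One common way to arrive at a ranking of predictive features is to compare the magnitudes of logistic regression coefficients after standardizing the inputs. The sketch below illustrates that idea on synthetic data. Whether the project ranked features this way is an assumption, and `rank_features` and the `feature_*` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def rank_features(X, y, names):
    """Return (name, coefficient) pairs sorted by absolute coefficient size."""
    # Standardizing first puts the coefficients on a comparable scale.
    X_scaled = StandardScaler().fit_transform(X)
    clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)
    coefs = clf.coef_[0]
    order = np.argsort(-np.abs(coefs))  # largest magnitude first
    return [(names[i], float(coefs[i])) for i in order]


# Synthetic illustration only: feature_a drives the label most strongly,
# feature_b weakly, and feature_c not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

ranking = rank_features(X, y, ["feature_a", "feature_b", "feature_c"])
for name, coef in ranking:
    print(f"{name}: {coef:+.2f}")
```

Coefficient magnitudes are only a rough guide when features are correlated, so a tree-based importance score or permutation importance can serve as a cross-check.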