This project builds a machine learning model that predicts whether a telecom customer will churn, based on demographic, account, and service-usage data.
Project Highlights:
- Predicts customer churn using Random Forest.
- Handles class imbalance using SMOTE.
- Saves trained model, encoders, and scaler for making predictions on new customers.
- Tools: Python, Scikit-learn, NumPy, Pandas, Imbalanced-learn.
- Rows: 7,000+ (telecom customer dataset)
- Features: Demographics, account info, service subscriptions, charges, tenure, contract type, payment method, etc.
- Label: Churn (1 = Churn, 0 = Not Churn)
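To illustrate the label convention above, here is a minimal sketch of mapping a raw churn column to 1/0 and checking the class balance. The toy DataFrame and the assumption that the raw label values are the strings `Yes` and `No` are illustrative only and are not taken from the dataset file.

```python
import pandas as pd

# Toy stand-in for the telecom dataset (values are made up for illustration).
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65, 89.10],
    "Churn": ["No", "No", "Yes", "No", "Yes", "No"],  # assumed raw encoding
})

# Map the label to the convention used here: 1 = Churn, 0 = Not Churn.
df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})

# Class counts show how imbalanced the label is before any resampling.
class_counts = df["Churn"].value_counts().to_dict()
churn_rate = df["Churn"].mean()
print(class_counts, f"churn rate = {churn_rate:.2f}")
```

A churn rate well below 0.5 is the kind of imbalance that motivates resampling the minority class before training.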
- Clone the repository:
  git clone https://github.com/yourusername/customer-churn-prediction.git
- Install required libraries:
  pip install numpy pandas scikit-learn imbalanced-learn matplotlib seaborn
- Run the notebook:
  - Open Customer_Churn_Prediction.ipynb in Jupyter Notebook or Google Colab and run all cells.
- Data Loading: Load the dataset into a pandas DataFrame.
- Data Preprocessing: Clean data, handle missing values, encode categorical variables, scale numeric features.
- Exploratory Data Analysis: Visualize distributions and correlations.
- Class Imbalance Handling: Use SMOTE to balance the dataset.
- Train-Test Split: Split data into training and testing sets.
- Model Training: Train Random Forest and XGBoost models with hyperparameter tuning.
- Evaluation: Evaluate using accuracy, ROC-AUC, confusion matrix, and classification report.
- Prediction Function: Predict churn for new customers using the saved model.
| Metric | Score |
|---|---|
| Training Accuracy | ~85% |
| Test Accuracy | ~82% |
| ROC-AUC Score | ~0.85 |
- The gap between training and test accuracy is only about 3 percentage points, which suggests the model generalizes reasonably well to unseen customers.
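For reference, the metrics in the table above can be computed with scikit-learn along these lines. The labels and probabilities below are hypothetical values chosen for illustration. They are not the project's results.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)

# Hypothetical true labels and predicted churn probabilities for 8 customers.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1])
y_prob = np.array([0.10, 0.20, 0.30, 0.60, 0.40, 0.80, 0.70, 0.35])

# Turn probabilities into hard labels at the conventional 0.5 threshold.
y_pred = (y_prob >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct labels
auc = roc_auc_score(y_true, y_prob)        # uses probabilities, not hard labels
cm = confusion_matrix(y_true, y_pred)      # rows = actual, columns = predicted

print(f"Accuracy: {accuracy:.2f}")
print(f"ROC-AUC:  {auc:.2f}")
print(cm)
print(classification_report(y_true, y_pred, target_names=["Not Churn", "Churn"]))
```

Note that ROC-AUC is computed from the predicted probabilities, so it does not depend on the 0.5 threshold, whereas accuracy and the confusion matrix do.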
Input:
example_input = {
    'gender': 'Male',
    'SeniorCitizen': 1,
    'Partner': 'No',
    'Dependents': 'No',
    'tenure': 1,
    'PhoneService': 'Yes',
    'MultipleLines': 'Yes',
    'InternetService': 'Fiber optic',
    'OnlineSecurity': 'No',
    'OnlineBackup': 'No',
    'DeviceProtection': 'No',
    'TechSupport': 'No',
    'StreamingTV': 'Yes',
    'StreamingMovies': 'Yes',
    'Contract': 'Month-to-month',
    'PaperlessBilling': 'Yes',
    'PaymentMethod': 'Electronic check',
    'MonthlyCharges': 99.65,
    'TotalCharges': 99.65
}
Prediction:
prediction, prob = make_prediction(example_input)
print(f"Prediction: {prediction}, Probability: {prob:.2f}")
Output:
Prediction: Churn, Probability: 0.97
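The body of `make_prediction` is not shown here, so the following is only one possible shape for it: a sketch that saves the model, encoders, and scaler together, reloads them, and applies the same transformations to a new customer. The reduced feature list, the toy training rows, the artifact dictionary keys, and the in-memory buffer used in place of a file on disk are all assumptions made for illustration.

```python
import io
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

FEATURES = ["Contract", "tenure", "MonthlyCharges"]  # reduced set for brevity
NUMERIC = ["tenure", "MonthlyCharges"]

# --- Training side: fit on toy data and serialize the artifacts together. ---
train = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "Month-to-month", "One year",
                 "Month-to-month", "Two year", "One year", "Month-to-month"],
    "tenure": [1.0, 60.0, 3.0, 24.0, 2.0, 70.0, 36.0, 5.0],
    "MonthlyCharges": [99.65, 25.0, 89.1, 55.2, 95.0, 20.5, 60.0, 80.3],
    "Churn": [1, 0, 1, 0, 1, 0, 0, 1],
})
encoders = {"Contract": LabelEncoder().fit(train["Contract"])}
scaler = StandardScaler().fit(train[NUMERIC])

X = train[FEATURES].copy()
X["Contract"] = encoders["Contract"].transform(X["Contract"])
X[NUMERIC] = scaler.transform(X[NUMERIC])
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, train["Churn"])

# An in-memory buffer stands in for a pickle file on disk.
buffer = io.BytesIO()
pickle.dump({"model": model, "encoders": encoders, "scaler": scaler}, buffer)

# --- Prediction side: reload the artifacts and apply the same transforms. ---
buffer.seek(0)
artifacts = pickle.load(buffer)


def make_prediction(customer: dict) -> tuple[str, float]:
    """Return the predicted label and the churn probability for one customer."""
    row = pd.DataFrame([customer])[FEATURES].copy()  # extra keys are ignored
    for column, encoder in artifacts["encoders"].items():
        row[column] = encoder.transform(row[column])
    row[NUMERIC] = artifacts["scaler"].transform(row[NUMERIC])
    churn_prob = float(artifacts["model"].predict_proba(row)[0, 1])
    label = "Churn" if churn_prob >= 0.5 else "Not Churn"
    return label, churn_prob


prediction, prob = make_prediction(
    {"Contract": "Month-to-month", "tenure": 1, "MonthlyCharges": 99.65}
)
print(f"Prediction: {prediction}, Probability: {prob:.2f}")
```

Bundling the encoders and scaler with the model keeps inference consistent with training: a category or numeric value is transformed exactly as it was when the model was fitted.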
You can explore and run this project in Google Colab:
Open Customer Churn Prediction Notebook
- Built and trained a Random Forest model to predict customer churn.
- Achieved roughly 85% training accuracy and 82% test accuracy, with a ROC-AUC of about 0.85.
- Demonstrates a complete machine learning workflow: preprocessing, model training, evaluation, and prediction.
- Helps businesses proactively identify at-risk customers and take retention actions.