This repository contains a machine learning project for credit risk prediction using the UCI Default of Credit Card Clients dataset. The model predicts whether a client will default on their credit card payment based on their demographic, payment history, and bill statement data.
Credit risk assessment is crucial for financial institutions to minimize losses. This project utilizes a Random Forest Classifier to predict the likelihood of a client defaulting, with results evaluated using metrics like Accuracy, ROC AUC Score, and Classification Report.
The dataset used is sourced from the UCI Machine Learning Repository and contains:
- 30,000 samples of credit card clients.
- 23 features, including:
  - Demographic information: `SEX`, `AGE`, `EDUCATION`, `MARRIAGE`
  - Payment history: `PAY_0` to `PAY_6`
  - Bill statements: `BILL_AMT1` to `BILL_AMT6`
  - Payment amounts: `PAY_AMT1` to `PAY_AMT6`
- Target variable: `default` (1 = Default, 0 = No Default)
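
As a quick check on the feature count, the column groups can be enumerated in a few lines. This sketch assumes the column naming of the original UCI spreadsheet, where the payment-status columns run `PAY_0`, `PAY_2`, ..., `PAY_6` (there is no `PAY_1`) and `LIMIT_BAL` is the credit-limit column. The list variable names are illustrative, not taken from the notebook.

```python
# Column groups, assuming the naming used in the UCI spreadsheet.
demographic = ["SEX", "EDUCATION", "MARRIAGE", "AGE"]
# Assumption: the payment-status columns skip PAY_1 in the UCI file.
payment_status = ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)]
bill_amounts = [f"BILL_AMT{i}" for i in range(1, 7)]
payment_amounts = [f"PAY_AMT{i}" for i in range(1, 7)]

# LIMIT_BAL (credit limit) is the one feature outside the four groups.
features = ["LIMIT_BAL"] + demographic + payment_status + bill_amounts + payment_amounts

print(len(features))
```

The groups add up to 1 + 4 + 6 + 6 + 6 = 23 features.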
- Data Preprocessing:
  - Filling missing values with column means.
  - Standardizing numeric features using `StandardScaler`.
  - Encoding categorical variables using `LabelEncoder`.
- Class Balance Check:
  - The evaluated data has an equal distribution of `Default` and `No Default` classes (4,673 samples each, matching the row totals of the confusion matrix), so no resampling techniques are needed.
Model Training:
- A Random Forest Classifier is trained.
- Hyperparameter tuning performed using
GridSearchCV.
- Model Evaluation:
  - Accuracy: 85.4%
  - ROC AUC Score: 0.924
  - A detailed Classification Report and Confusion Matrix are generated.
- Feature Importance:
  - The top predictors of credit default are identified, including `LIMIT_BAL`, `PAY_0`, and the `BILL_AMT` features.
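
The preprocessing, training, and feature-importance steps above might be wired together roughly as follows. This is a minimal sketch, not the notebook's code: it runs on a small synthetic frame with a toy target, uses `SimpleImputer(strategy="mean")` as a stand-in for filling missing values with column means, and picks an arbitrary hyperparameter grid. Categorical encoding with `LabelEncoder` is left out because the synthetic columns are all numeric.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption: for illustration only).
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    "LIMIT_BAL": rng.integers(10_000, 500_000, n).astype(float),
    "AGE": rng.integers(21, 70, n).astype(float),
    "PAY_0": rng.integers(-1, 4, n).astype(float),
    "BILL_AMT1": rng.normal(50_000, 20_000, n),
})
# Toy target: later payers default more often (not the real relationship).
y = (X["PAY_0"] + rng.normal(0, 1, n) > 1.5).astype(int)
# Inject a few missing values so the imputation step has work to do.
X.loc[rng.choice(n, 20, replace=False), "BILL_AMT1"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Mean imputation -> standardization -> random forest, fitted as one unit
# so the preprocessing statistics come from the training folds only.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42)),
])

# Hypothetical grid; the values actually searched in the notebook are not stated.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [3, None],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="roc_auc")
search.fit(X_train, y_train)

# Rank features by the fitted forest's impurity-based importances.
importances = pd.Series(
    search.best_estimator_.named_steps["model"].feature_importances_,
    index=X.columns,
).sort_values(ascending=False)

print(search.best_params_)
print(importances)
```

Putting the imputer and scaler inside the `Pipeline` means `GridSearchCV` refits them on each training fold, which avoids leaking validation-fold statistics into preprocessing.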
| Metric | Value |
|---|---|
| Accuracy | 85.4% |
| ROC AUC | 0.924 |
| Precision | 0.85–0.86 |
| Recall | 0.85–0.86 |
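
Metrics like those in the table could be computed along these lines. The sketch below uses synthetic data from `make_classification` and an untuned forest, so its numbers will not match the table; it only illustrates which scikit-learn functions produce each metric. Variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption: stand-in for the real set).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_prob = model.predict_proba(X_test)[:, 1]   # probability of the positive class

accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)      # ROC AUC needs scores, not labels
cm = confusion_matrix(y_test, y_pred)        # rows: actual, columns: predicted

print(f"Accuracy: {accuracy:.3f}")
print(f"ROC AUC:  {roc_auc:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=["No Default", "Default"]))
```

Note that `roc_auc_score` is given predicted probabilities rather than class labels, since the ROC curve is built by sweeping the decision threshold.
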
The confusion matrix highlights the prediction performance for both classes:
| Actual/Predicted | No Default | Default |
|---|---|---|
| No Default | 4024 | 649 |
| Default | 711 | 3962 |
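
The headline metrics can be re-derived from the four cells of the confusion matrix. The short check below uses only the numbers reported in the matrix, treating `Default` as the positive class; the variable names are illustrative.

```python
# Cells of the reported confusion matrix (rows: actual, columns: predicted).
tn, fp = 4024, 649   # actual No Default
fn, tp = 711, 3962   # actual Default

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Per-class precision and recall.
precision_default = tp / (tp + fp)
recall_default = tp / (tp + fn)
precision_no_default = tn / (tn + fn)
recall_no_default = tn / (tn + fp)

print(f"Accuracy: {accuracy:.3f}")
print(f"Default:    precision {precision_default:.2f}, recall {recall_default:.2f}")
print(f"No Default: precision {precision_no_default:.2f}, recall {recall_no_default:.2f}")
```

The accuracy comes out at about 0.854, and every per-class precision and recall value rounds to 0.85 or 0.86, in line with the reported metrics.
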
To run this project locally, follow these steps:
- Clone the Repository: run `git clone https://github.com/<YourUsername>/<RepoName>.git`, then `cd <RepoName>`.
- Install Dependencies: install the required Python libraries using `pip`: `pip install -r requirements.txt`
- Run the Jupyter Notebook: open the Jupyter Notebook to explore the code: `jupyter notebook`
- Python 3.8+
- Libraries:
  - pandas
  - numpy
  - matplotlib
  - seaborn
  - scikit-learn
  - imbalanced-learn (if SMOTE is applied in future versions)
- Compare performance with other models like XGBoost and LightGBM.
- Deploy the model as an API for real-time predictions.
- Add visualization dashboards for better insights.
Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request.
This project is licensed under the MIT License.
- Jebin Larosh Jervis
- Connect with me: LinkedIn

