K-Means Clustering - Mall Customer Segmentation

Overview

This project applies the K-Means clustering algorithm to segment mall customers by Age, Annual Income, and Spending Score. The goal is to group customers with similar characteristics so that each segment can be addressed with its own targeted marketing strategy.

Dataset

The dataset used in this project is included in the repository. It contains customer data with the following columns:

CustomerID: Unique identifier for each customer.
Gender: The gender of the customer.
Age: The age of the customer.
Annual Income (k$): The annual income of the customer in thousands of dollars.
Spending Score (1-100): A score assigned to customers based on their spending behavior.

You can use this dataset directly to reproduce the customer segmentation.
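
As a minimal sketch, the columns listed above could be loaded with pandas as shown here. The inline sample rows and the helper name `load_features` are made up for illustration and only mirror the column layout; in practice, pass the path to Mall_Customers.csv instead.

```python
import io

import pandas as pd

# Hypothetical sample rows that only mirror the column layout described above;
# they are NOT taken from the real dataset.
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,25,30,60
2,Female,40,80,20
3,Female,33,55,75
"""

# The three numeric columns used for clustering in this project.
FEATURES = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]


def load_features(source):
    """Read the CSV and return the full frame plus the clustering features."""
    df = pd.read_csv(source)
    return df, df[FEATURES]


# Replace io.StringIO(SAMPLE_CSV) with the path to Mall_Customers.csv.
df, X = load_features(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(X.describe())
```

CustomerID and Gender are left out of `X` because the clustering in this project uses only the three numeric features.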

Installation

Clone the Repository:

git clone https://github.com/Vignesha-S/Project-3.git

Install Required Libraries: If the repository includes a requirements.txt file, install the dependencies with:
```
pip install -r requirements.txt
```
Otherwise, manually install the necessary libraries:
```
pip install pandas matplotlib seaborn scikit-learn
```

How to Run

Complete the Installation steps above (clone the repository and install the required libraries), then:

Run the Jupyter Notebook: Open k_means_clustering.ipynb in Jupyter Notebook or JupyterLab and run all cells:
```
jupyter notebook k_means_clustering.ipynb
```

Project Structure

k_means_clustering.ipynb: The main Jupyter notebook file that contains all steps, including data loading, exploration, preprocessing, clustering, and visualization.
Mall_Customers.csv: The dataset file, located in the repository root.
requirements.txt: The list of project dependencies, if included in the repository.

Project Steps

Data Loading and Initial Exploration:
- The dataset is loaded and inspected for its structure.
- Features such as Age, Annual Income, and Spending Score are selected for clustering.
Exploratory Data Analysis (EDA):
- Histograms and KDE plots are used to understand the distribution of features (Age and Spending Score).
- A correlation heatmap is generated to check the relationships between the features.
Data Preprocessing:
- The features are standardized with StandardScaler (zero mean, unit variance) so that no single feature dominates the distance calculations used by K-Means.
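
The scaling step might look roughly like the sketch below. The input values are hypothetical and stand in for the Age, Annual Income, and Spending Score columns.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for (Age, Annual Income in k$, Spending Score);
# not taken from the real dataset.
X = np.array([
    [25, 30, 60],
    [40, 80, 20],
    [33, 55, 75],
    [52, 120, 40],
], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has mean ~0 and standard deviation ~1,
# so a wide-ranging column such as income no longer outweighs the others.
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

Keeping the fitted `scaler` around is useful later, because `scaler.inverse_transform` can map cluster centroids back to the original units.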
K-Means Clustering:
- The Elbow Method is applied by comparing the within-cluster sum of squares (inertia) across candidate numbers of clusters; the optimal number is found to be 4.
- The K-Means model is trained on the scaled data to segment the customers into 4 clusters.
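
A rough sketch of the Elbow Method followed by the final fit is shown below. It runs on synthetic data from `make_blobs` as a stand-in for the scaled customer features; the range of k values (1 to 10), `n_init=10`, and `random_state=42` are assumptions, not settings confirmed by the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three scaled features (assumption: 200 points
# drawn around 4 centers); swap in the real scaled data in the notebook.
X, _ = make_blobs(n_samples=200, centers=4, n_features=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Elbow Method: record the within-cluster sum of squares (inertia) for each k.
inertias = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X_scaled)
    inertias.append(model.inertia_)

for k, inertia in enumerate(inertias, start=1):
    print(f"k={k:2d}  inertia={inertia:.2f}")

# Fit the final model with the k chosen from the elbow (4 in this project).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)
print(kmeans.cluster_centers_.shape)  # (4, 3): one centroid per cluster
```

Plotting `inertias` against k and looking for the point where the curve flattens gives the "elbow"; fixing `random_state` keeps the cluster assignments reproducible between runs.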
Centroids Visualization:
- After clustering, the centroids of the clusters are visualized in a 3D plot along with the customer data points.
Cluster Summary and Interpretation:
- A summary of each cluster is generated by calculating the mean values of Age, Annual Income, and Spending Score within each cluster.
- Insights such as high-income vs. low-income groups, and high-spending vs. low-spending groups are provided.
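
One way to build such a per-cluster summary is a pandas `groupby`, sketched below. The rows and the "Cluster" column name are hypothetical and do not reproduce the notebook's results.

```python
import pandas as pd

# Hypothetical rows with an already-assigned "Cluster" label; the values are
# made up for illustration and are not results from the notebook.
df = pd.DataFrame({
    "Age": [25, 27, 48, 52],
    "Annual Income (k$)": [30, 34, 90, 86],
    "Spending Score (1-100)": [70, 80, 15, 25],
    "Cluster": [0, 0, 1, 1],
})

features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# Mean of each feature per cluster, plus the number of customers in it.
summary = df.groupby("Cluster")[features].mean()
summary["Count"] = df.groupby("Cluster").size()
print(summary)
```

Computing the means on the original (unscaled) columns keeps the summary in readable units such as years and k$.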
Final Visualization:
- A final 2D plot is generated to visualize the clusters and their centroids, helping to understand the customer segments in the feature space.
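
A 2D cluster plot along these lines could be sketched as follows. The choice of axes (Annual Income against Spending Score) is an assumption, since the text does not say which two features are plotted, and the data is again a synthetic stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (Age, Annual Income, Spending Score).
X, _ = make_blobs(n_samples=200, centers=4, n_features=3, random_state=0)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Centroids live in scaled space; map them back to the original units
# so they line up with the unscaled points on the plot.
centroids = scaler.inverse_transform(kmeans.cluster_centers_)

fig, ax = plt.subplots(figsize=(6, 4))
# Columns 1 and 2 stand for Annual Income and Spending Score (assumed axes).
ax.scatter(X[:, 1], X[:, 2], c=labels, cmap="viridis", s=20)
ax.scatter(centroids[:, 1], centroids[:, 2], c="red", marker="X", s=120,
           label="Centroids")
ax.set_xlabel("Annual Income (k$)")
ax.set_ylabel("Spending Score (1-100)")
ax.legend()
# Call fig.savefig(...) or plt.show() to view the figure outside a notebook.
```

Because the model is fitted on all three features, two clusters can overlap in a 2D projection even when they are well separated in the full feature space.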

Results

The K-Means algorithm successfully segmented the customers into 4 clusters based on their Age, Annual Income, and Spending Score.
Each cluster represents a customer segment that can be targeted with its own marketing strategy:
- Cluster 0: High-income and low-spending group.
- Cluster 1: High-income and high-spending group.
- Cluster 2: Low-income and medium-spending group.
- Cluster 3: Medium-income and low-spending group.

License

This project is licensed under the MIT License - see the LICENSE file for details.
