🍕 ABCDEats Inc: A Data Mining Approach to Customer Segmentation 📦

📝 Description

This repository documents a comprehensive Data Mining project focused on ABCDEats Inc., a fictional food delivery service. We analyze a rich dataset of customer transactions and behaviors to develop a data-driven segmentation strategy. The goal is to empower ABCDEats to move beyond a one-size-fits-all approach and tailor its marketing, promotions, and service offerings to distinct customer profiles.

✨ Objective

The primary objectives of this project are to:

Conduct an Exploratory Data Analysis (EDA) to understand customer behaviors, trends, and patterns.
Preprocess the data by handling inconsistencies, missing values, and outliers, and by performing feature engineering and selection.
Apply and evaluate various Clustering Algorithms (Hierarchical, K-Means, SOM, Density-based) from different perspectives (Overall, Value-based, Behavior-based).
Develop a Final Customer Segmentation solution by comparing and potentially merging results from different approaches.
Profile the resulting customer segments, highlighting their key characteristics.
Suggest actionable Business Applications and marketing strategies for each segment.
(Optional) Develop an interactive Web Application for exploring the EDA and segmentation results.

🎓 Project Context

This project was developed for the Data Mining course as part of the Master's in Data Science and Advanced Analytics program at NOVA IMS. The work was completed during the 1st Semester of the 2024/2025 academic year.

🛠️ Technologies & Libraries

The project was implemented entirely in Python, using scikit-learn for preprocessing and clustering (e.g., KNNImputer, StandardScaler, PCA), MiniSom for Self-Organizing Maps, and Streamlit with Plotly for the optional web dashboard.

🗺️ Project Workflow (CRISP-DM)

The project strictly followed the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. The overall workflow is visualized below:

(Diagram summarizing the key phases and steps of the project)

🏗️ Project Structure (CRISP-DM Phases)

Business Understanding: 💡
- Defined the core business problem: Need for effective customer segmentation for ABCDEats Inc. to personalize marketing and services.
- Established project objectives aligned with business goals (improve customer satisfaction, retention, revenue).
Data Understanding: 🔍
- Explored the initial dataset (31,888 customers, 56 features).
- Identified data types, distributions (skewness, kurtosis), and initial relationships (pair plots).
- Detected missing values (customer_age, first_order, HR_0), duplicates, and inconsistencies ('-' in customer_region, last_promo; illogical vendor/product/order_counts).
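The checks listed above could be run with pandas roughly as follows. This is a minimal sketch on a toy frame: the column names come from the dataset, but every value (including the region codes and promo labels) is invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the customer table; all values are made up.
df = pd.DataFrame({
    "customer_id": ["a1", "b2", "b2", "c3", "d4"],
    "customer_age": [23.0, 27.0, 27.0, np.nan, 45.0],
    "first_order": [0.0, 5.0, 5.0, np.nan, 12.0],
    "customer_region": ["2360", "-", "-", "8670", "4660"],
    "last_promo": ["DELIVERY", "-", "-", "DISCOUNT", "-"],
})

# Missing values per column, keeping only columns that have any.
missing = df.isna().sum()
missing = missing[missing > 0]

# Fully duplicated rows (every column identical to an earlier row).
n_duplicates = int(df.duplicated().sum())

# Occurrences of the '-' placeholder in the categorical columns.
text_cols = ["customer_region", "last_promo"]
placeholder_counts = (df[text_cols] == "-").sum()

print(missing, n_duplicates, placeholder_counts, sep="\n\n")
```

Listing the placeholder columns explicitly keeps the check independent of how a given pandas version infers string dtypes.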
Data Preparation & Feature Engineering 🛠️
- Cleaning: Handled duplicates (removed 13), treated inconsistencies (removed 18 illogical rows, reinterpreted '-').
- Missing Value Imputation: Used deterministic logic (first_order, HR_0) and KNNImputer (customer_age).
- Feature Engineering: Created new features (e.g., order_count, days_between_orders, customer_region_buckets, last_promo_bin, CUI totals/averages/most spent, PCA components). Discarded less informative engineered features (e.g., CUI proportions).
- Outlier Handling: Applied a mixed strategy (modified IQR and manual removal based on boxplots/domain knowledge), retaining 98.61% of data.
- Variable Selection: Used Spearman correlation (threshold 0.8) to identify and remove redundant features (vendor_count, product_count, days_between_orders, customer_age, customer_age_group).
- Feature Scaling: Applied StandardScaler to numerical features for distance-based algorithms.
- Dimensionality Reduction: Used PCA separately on CUI and HR feature groups to reduce noise/redundancy while preserving variance (kept 7 CUI PCs, 4 HR PCs). Original DOW variables were retained.
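Three of the preparation steps above (KNN imputation, the Spearman redundancy filter, and scaling followed by group-wise PCA) can be sketched with scikit-learn and pandas. The data is synthetic, the `CUI_*` column names and the number of components are placeholders, and applying the 0.8 threshold to the absolute correlation is an assumption rather than something the project states.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data; column names mimic the dataset, values do not.
base = rng.gamma(2.0, 2.0, size=n)
df = pd.DataFrame({
    "order_count": base,
    "product_count": base * 1.5 + rng.normal(0, 0.1, n),  # nearly redundant
    "customer_age": rng.integers(18, 70, n).astype(float),
    "CUI_Asian": rng.gamma(1.0, 5.0, n),
    "CUI_Italian": rng.gamma(1.0, 3.0, n),
    "CUI_American": rng.gamma(1.0, 4.0, n),
})
df.loc[rng.choice(n, 10, replace=False), "customer_age"] = np.nan

# 1) KNN imputation: each gap is filled from the 5 most similar rows.
#    (Distances use raw units here; scaling first may be preferable.)
imputed = pd.DataFrame(
    KNNImputer(n_neighbors=5).fit_transform(df), columns=df.columns
)

# 2) Spearman filter: drop the later column of any pair with |rho| > 0.8.
corr = imputed.corr(method="spearman").abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.8).any()]
reduced = imputed.drop(columns=to_drop)

# 3) Standardise, then run PCA on the cuisine (CUI) group only.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(reduced), columns=reduced.columns
)
cui_cols = [c for c in scaled.columns if c.startswith("CUI_")]
pca = PCA(n_components=2, random_state=0)
cui_pcs = pca.fit_transform(scaled[cui_cols])

print("dropped:", to_drop)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

Running PCA per feature group, rather than on all columns at once, keeps the components interpretable as "cuisine" or "hour" patterns.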
Modeling: Multi-Perspective Clustering 🧠
- Applied multiple clustering algorithms:
  - Hierarchical Clustering (HC - Agglomerative, Ward linkage)
  - K-Means
  - Self-Organizing Maps (SOM - using MiniSom) + HC/K-Means
  - Density- and model-based: Mean Shift, DBSCAN, Gaussian Mixture Models (GMM)
- Performed clustering on 'Overall', 'Value-based', and 'Behavior-based' feature subsets.
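The two-stage "SOM + K-Means" idea can be illustrated as follows: train a SOM, cluster the SOM units with K-Means, then give each customer the cluster of its best-matching unit. The project used MiniSom for the first stage; this sketch swaps in a small NumPy SOM so that it is self-contained. The grid size, iteration count, decay schedule, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def train_som(X, rows, cols, n_iter=2000, sigma0=1.5, lr0=0.5, seed=0):
    """Train a tiny online SOM; returns unit weights, shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen samples.
    weights = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    for t in range(n_iter):
        decay = 1.0 - t / n_iter            # linear decay from 1 towards 0
        sigma = max(sigma0 * decay, 0.3)    # neighbourhood radius, floored
        lr = lr0 * decay                    # learning rate
        x = X[rng.integers(0, len(X))]      # one random training sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))       # Gaussian neighbourhood
        weights += lr * h[:, None] * (x - weights)
    return weights


def som_kmeans(X, rows=6, cols=6, k=3, seed=0):
    """Cluster SOM units with K-Means, then label samples via their BMU."""
    weights = train_som(X, rows, cols, seed=seed)
    unit_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(weights)
    d2 = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return unit_labels[d2.argmin(axis=1)]


# Three synthetic blobs standing in for a scaled feature subset.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(100, 2)) for c in centers])

labels = som_kmeans(X, k=3)
print(np.bincount(labels))
```

Clustering the 36 unit vectors instead of all samples is what makes this approach cheap on larger datasets; the same function can be called once per feature subset ('Overall', 'Value-based', 'Behavior-based').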
Evaluation & Final Segmentation ✅
- Determined optimal cluster numbers using Elbow method (Inertia/SSE), Silhouette analysis, R² metric (for HC), AIC/BIC (for GMM), and visual inspection (dendrograms).
- Compared performance across algorithms and perspectives based on R² and silhouette scores.
- Selected best-performing methods for each perspective (SOM+K-Means overall, K-Means value, SOM+K-Means behavior).
- Manually merged the 'Value' (k=3) and 'Behavior' (k=4) solutions based on centroid analysis to create a final, more robust 5-cluster solution.
- Visualized cluster separation using t-SNE and UMAP.
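The R² metric mentioned above is the share of total variance explained by a partition, R² = 1 - SSW / SST, where SSW is the within-cluster sum of squares and SST the total sum of squares. A minimal sketch of comparing cluster counts with inertia, silhouette, and R² follows; the synthetic data and the range of k are assumptions, and `r2_score_clustering` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score


def r2_score_clustering(X, labels):
    """Share of total variance explained by the partition: 1 - SSW / SST."""
    sst = ((X - X.mean(axis=0)) ** 2).sum()
    ssw = sum(
        ((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
        for c in np.unique(labels)
    )
    return 1.0 - ssw / sst


# Three well-separated synthetic blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0, 0.6, size=(80, 2)) for c in centers])

results = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    hc_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    results[k] = {
        "inertia": float(km.inertia_),                        # elbow method input
        "silhouette": float(silhouette_score(X, km.labels_)),
        "r2_kmeans": float(r2_score_clustering(X, km.labels_)),
        "r2_ward": float(r2_score_clustering(X, hc_labels)),
    }
    print(k, {name: round(v, 3) for name, v in results[k].items()})

# One possible selection rule: the k with the highest silhouette score.
best_k = max(results, key=lambda k: results[k]["silhouette"])
print("best k by silhouette:", best_k)
```

R² never decreases as k grows for a hierarchical solution, so it is read like an elbow plot, while the silhouette score can peak at an interior k.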
Deployment 🚀
- Profiling: Characterized the final 5 clusters using descriptive statistics, bar plots, and heatmaps.
- Business Applications: Defined marketing strategies tailored to each segment.
- (Optional) Interactive Dashboard: Developed a web application using Streamlit and Plotly for dynamic exploration of EDA and segmentation results.
  - ➡️ Dashboard App Repository: Silvestre17/DM_Dashboard ⬅️

📈 Results - Final Customer Segments

Based on the merged clustering solution (Value K-Means + Behavior SOM+K-Means), five distinct customer segments were identified:

Segment 0: The Mainstream Base (41.74% of customers)
- Key characteristics:
  - Largest group.
  - Average spending and behavior, similar to the overall dataset.
  - Moderate to low engagement.
  - Prefers Asian and American cuisines.
  - Balanced across regions; uses card payments.
- Recommended marketing approach:
  - Offer tiered loyalty (discounts/perks for higher spending/frequency).
  - Target promotions for American/Asian cuisines and combo deals.

Segment 1: The Promo Pursuers (38.00% of customers)
- Key characteristics:
  - Second-largest group, with low engagement (lowest order count).
  - Low total spend, but high average spend per order.
  - Likely motivated by delivery promotions.
  - Slight preference for evening orders and for Noodles/Chinese/Chicken.
- Recommended marketing approach:
  - Offer free delivery for orders above a certain value.
  - Implement a points-based rewards program redeemable for discounts/free delivery.

Segment 2: The Convenience Seekers (8.56% of customers)
- Key characteristics:
  - Concentrated in Region 2.
  - High order frequency (lunch/dinner).
  - Prefers Chicken, Chinese, Noodles, Other; less Asian/Street Food.
  - Moderate spenders, but significant volume.
- Recommended marketing approach:
  - Focus on a premium dining experience (personalized service), especially in Region 2.
  - Offer exclusive menu previews/early access.
  - Run a loyalty program rewarding spend per order and frequency.

Segment 3: The Balanced Spenders (6.76% of customers)
- Key characteristics:
  - Located mostly in Regions 2 and 4.
  - Similar activity times to Segment 2 (lunch/dinner), but lower frequency and spend.
  - Prefers Italian and Other cuisines; less keen on Street Food/Snacks/Asian.
- Recommended marketing approach:
  - Highlight Italian/Other cuisines in promotions (exclusive deals).
  - Target lunch/dinner promotions.
  - Offer discount combos for higher spend.

Segment 4: The Late-Night Enthusiasts (4.93% of customers)
- Key characteristics:
  - Highest spenders (absolute and average).
  - Predominantly in Region 8.
  - Strong preference for Asian, Snack, and Street Food.
  - Orders primarily late at night and at early breakfast.
  - Less preference for Italian/Other.
- Recommended marketing approach:
  - Highlight breakfast and late-night specific items/offerings.
  - Introduce city-specific promotions (Region 8).
  - Offer special discounts/VIP access for high spenders.
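Merging two labelings and computing segment shares like those reported here might look as follows in pandas. This is a toy sketch: the labels and the `merge_map` dictionary are hypothetical, and in the project the merge was decided manually from centroid analysis rather than from a fixed rule.

```python
import pandas as pd

# Hypothetical per-customer labels from the two perspectives (toy values).
labels = pd.DataFrame({
    "value_cluster":    [0, 0, 1, 1, 2, 2, 0, 1, 2, 0],
    "behavior_cluster": [0, 1, 0, 2, 3, 3, 0, 2, 1, 1],
})

# Contingency table: how the value and behavior partitions overlap.
overlap = pd.crosstab(labels["value_cluster"], labels["behavior_cluster"])

# Hypothetical manual mapping from (value, behavior) pairs to final segments.
merge_map = {
    (0, 0): 0, (0, 1): 0,
    (1, 0): 1, (1, 2): 1,
    (2, 3): 2, (2, 1): 3,
}
pairs = zip(labels["value_cluster"], labels["behavior_cluster"])
labels["segment"] = [merge_map[p] for p in pairs]

# Segment shares in percent of all customers.
shares = (
    labels["segment"].value_counts(normalize=True).sort_index().mul(100).round(2)
)

print(overlap, shares, sep="\n\n")
```

Inspecting the contingency table first shows which label pairs are populated enough to deserve their own segment and which can be folded into a neighbour.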

👥 Group 37 Members

André Silvestre, 20240502
Filipa Pereira, 20240509
Umeima Mahomed, 20240543
