Skip to content

This repo consist of my project in Webel Fujisoft which is majorly on the data analysis of a Superstore Dataset and it is done with the help of Python with Google Colab.

License

Notifications You must be signed in to change notification settings

Liveinwar/Superstore-Data-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Sample Superstore Data Analysis

A comprehensive data analysis project examining sales, profit, and business performance across different regions, segments, and product categories using the Sample Superstore dataset.

πŸ“‹ Table of Contents

  1. Background and Overview
  2. Data Structure Overview
  3. Executive Summary
  4. Insights Deep Dive
  5. Recommendations

1. Background and Overview

Project Objective

This project analyzes the Sample Superstore dataset to understand business performance across different dimensions including regions, customer segments, and product categories. The analysis focuses on identifying key drivers of sales and profitability to inform strategic business decisions.

Key Questions Addressed

  • Which regions and segments generate the highest sales and profits?
  • What is the relationship between sales, profit, quantity, and discounts?
  • Which product categories perform best?
  • How can discount strategies be optimized?
  • What are the key outliers and patterns in the data?

Tools and Technologies

  • Python: Primary programming language
  • Pandas: Data manipulation and analysis
  • Matplotlib & Seaborn: Data visualization
  • Google Colab: Development environment
  • Statistical Analysis: Correlation analysis, outlier detection

2. Data Structure Overview

Dataset Information

  • Source: Sample Superstore dataset (SampleSuperstore.csv)
  • Records: 9,994 transactions (after cleaning)
  • Features: 21 columns including sales metrics, geographic data, and product information

Key Columns

Column Description Data Type
Sales Revenue generated from transactions Float
Profit Profit earned from transactions Float
Quantity Number of items sold Integer
Discount Discount percentage applied Float
Region Geographic region (Central, East, South, West) String
Segment Customer segment (Consumer, Corporate, Home Office) String
Category Product category (Furniture, Office Supplies, Technology) String
Sub-Category Detailed product classification String
Location Combined City and State information String

Data Preprocessing Steps

  1. Missing Value Treatment: Removed rows with null values
  2. Duplicate Removal: Eliminated duplicate records
  3. Data Type Conversion: Converted Postal Code to integer format
  4. Feature Engineering: Created Location column by combining City and State
  5. Data Cleaning: Ensured data consistency across all columns

3. Executive Summary

Overall Business Metrics

  • Total Sales: $2,297,200.86
  • Total Profit: $286,397.02
  • Average Discount: 15.6%
  • Average Profit Margin: 12.5%

Regional Performance Ranking

  1. West Region: $725,458 sales, $108,418 profit (Best performing)
  2. East Region: $678,781 sales, $91,523 profit
  3. Central Region: $501,240 sales, $39,706 profit
  4. South Region: $391,722 sales, $46,749 profit (Lowest performing)

Segment Analysis

  • Consumer Segment: Highest performer across all regions
  • Corporate Segment: Strong performance, especially in West region
  • Home Office Segment: Lowest contributor to overall sales and profit

Category Performance

  1. Technology: Leading category in sales volume
  2. Furniture: Second highest sales, important for B2B
  3. Office Supplies: Lowest sales but potentially high volume

4. Insights Deep Dive

Correlation Analysis Findings

Key Correlations Discovered

  • Sales vs Profit: 0.48 (Moderate positive correlation)

    • Higher sales generally lead to higher profits, but relationship isn't perfectly linear
  • Discount vs Profit: -0.22 (Weak negative correlation)

    • Increased discounts tend to reduce profitability
  • Quantity vs Sales: 0.20 (Weak positive correlation)

    • More items sold correlates with higher sales, but other factors influence this relationship

Statistical Insights

Outlier Analysis

  • Sales Distribution: Right-skewed with significant outliers
  • High-Value Transactions: Some sales transactions significantly exceed typical ranges
  • Profit Variability: Wide range including negative profits in some cases

Business Performance Patterns

  • Regional Concentration: West and East regions drive majority of business
  • Segment Dependency: Heavy reliance on Consumer segment across all regions
  • Seasonal/Product Variations: Significant variation in profit margins across different products

Visualization Insights

Sales Distribution

  • Most transactions fall in lower sales ranges
  • Few high-value transactions drive significant revenue
  • Clear regional preferences and market penetration differences

Profit Analysis

  • Strong correlation between sales and profit with notable exceptions
  • Some high-sales transactions show low or negative profits
  • Discount impact clearly visible in profit margins

5. Recommendations

Strategic Recommendations

1. Regional Strategy Optimization

  • Focus on West Region: Leverage success factors for expansion
  • South Region Development: Investigate barriers and develop targeted strategies
  • East Region Growth: Build on existing momentum with increased investment

2. Segment-Focused Initiatives

  • Consumer Segment: Maintain leadership through loyalty programs and targeted marketing
  • Corporate Segment: Expand B2B relationships, especially in high-performing regions
  • Home Office Segment: Develop specialized products or consider strategic pivoting

3. Discount Strategy Refinement

  • Data-Driven Discounting: Use correlation insights to optimize discount levels
  • Profit Protection: Implement discount caps to maintain profitability
  • Segment-Specific Discounts: Tailor discount strategies by customer segment

4. Product Portfolio Management

  • Technology Category: Maintain leadership through innovation and inventory management
  • Office Supplies: Explore bundling strategies to increase average transaction value
  • Furniture: Leverage B2B relationships for larger orders

5. Operational Improvements

  • Outlier Management: Investigate high-value transactions for replication strategies
  • Cost Optimization: Address cases of negative profit through cost analysis
  • Performance Monitoring: Implement regular correlation analysis for ongoing optimization

Implementation Priorities

Short-term (1-3 months)

  1. Implement refined discount policies
  2. Launch South region market research
  3. Develop Consumer segment retention programs

Medium-term (3-6 months)

  1. Expand successful West region strategies to other areas
  2. Enhance Corporate segment sales processes
  3. Optimize product mix based on profitability analysis

Long-term (6-12 months)

  1. Regional expansion based on learnings
  2. Advanced analytics implementation for real-time insights
  3. Strategic repositioning of underperforming segments

πŸš€ Getting Started

Prerequisites

pip install pandas matplotlib seaborn numpy

Running the Analysis

  1. Clone this repository
  2. Upload the SampleSuperstore.csv file
  3. Run the Python script in Google Colab or Jupyter Notebook
  4. Review generated visualizations and insights

File Structure

project/
β”‚
β”œβ”€β”€ project_week_1_trinjan_dutta.py    # Main analysis script
β”œβ”€β”€ SampleSuperstore.csv               # Dataset
β”œβ”€β”€ README.md                          # This file
└── visualizations/                    # Generated charts and plots

πŸ“Š Key Visualizations Generated

  • Correlation Matrix Heatmap
  • Regional Sales Pie Chart
  • Sales vs Profit Scatter Plot
  • Box Plot for Outlier Detection
  • Pivot Table Analysis
  • Regional Performance Comparison

🀝 Contributing

Feel free to fork this project and submit pull requests for improvements. Areas for enhancement include:

  • Advanced statistical analysis
  • Machine learning predictions
  • Interactive dashboards
  • Additional visualization types

πŸ“ License

This project is open source and available under the MIT License.


πŸ‘€ Author

Trinjan Dutta


This analysis provides actionable insights for data-driven business decisions in retail and e-commerce environments.

About

This repo consist of my project in Webel Fujisoft which is majorly on the data analysis of a Superstore Dataset and it is done with the help of Python with Google Colab.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published