A comprehensive data analysis project examining sales, profit, and business performance across different regions, segments, and product categories using the Sample Superstore dataset.
- Background and Overview
- Data Structure Overview
- Executive Summary
- Insights Deep Dive
- Recommendations
This project analyzes the Sample Superstore dataset to understand business performance across different dimensions including regions, customer segments, and product categories. The analysis focuses on identifying key drivers of sales and profitability to inform strategic business decisions.
- Which regions and segments generate the highest sales and profits?
- What is the relationship between sales, profit, quantity, and discounts?
- Which product categories perform best?
- How can discount strategies be optimized?
- What are the key outliers and patterns in the data?
- Python: Primary programming language
- Pandas: Data manipulation and analysis
- Matplotlib & Seaborn: Data visualization
- Google Colab: Development environment
- Statistical Analysis: Correlation analysis, outlier detection
- Source: Sample Superstore dataset (SampleSuperstore.csv)
- Records: 9,994 transactions (after cleaning)
- Features: 21 columns including sales metrics, geographic data, and product information
| Column | Description | Data Type |
|---|---|---|
| Sales | Revenue generated from transactions | Float |
| Profit | Profit earned from transactions | Float |
| Quantity | Number of items sold | Integer |
| Discount | Discount percentage applied | Float |
| Region | Geographic region (Central, East, South, West) | String |
| Segment | Customer segment (Consumer, Corporate, Home Office) | String |
| Category | Product category (Furniture, Office Supplies, Technology) | String |
| Sub-Category | Detailed product classification | String |
| Location | Combined City and State information | String |
- Missing Value Treatment: Removed rows with null values
- Duplicate Removal: Eliminated duplicate records
- Data Type Conversion: Converted Postal Code to integer format
- Feature Engineering: Created Location column by combining City and State
- Data Cleaning: Ensured data consistency across all columns
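The preparation steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual script: the helper name `clean_superstore`, the tiny made-up sample frame, and the `", "` separator used for `Location` are all assumptions.

```python
import pandas as pd


def clean_superstore(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed preparation steps to a raw Superstore-style frame."""
    out = df.dropna()                        # remove rows with null values
    out = out.drop_duplicates()              # eliminate duplicate records
    out = out.astype({"Postal Code": int})   # convert Postal Code to integer
    # Feature engineering: combine City and State (separator is an assumption)
    out["Location"] = out["City"] + ", " + out["State"]
    return out.reset_index(drop=True)


# Made-up rows standing in for pd.read_csv("SampleSuperstore.csv")
raw = pd.DataFrame({
    "City": ["Henderson", "Henderson", "Los Angeles", None],
    "State": ["Kentucky", "Kentucky", "California", "Texas"],
    "Postal Code": [42420.0, 42420.0, 90036.0, 77095.0],
    "Sales": [261.96, 261.96, 14.62, 20.0],
})

clean = clean_superstore(raw)
print(clean)
```

Dropping nulls before the integer conversion matters here, because a column containing missing values cannot be cast to a plain integer dtype.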
- Total Sales: $2,297,200.86
- Total Profit: $286,397.02
- Average Discount: 15.6%
- Average Profit Margin: 12.5%
- West Region: $725,458 sales, $108,418 profit (Best performing)
- East Region: $678,781 sales, $91,523 profit
- Central Region: $501,240 sales, $39,706 profit (Lowest profit)
- South Region: $391,722 sales, $46,749 profit (Lowest sales)
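A regional breakdown like the one above can be produced with a single `groupby`. The sketch below uses made-up rows rather than the real dataset, and the `Margin` column is an added illustration (profit divided by sales), not something the source reports per region.

```python
import pandas as pd

# Made-up rows for illustration; the real analysis would load SampleSuperstore.csv
df = pd.DataFrame({
    "Region": ["West", "West", "East", "Central", "South"],
    "Sales": [100.0, 300.0, 200.0, 150.0, 50.0],
    "Profit": [20.0, 40.0, 30.0, -15.0, 10.0],
})

# Total sales and profit per region, largest sales first
by_region = (
    df.groupby("Region")[["Sales", "Profit"]]
      .sum()
      .sort_values("Sales", ascending=False)
)

# Profit margin per region (illustrative extra column)
by_region["Margin"] = by_region["Profit"] / by_region["Sales"]
print(by_region)
```

Ranking by sales and ranking by profit can disagree, as the reported figures show: Central has higher sales than South but lower profit.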
- Consumer Segment: Highest performer across all regions
- Corporate Segment: Strong performance, especially in West region
- Home Office Segment: Lowest contributor to overall sales and profit
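One way to check a claim such as "Consumer leads in every region" is a Region by Segment pivot table. The example below is a sketch on made-up rows; the variable names `pivot` and `top_segment` are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    "Region": ["West", "West", "West", "East", "East", "East"],
    "Segment": ["Consumer", "Corporate", "Home Office",
                "Consumer", "Corporate", "Consumer"],
    "Sales": [100.0, 60.0, 20.0, 80.0, 50.0, 10.0],
})

# Sum of sales for each Region/Segment combination; missing pairs become 0
pivot = pd.pivot_table(
    df, values="Sales", index="Region", columns="Segment",
    aggfunc="sum", fill_value=0,
)

# Segment with the highest sales in each region
top_segment = pivot.idxmax(axis=1)
print(pivot)
print(top_segment)
```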
- Technology: Leading category in sales volume
- Furniture: Second highest sales, important for B2B
- Office Supplies: Lowest sales but potentially high volume
- Sales vs Profit: 0.48 (Moderate positive correlation)
  - Higher sales generally lead to higher profits, but relationship isn't perfectly linear
- Discount vs Profit: -0.22 (Weak negative correlation)
  - Increased discounts tend to reduce profitability
- Quantity vs Sales: 0.20 (Weak positive correlation)
  - More items sold correlates with higher sales, but other factors influence this relationship
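Pairwise coefficients like these can be read off a correlation matrix. The sketch below assumes Pearson correlation (the pandas default; the source does not state which method was used) and runs on made-up numbers, so its values will not match the figures above.

```python
import pandas as pd

# Made-up numeric columns for illustration only
df = pd.DataFrame({
    "Sales": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Quantity": [1, 2, 2, 3, 5],
    "Discount": [0.0, 0.2, 0.0, 0.4, 0.8],
    "Profit": [2.0, 3.0, 9.0, 4.0, -5.0],
})

# DataFrame.corr() computes Pearson coefficients unless told otherwise
corr = df[["Sales", "Quantity", "Discount", "Profit"]].corr()
print(corr.round(2))

# Individual pairs can be looked up by label
discount_vs_profit = corr.loc["Discount", "Profit"]
```

The same matrix can be passed to a heatmap function to produce the correlation plot listed among the visualizations.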
- Sales Distribution: Right-skewed with significant outliers
- High-Value Transactions: Some sales transactions significantly exceed typical ranges
- Profit Variability: Wide range including negative profits in some cases
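Outliers of this kind are commonly flagged with the interquartile-range rule, which is also what a standard box plot draws. The sketch below assumes that rule with the conventional multiplier of 1.5; the source does not say which detection method was used, and `iqr_outliers` is a hypothetical helper.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Made-up, right-skewed sales values with one extreme transaction
sales = pd.Series([12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0, 900.0], name="Sales")

mask = iqr_outliers(sales)
print(sales[mask])
```

With right-skewed data such as sales, this rule tends to flag only the high end, since the lower fence often falls below the smallest observed value.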
- Regional Concentration: West and East regions together account for about 61% of total sales
- Segment Dependency: Heavy reliance on Consumer segment across all regions
- Product Variations: Significant variation in profit margins across different products
- Most transactions fall in lower sales ranges
- Few high-value transactions drive significant revenue
- Clear regional preferences and market penetration differences
- Moderate positive correlation (0.48) between sales and profit, with notable exceptions
- Some high-sales transactions show low or negative profits
- Discount impact clearly visible in profit margins
- Focus on West Region: Leverage success factors for expansion
- South Region Development: Investigate barriers and develop targeted strategies
- East Region Growth: Build on existing momentum with increased investment
- Consumer Segment: Maintain leadership through loyalty programs and targeted marketing
- Corporate Segment: Expand B2B relationships, especially in high-performing regions
- Home Office Segment: Develop specialized products or consider strategic pivoting
- Data-Driven Discounting: Use correlation insights to optimize discount levels
- Profit Protection: Implement discount caps to maintain profitability
- Segment-Specific Discounts: Tailor discount strategies by customer segment
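To decide where a discount cap might sit, one option is to group transactions into discount bands and compare average profit per band. The sketch below is illustrative only: the band edges and labels are hypothetical, the rows are made up, and `Discount` is assumed to be stored as a fraction between 0 and 1.

```python
import pandas as pd

# Made-up rows; Discount assumed to be a fraction between 0 and 1
df = pd.DataFrame({
    "Discount": [0.0, 0.0, 0.1, 0.2, 0.2, 0.3, 0.5, 0.8],
    "Profit": [30.0, 20.0, 15.0, 10.0, 6.0, -4.0, -20.0, -50.0],
})

# Hypothetical band edges; intervals are right-inclusive, e.g. (0.0, 0.2]
bins = [-0.01, 0.0, 0.2, 0.4, 1.0]
labels = ["none", "up to 20%", "20-40%", "over 40%"]
df["Band"] = pd.cut(df["Discount"], bins=bins, labels=labels)

# Average profit and transaction count per discount band
by_band = df.groupby("Band", observed=True)["Profit"].agg(["mean", "count"])
print(by_band)
```

The first band whose mean profit turns negative is a natural candidate for a cap, and adding `Segment` to the `groupby` keys would give the segment-specific view.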
- Technology Category: Maintain leadership through innovation and inventory management
- Office Supplies: Explore bundling strategies to increase average transaction value
- Furniture: Leverage B2B relationships for larger orders
- Outlier Management: Investigate high-value transactions for replication strategies
- Cost Optimization: Address cases of negative profit through cost analysis
- Performance Monitoring: Implement regular correlation analysis for ongoing optimization
- Implement refined discount policies
- Launch South region market research
- Develop Consumer segment retention programs
- Expand successful West region strategies to other areas
- Enhance Corporate segment sales processes
- Optimize product mix based on profitability analysis
- Regional expansion based on learnings
- Advanced analytics implementation for real-time insights
- Strategic repositioning of underperforming segments
- Install dependencies: `pip install pandas matplotlib seaborn numpy`
- Clone this repository
- Upload the `SampleSuperstore.csv` file
- Run the Python script in Google Colab or Jupyter Notebook
- Review generated visualizations and insights
project/
│
├── project_week_1_trinjan_dutta.py   # Main analysis script
├── SampleSuperstore.csv              # Dataset
├── README.md                         # This file
└── visualizations/                   # Generated charts and plots
- Correlation Matrix Heatmap
- Regional Sales Pie Chart
- Sales vs Profit Scatter Plot
- Box Plot for Outlier Detection
- Pivot Table Analysis
- Regional Performance Comparison
Feel free to fork this project and submit pull requests for improvements. Areas for enhancement include:
- Advanced statistical analysis
- Machine learning predictions
- Interactive dashboards
- Additional visualization types
This project is open source and available under the MIT License.
Trinjan Dutta
- GitHub: @trinjan-dutta
- LinkedIn: Trinjan Dutta
This analysis provides actionable insights for data-driven business decisions in retail and e-commerce environments.