Skip to content

Data-driven analysis of the Superstore sales dataset using Python and Power BI — uncovering insights on profitability, customer behavior, and regional performance.

Notifications You must be signed in to change notification settings

stevenkaranja/superstore_analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Superstore Sales Analysis

This project analyzes the Superstore Sales Dataset to uncover business insights around sales, profit, and customer behavior.
The goal is to demonstrate how data analysis and visualization can guide business growth, efficiency, and decision-making.


Project Overview

  • Dataset: Superstore Sales (public dataset)
  • Tools: Python (Pandas, NumPy, Matplotlib, Seaborn), Power BI
  • Focus Areas:
    • Sales and profit trends across categories and regions
    • Customer segmentation and order patterns
    • Identifying high-margin vs. low-margin products
    • Impact of shipping modes on delivery efficiency
    • Forecasting sales patterns

Workflow

  1. Data Cleaning

    • Handled missing values and duplicates
    • Standardized date formats and categorical fields
  2. Exploratory Data Analysis (EDA)

    • Sales and profit distributions
    • Correlation between discounting and profitability
    • Regional and segment performance
  3. Visualization & Reporting

    • Power BI dashboards for interactive filtering and drill-downs
    • Python visualizations for deeper statistical analysis

Key Insights

  • Discounting increased sales volume but significantly reduced profit margins.
  • Technology segment generated the highest profitability, while Furniture lagged.
  • West region showed the strongest overall performance, with consistent growth trends.
  • Standard shipping mode had the highest volume but contributed to delayed deliveries compared to same-day options.

Results

The analysis highlighted areas where strategic discounting, better inventory management, and customer targeting could improve profitability.
It also demonstrated how blending Python’s EDA capabilities with Power BI dashboards creates a powerful end-to-end analysis workflow.


Tech Stack

  • Python: Pandas, NumPy, Matplotlib, Seaborn
  • Power BI: Interactive dashboards and business reporting
  • Jupyter Notebook: Data exploration and statistical analysis

📂 Repository Structure

├── data/ # train ├── notebooks/ # Jupyter notebooks for cleaning & EDA ├── reports/ # PDF/PNG exports of visualizations ├── dashboards/ # Power BI dashboard file (.pbix) └── README.md # Project documentation


📈 Dashboard Preview

image

Next Steps

  • Extend forecasting models using time series analysis
  • Integrate machine learning to predict customer churn or product demand
  • Automate reporting pipelines with Python scripts

Contribution

Contributions are welcome. Feel free to fork the repo, raise issues, or submit pull requests.


📬 Contact

Stephen Karanja

About

Data-driven analysis of the Superstore sales dataset using Python and Power BI — uncovering insights on profitability, customer behavior, and regional performance.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published