This project analyzes the Superstore Sales Dataset to uncover business insights around sales, profit, and customer behavior.
The goal is to demonstrate how data analysis and visualization can guide business growth, efficiency, and decision-making.
- Dataset: Superstore Sales (public dataset)
- Tools: Python (Pandas, NumPy, Matplotlib, Seaborn), Power BI
- Focus Areas:
  - Sales and profit trends across categories and regions
  - Customer segmentation and order patterns
  - Identifying high-margin vs. low-margin products
  - Impact of shipping modes on delivery efficiency
  - Forecasting sales patterns
- Data Cleaning
  - Handled missing values and duplicates
  - Standardized date formats and categorical fields
- Exploratory Data Analysis (EDA)
  - Sales and profit distributions
  - Correlation between discounting and profitability
  - Regional and segment performance
- Visualization & Reporting
  - Power BI dashboards for interactive filtering and drill-downs
  - Python visualizations for deeper statistical analysis
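The cleaning and EDA steps above can be sketched roughly as follows. This is a minimal illustration on a handful of made-up rows, not the notebook code from this project: the column names (`Order Date`, `Region`, `Discount`, `Sales`, `Profit`), the day-first date format, and the helper `clean_orders` are all assumptions.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumptions about the dataset's schema.
raw = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2", "A-3", "A-4", "A-5"],
    "Order Date": ["03/01/2017", "03/01/2017", "15/02/2017",
                   "28/02/2017", "02/03/2017", "09/03/2017"],
    "Region": ["west", "west", "East ", " east", "South", "West"],
    "Discount": [0.0, 0.0, 0.2, 0.3, 0.5, 0.1],
    "Sales": [200.0, 200.0, None, 100.0, 120.0, 150.0],
    "Profit": [50.0, 50.0, 30.0, 0.0, -30.0, 30.0],
})


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows and rows without sales, then normalize dates and labels."""
    out = df.drop_duplicates().copy()
    out = out.dropna(subset=["Sales"])
    # Assumes day-first dates; adjust the format string to match the real file.
    out["Order Date"] = pd.to_datetime(out["Order Date"], format="%d/%m/%Y")
    # Trim whitespace and unify capitalization of the categorical field.
    out["Region"] = out["Region"].str.strip().str.title()
    return out.reset_index(drop=True)


cleaned = clean_orders(raw)

# Pearson correlation between discount level and profit per order line.
discount_profit_corr = cleaned["Discount"].corr(cleaned["Profit"])

# Total sales and profit per region, with profit margin as profit / sales.
region_summary = (
    cleaned.groupby("Region")[["Sales", "Profit"]]
    .sum()
    .assign(margin=lambda d: d["Profit"] / d["Sales"])
    .sort_values("margin", ascending=False)
)

print(round(discount_profit_corr, 3))
print(region_summary)
```

Dropping rows with missing sales is only one option; imputing values may be more appropriate depending on how much data is affected.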
- Discounting increased sales volume but significantly reduced profit margins.
- The Technology category generated the highest profit, while Furniture lagged.
- West region showed the strongest overall performance, with consistent growth trends.
- Standard shipping had the highest order volume but longer delivery times than same-day options.
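A comparison like the shipping-mode finding above could be computed along these lines. The sketch uses invented rows and assumes `Ship Mode`, `Order Date`, and `Ship Date` columns; the gap between order and ship date is used here as a proxy for delivery speed, which may differ from how the project measured it.

```python
import pandas as pd

# Hypothetical shipments; column names and ship-mode labels are assumptions.
shipments = pd.DataFrame({
    "Ship Mode": ["Standard Class", "Standard Class", "Standard Class",
                  "Same Day", "First Class"],
    "Order Date": pd.to_datetime(["2017-01-02", "2017-01-05", "2017-01-10",
                                  "2017-01-03", "2017-01-04"]),
    "Ship Date": pd.to_datetime(["2017-01-07", "2017-01-09", "2017-01-16",
                                 "2017-01-03", "2017-01-06"]),
})

# Days elapsed between placing the order and shipping it.
shipments["days_to_ship"] = (
    shipments["Ship Date"] - shipments["Order Date"]
).dt.days

# Order count and average days to ship for each shipping mode.
ship_summary = shipments.groupby("Ship Mode")["days_to_ship"].agg(
    orders="count", avg_days="mean"
)

print(ship_summary.sort_values("orders", ascending=False))
```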
The analysis highlighted areas where strategic discounting, better inventory management, and customer targeting could improve profitability.
It also demonstrated how blending Python’s EDA capabilities with Power BI dashboards creates a powerful end-to-end analysis workflow.
- Python: Pandas, NumPy, Matplotlib, Seaborn
- Power BI: Interactive dashboards and business reporting
- Jupyter Notebook: Data exploration and statistical analysis
├── data/ # train
├── notebooks/ # Jupyter notebooks for cleaning & EDA
├── reports/ # PDF/PNG exports of visualizations
├── dashboards/ # Power BI dashboard file (.pbix)
└── README.md # Project documentation
- Extend forecasting models using time series analysis
- Integrate machine learning to predict customer churn or product demand
- Automate reporting pipelines with Python scripts
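As a possible starting point for the time series item above, the sketch below aggregates sales by month and uses a trailing average as a naive baseline forecast. The data is made up, the `Order Date` and `Sales` column names are assumptions, and a moving average is only a placeholder for a proper forecasting model.

```python
import pandas as pd

# Hypothetical order lines; column names are assumptions.
daily = pd.DataFrame({
    "Order Date": pd.to_datetime(["2017-01-05", "2017-01-20", "2017-02-10",
                                  "2017-03-15", "2017-03-25", "2017-04-02"]),
    "Sales": [100.0, 50.0, 210.0, 90.0, 150.0, 300.0],
})

# Total sales per calendar month, indexed by the first day of each month.
monthly_sales = daily.set_index("Order Date")["Sales"].resample("MS").sum()

# Naive baseline: forecast next month as the mean of the last `window` months.
window = 3
naive_forecast = monthly_sales.tail(window).mean()

print(monthly_sales)
print(naive_forecast)
```

A baseline like this gives later models something to beat before adding seasonality or trend components.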
Contributions are welcome. Feel free to fork the repo, raise issues, or submit pull requests.
Stephen Karanja
- Email: muhurakaranja7@gmail.com