This repository contains a Sales Data Analysis project that answers business questions using Python, Pandas, and Matplotlib. The dataset consists of 12 months of sales data from an electronics store, with each record containing the order ID, product, quantity ordered, price, order date, and purchase address.
The project includes data cleaning, exploratory data analysis (EDA), and visualization of insights to make data-driven business decisions.
Before diving into analysis, data cleaning was performed to ensure accuracy and consistency. Tasks included:
- Dropping NaN values from the DataFrame.
- Converting data types.
- Extracting useful columns (e.g., hour from timestamp, city from address).
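
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the date format, the address layout ("street, city, state zip"), and the sample rows are all assumptions made for the example.

```python
import pandas as pd

# Invented sample rows standing in for the real data (assumed column names).
raw = pd.DataFrame({
    "Order ID": ["1001", "1002", None, "1003"],
    "Product": ["Cable", "Phone", None, "Batteries"],
    "Quantity Ordered": ["2", "1", None, "3"],
    "Price Each": ["11.95", "700.0", None, "3.84"],
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30", None, "12/12/19 18:21"],
    "Purchase Address": [
        "917 1st St, Dallas, TX 75001",
        "682 Chestnut St, Boston, MA 02215",
        None,
        "333 8th St, Los Angeles, CA 90001",
    ],
})


def clean_sales(df):
    """Drop NaN rows, convert types, and derive helper columns."""
    df = df.dropna().copy()

    # Convert text columns to numeric and datetime types.
    df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
    df["Price Each"] = pd.to_numeric(df["Price Each"])
    # Assumed timestamp format: month/day/two-digit year, 24-hour time.
    df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

    # Derive month and hour from the timestamp, city from the address.
    df["Month"] = df["Order Date"].dt.month
    df["Hour"] = df["Order Date"].dt.hour
    df["City"] = df["Purchase Address"].str.split(",").str[1].str.strip()

    # Revenue per order line.
    df["Sales"] = df["Quantity Ordered"] * df["Price Each"]
    return df


clean = clean_sales(raw)
```

With real data, `raw` would be the DataFrame loaded from the CSV files rather than the hand-written rows used here.
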
Using Pandas & Matplotlib, the following key business questions were explored:
1. What was the best month for sales? How much revenue was generated that month?
2. Which city had the highest sales? Understanding regional demand.
3. What time should advertisements be displayed? Maximizing customer purchase likelihood.
4. Which products are most often sold together? Product bundling insights.
5. What product sold the most? Why might it have been the top seller?
Each question was answered using data aggregation, groupby operations, and visualizations.
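
As an illustration of how groupby aggregation can answer these questions, here is a small sketch on invented data. The column names (`Month`, `City`, `Hour`, `Sales`, and so on) are assumed to exist already, and the pair-counting approach with `itertools.combinations` and `collections.Counter` is one possible way to find products sold together, not necessarily the one used in this project.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Invented order lines; rows sharing an Order ID belong to the same order.
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4],
    "Product": ["Phone", "Cable", "Cable", "Phone", "Cable", "Monitor"],
    "Month": [1, 1, 1, 12, 12, 12],
    "City": ["Boston", "Boston", "Dallas", "Dallas", "Dallas", "Boston"],
    "Hour": [9, 9, 12, 19, 19, 19],
    "Quantity Ordered": [1, 2, 1, 1, 1, 1],
    "Sales": [600.0, 20.0, 10.0, 600.0, 10.0, 150.0],
})

# Questions 1 and 2: total revenue per month and per city.
monthly_sales = orders.groupby("Month")["Sales"].sum()
best_month = monthly_sales.idxmax()
city_sales = orders.groupby("City")["Sales"].sum()

# Question 3: number of distinct orders placed in each hour of the day.
orders_per_hour = orders.groupby("Hour")["Order ID"].nunique()

# Question 5: units sold per product.
units_sold = orders.groupby("Product")["Quantity Ordered"].sum()

# Question 4: count every pair of products that appears in the same order.
pair_counts = Counter()
for _, products in orders.groupby("Order ID")["Product"]:
    pair_counts.update(combinations(sorted(products), 2))

print(best_month, monthly_sales.loc[best_month])
print(pair_counts.most_common(1))
```

Sorting the products inside each order before building pairs keeps ("Cable", "Phone") and ("Phone", "Cable") from being counted as different pairs.
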
Throughout this analysis, the following Pandas & Matplotlib techniques were leveraged:
- Merging & concatenating multiple CSV files to create a unified dataset (pd.concat).
- Adding new calculated columns (e.g., sales, hour, city).
- String parsing operations (.str.split(), .apply() functions).
- Using groupby for aggregate analysis.
- Visualizing insights using vertical and horizontal bar charts and line graphs.
- Labeling and formatting graphs for better readability.
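
A compact sketch of several of these techniques together: combining monthly files with pd.concat, adding calculated columns, aggregating with groupby, and drawing a labeled bar chart. The in-memory CSV text, column names, and date format are assumptions made so the example runs on its own; with real data, file paths would be passed to `pd.read_csv` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Two invented monthly "files" held in memory (stand-ins for CSV paths).
header = "Order ID,Quantity Ordered,Price Each,Order Date\n"
jan_csv = io.StringIO(header + "1,2,10.0,01/05/19 09:15\n2,1,600.0,01/20/19 12:40\n")
feb_csv = io.StringIO(header + "3,1,150.0,02/03/19 19:05\n")

# Read each file and stack the rows into one DataFrame.
frames = [pd.read_csv(f) for f in (jan_csv, feb_csv)]
all_data = pd.concat(frames, ignore_index=True)

# Add calculated columns: month number and revenue per order line.
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")
all_data["Month"] = all_data["Order Date"].dt.month
all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]

# Aggregate revenue per month.
monthly_sales = all_data.groupby("Month")["Sales"].sum()

# Vertical bar chart with axis labels and a title for readability.
fig, ax = plt.subplots()
ax.bar(monthly_sales.index, monthly_sales.values)
ax.set_xticks(list(monthly_sales.index))
ax.set_xlabel("Month number")
ax.set_ylabel("Sales")
ax.set_title("Sales by month")
fig.tight_layout()
```

Calling `plt.show()` in an interactive session, or `fig.savefig(...)` with a file name of your choice, displays or saves the chart. Swapping `ax.bar` for `ax.barh` gives the horizontal variant, with the axis labels swapped accordingly.
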
This project was inspired by real-world business problems and implemented using Python's powerful data analysis tools.