Smartphone Price Analysis and Prediction

This repository contains a data science project that analyzes smartphone listings scraped from Flipkart, one of India's leading e-commerce platforms. It combines web scraping, data cleaning, exploratory data analysis (EDA), machine learning, and an interactive dashboard to surface market trends and predict smartphone prices. The intended audience is consumers, retailers, and tech enthusiasts.

Project Overview

The Smartphone Price Analysis and Prediction project aims to understand pricing dynamics and consumer preferences in the Indian smartphone market. Using web scraping, the project collects detailed product information from Flipkart, including prices, specifications, ratings, and reviews. Through rigorous data cleaning and EDA, it identifies key trends, such as popular hardware configurations and brand dominance. Machine learning models, including Linear Regression and Random Forest, are employed to predict prices based on features like RAM, storage, and camera resolution. An interactive dashboard built with Dash provides a visual interface to explore these insights, making the project accessible to a wide audience.

Objectives:

Scrape and compile a robust dataset of smartphone listings.
Analyze price distributions and feature correlations to uncover market trends.
Build accurate price prediction models using machine learning.
Visualize findings through static plots and an interactive dashboard.
Provide actionable insights for stakeholders in the smartphone ecosystem.

Features

Web Scraping: Automated extraction of smartphone data from Flipkart using requests and BeautifulSoup, with user-agent rotation to reduce the chance of requests being blocked.
Data Cleaning: Standardized dataset with regex-based feature extraction (e.g., RAM, storage) and imputation of missing values.
Exploratory Data Analysis:
- Visualizations of price, rating, RAM, storage, and brand distributions.
- Correlation analysis to identify price drivers.
- Word cloud for qualitative insights from product descriptions.
Predictive Modeling:
- Linear Regression and Random Forest models for price prediction.
- Hyperparameter tuning with GridSearchCV for optimized performance.
- Feature importance analysis to highlight key predictors.
Interactive Dashboard: A Dash-based web app for exploring price trends and battery-price relationships by brand.
Reproducible Workflow: Modular scripts with clear documentation for easy replication.
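
The regex-based extraction and imputation listed under Data Cleaning might look roughly like the sketch below. The description format ("8 GB RAM | 128 GB ROM | ..."), the patterns, and the helper names `extract_specs` and `impute_median` are assumptions for illustration; they are not taken from Final_python.py.

```python
import re
from statistics import median

# Assumed description format, e.g. "8 GB RAM | 128 GB ROM | 5000 mAh Battery".
PATTERNS = {
    "ram_gb": re.compile(r"(\d+)\s*GB\s*RAM", re.IGNORECASE),
    "storage_gb": re.compile(r"(\d+)\s*GB\s*ROM", re.IGNORECASE),
    "battery_mah": re.compile(r"(\d+)\s*mAh", re.IGNORECASE),
}


def extract_specs(description):
    """Return a dict of numeric specs; a spec that is not found maps to None."""
    specs = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(description)
        specs[name] = int(match.group(1)) if match else None
    return specs


def impute_median(values):
    """Replace None entries with the median of the observed values.

    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


rows = [
    extract_specs("8 GB RAM | 128 GB ROM | 5000 mAh Battery"),
    extract_specs("6 GB RAM | 128 GB ROM"),  # battery missing on purpose
    extract_specs("4 GB RAM | 64 GB ROM | 4000 mAh Battery"),
]
batteries = impute_median([row["battery_mah"] for row in rows])
print(rows[0])
print(batteries)
```

Median imputation is used here only as one common choice; the README does not say which imputation strategy the project applies.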

Dataset

The dataset is sourced via web scraping from Flipkart's smartphone search results (https://www.flipkart.com/search?q=phones). It includes:

Columns: Product name, description, price, rating, reviews, RAM (GB), storage (GB), display size (inches), camera (MP), battery (mAh), warranty (years), and brand.
Size: Approximately 1,000 unique smartphone listings collected from 59 search-result pages.
Files:
- flipkart_phones1.csv: Raw scraped data.
- checkclean_final.xlsx: Cleaned and processed dataset.
Note: Due to ethical considerations, raw data is not shared publicly. Users can regenerate the dataset using the provided scraping script.
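
Because the dataset has to be regenerated locally, here is a minimal sketch of a polite fetch loop with user-agent rotation. It is not the contents of WebScrapping.py: the user-agent strings, the `page` query parameter, and the function names are assumptions, and HTML parsing is left out because the page markup is not described in this README.

```python
import itertools
import time

BASE_URL = "https://www.flipkart.com/search?q=phones"

# Illustrative user-agent strings (assumed, not taken from the project).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]
_ua_cycle = itertools.cycle(USER_AGENTS)


def next_headers():
    """Return request headers carrying the next user agent in the rotation."""
    return {"User-Agent": next(_ua_cycle)}


def page_url(page):
    """Build the URL for one results page (the `page` parameter is assumed)."""
    return f"{BASE_URL}&page={page}"


def fetch_pages(first, last, delay=45):
    """Yield (page, html) pairs, pausing `delay` seconds between requests."""
    import requests  # imported lazily so the helpers above work without it

    for page in range(first, last + 1):
        if page != first:
            time.sleep(delay)  # throttle to avoid overloading the server
        response = requests.get(page_url(page), headers=next_headers(), timeout=30)
        response.raise_for_status()
        yield page, response.text
```

A caller would iterate with something like `for page, html in fetch_pages(1, 59): ...` and hand each `html` string to BeautifulSoup for parsing.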

Installation

To run this project locally, ensure you have Python 3.11+ installed. Follow these steps:

Clone the Repository:

```
git clone https://github.com/Jagan515/FlipkartAnalysis
cd FlipkartAnalysis
```

Create a Virtual Environment (recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install Dependencies:
```
pip install -r requirements.txt
```

Requirements File: The requirements.txt includes:

requests==2.31.0
beautifulsoup4==4.12.2
pandas==2.2.2
numpy==1.26.4
seaborn==0.13.2
matplotlib==3.8.4
scikit-learn==1.5.1
wordcloud==1.9.3
dash==2.17.1
plotly==5.22.0

Usage

Web Scraping:
- Run WebScrapping.py to collect smartphone data from Flipkart:
```
python WebScrapping.py
```
- Output: flipkart_phones1.csv.
- Caution: Adjust the sleep time (time.sleep(45)) to avoid server overload and comply with Flipkart’s terms of service.
Data Cleaning and EDA:
- Execute Final_python.py to clean the data, perform EDA, and generate visualizations:
```
python Final_python.py
```
- Output: Cleaned dataset (checkclean_final.xlsx) and plots (saved locally if configured).
Predictive Modeling:
- The Final_python.py script trains Linear Regression and Random Forest models, with results printed to the console (e.g., MSE, R² scores).
Interactive Dashboard:
- Launch the Dash app to explore data visually:
```
python Final_python.py
```
- Access the dashboard at http://127.0.0.1:8050 in your browser.
- Use the brand dropdown to view price distributions and battery-price scatter plots.
Sample Output:
- Visualizations: Price histograms, RAM/storage counts, correlation heatmaps, word clouds.
- Model Performance: Random Forest R² ≈ 0.85 after tuning.
- Dashboard: Interactive plots for brand-specific analysis.
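
For readers who want to reproduce the Predictive Modeling step in isolation, the sketch below shows one way to tune a Random Forest with GridSearchCV and rank feature importances. The parameter grid, the 80/20 split, the 5-fold setting, and the function names are assumptions; the README does not document the values used in Final_python.py.

```python
def rank_importances(feature_names, importances):
    """Pair each feature with its importance, sorted from most to least important."""
    return sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)


# Assumed search space; the project's actual grid is not documented here.
PARAM_GRID = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
}


def tune_random_forest(X, y, feature_names):
    """Grid-search a RandomForestRegressor and report held-out MSE and R²."""
    # scikit-learn is imported lazily so rank_importances works without it.
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    search = GridSearchCV(
        RandomForestRegressor(random_state=42),
        PARAM_GRID,
        cv=5,
        scoring="r2",
        n_jobs=-1,
    )
    search.fit(X_train, y_train)

    best_model = search.best_estimator_
    predictions = best_model.predict(X_test)
    return {
        "best_params": search.best_params_,
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
        "importances": rank_importances(feature_names, best_model.feature_importances_),
    }
```

Here `X` would hold numeric columns such as RAM, storage, and camera resolution from the cleaned dataset, and `y` the price column.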

Project Structure

FlipkartAnalysis/
│
├── WebScrapping.py         # Web scraping script
├── Final_python.py         # Data cleaning, EDA, modeling, and dashboard
├── flipkart_phones.csv     # Raw scraped data (sample, not included)
├── checkclean_final.xlsx   # Cleaned dataset (sample, not included)
├── requirements.txt        # Python dependencies
├── plots/                  # Folder for saved visualizations (optional)
└── README.md               # Project documentation

Results

EDA Insights:
- Most smartphones are priced between ₹10,000–₹30,000, with a peak at ₹15,000–₹20,000.
- 8GB RAM and 128GB storage dominate, reflecting mid-range preferences.
- Ratings cluster at 4.0–4.5 for budget phones, indicating high satisfaction.
- RAM, storage, and camera resolution are key price drivers (each correlates with price above 0.5).
- Xiaomi, Realme, and Vivo lead the budget segment with 20–30 models each.
Modeling:
- Linear Regression: R² ≈ 0.6, MSE ≈ 1e8.
- Random Forest (tuned): R² ≈ 0.85, MSE ≈ 5e7, with RAM and storage as top predictors.
Dashboard: Enables brand-specific exploration of price trends and feature relationships.
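
To make the reported metrics easier to interpret, the sketch below computes MSE and R² by hand on three made-up prices. The numbers are illustrative only and are not project results. For scale, and assuming prices are modelled in rupees, an MSE of 5e7 corresponds to a root-mean-square error of about ₹7,070.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up prices in rupees, used only to exercise the formulas.
actual = [10000, 20000, 30000]
predicted = [12000, 20000, 28000]

error = mse(actual, predicted)
print(f"MSE  = {error:.0f}")
print(f"RMSE = {error ** 0.5:.0f}")
print(f"R²   = {r2(actual, predicted):.2f}")
```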

Future Scope

Multi-Platform Scraping: Include Amazon and Snapdeal for a broader market view.
Sentiment Analysis: Analyze reviews to gauge customer sentiment.
Advanced Models: Test XGBoost or neural networks for better predictions.
Enhanced Dashboard: Add filters for RAM, price, or 5G support.
Recommendation System: Suggest phones based on user preferences and budget.

Contributing

Contributions are welcome! To contribute:

Fork the repository.
Create a feature branch (git checkout -b feature/YourFeature).
Commit changes (git commit -m 'Add YourFeature').
Push to the branch (git push origin feature/YourFeature).
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or feedback, reach out via:

GitHub Issues: Create an issue at https://github.com/Jagan515/FlipkartAnalysis/issues
Email: jaganp515@gmail.com

Happy analyzing, and enjoy exploring the smartphone market!
