Quera Data Science Bootcamp / Winter 2024 / Team G5
This project scrapes, cleans, and analyzes mobile store data from GSMArena. Scraping uses multithreading for efficiency; the cleaned data is structured into MySQL tables and imported with SQLAlchemy. The project then applies machine learning techniques (classification, regression, and clustering) to derive actionable insights from mobile feature data. The outputs of these analyses are visualized in PowerBI to support decision-making.
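
As a rough illustration of the import step described above, the sketch below writes one cleaned table into a database through SQLAlchemy. It is not the code in `DataBase/db_gsmarena.py`: the use of pandas, the helper name `import_table`, the table name `devices`, and the connection URL in the usage note are all assumptions made for the example.

```python
import pandas as pd
from sqlalchemy import create_engine


def import_table(df, table_name, engine):
    """Write one DataFrame to the database and return the number of rows written.

    Hypothetical helper: an existing table with the same name is replaced.
    """
    df.to_sql(table_name, con=engine, if_exists="replace", index=False)
    return len(df)
```

With a MySQL server running, usage might look like `engine = create_engine("mysql+pymysql://user:password@localhost:3306/gsmarena")` followed by `import_table(pd.read_csv("Cleaning/Cleaned_df.csv"), "devices", engine)`. The credentials and database name are placeholders, and the `mysql+pymysql` dialect assumes the PyMySQL driver is installed.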
The mobile industry is evolving rapidly, so staying competitive requires up-to-date analysis. Our objectives are to:
- Accurately Predict Consumer Preferences: Use machine learning to predict trends and consumer preferences in the mobile market.
- Optimize Inventory and Pricing Strategy: By understanding popular features and their impact on pricing, we aim to guide stores in inventory management.
- Enhance User Experience: Apply clustering to segment users and tailor marketing strategies effectively.
- Cleaning
├── Data Cleaning.ipynb [Cleaning scraped mobile data]
├── Cleaned_df.csv [Output of this step]
- Scrape
├── crawl_links.py [Script for crawling links]
├── scrape links.py [Script for storing all links]
├── scrape features multithread.py [Script for scraping the features behind each link with multithreading]
├── AllLinks.csv [Output of 'scrape links.py']
├── Scraped_DataSet_MultiThread.csv [Output of 'scrape features multithread.py']
- DataBase
├── db_gsmarena.py [Script for setting up the database]
├── Data Base Structure.png [Shows structure of database]
├── Output_Tables.rar [Tables for creating database - it is the output of Cleaning/Data Cleaning.ipynb]
- Statistics
├── Descriptive statistics.ipynb [Notebook for descriptive statistical analysis]
├── Descriptive statistics.zip [Output of descriptive statistical analysis]
├── Hypo Test.ipynb [Notebook for hypothesis testing]
- Machine Learning
├── Market_Q1_Clustering_KMeans_DBScan.ipynb [Notebook for clustering analysis]
├── Market_dataset_Q2_Q3_Classification_Regression.ipynb [Notebook for classification and regression analysis]
- Powerbi [PowerBI dashboard directory for ML part]
├── ML_Powerbi.pbix [PowerBI dashboard file for ML part]
├── clf_result.csv [Input for PowerBI dashboard file]
├── reg_result.csv [Input for PowerBI dashboard file]
├── DataSet for powerbi.csv [Input for PowerBI dashboard file]
- PowerBI
├── Reports.pbix [PowerBI dashboard file]
- requirements.txt [Python dependencies for the project]
- Clone the Repository:
  `git clone git@github.com:sinaaasghari/Mobile-Store.git`
- Install Dependencies:
  `pip install -r requirements.txt`
- Data Scraping:
  - Execute `Scrape/scrape links.py` to scrape the links of all mobiles on the GSMArena website.
  - Use `Scrape/crawl_links.py` to fetch the data behind each link and apply any modifications needed during scraping.
  - Optionally, use `Scrape/scrape features multithread.py` to run the previous step with multithreading.
- Database Setup and Import:
  - Initialize your MySQL database system.
  - Use `Cleaning/Data Cleaning.ipynb` to set up tables from the scraped data.
  - Use `DataBase/db_gsmarena.py` to import the tables.
- Perform Analysis:
  - Launch `Statistics/Descriptive statistics.ipynb` and `Statistics/Hypo Test.ipynb` to run the statistical analysis.
  - Launch `Machine Learning/Market_dataset_Q2_Q3_Classification_Regression.ipynb` and `Machine Learning/Market_Q1_Clustering_KMeans_DBScan.ipynb` to run the machine learning analysis.
- Visualize Results:
  - Open `PowerBI/Reports.pbix` to explore the visualizations interactively.
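
The optional multithreaded scraping step can be pictured with the minimal sketch below. It is not the repository's script: the function names `fetch_html` and `scrape_all`, the worker count, and the User-Agent header are assumptions chosen for illustration. It uses only the standard library and takes the fetch function as a parameter so the threading logic can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download one page and return its HTML as text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(urls, fetch=fetch_html, max_workers=8):
    """Fetch every URL in a thread pool.

    Returns a dict mapping each URL to its page text, or to None if
    fetching that URL raised an exception.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            return None  # one failed page should not abort the whole run

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with urls
        pages = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, pages))
```

Threads suit this workload because each request spends most of its time waiting on the network. Keeping `max_workers` modest and adding delays between requests may be necessary to avoid being rate-limited by the site.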
We welcome any feedback, bug reports, and suggestions. Please let us know if you encounter any issues or have ideas for improvement.