Group 62 Data BI Repository: Data Analysis Project with Microsoft Fabric (Azure Data Stack)
Overview 📝

This project aims to develop a comprehensive framework covering data engineering, data analysis, visualization, and machine learning model development for the Steam community, using Microsoft Fabric (Azure Data Stack). The framework is designed to optimize the collection, processing, and analysis of data on user behavior, gaming trends, and interactions within the Steam community. It also includes a comparative analysis of gaming data from other popular platforms, such as Nintendo and PlayStation. The insights from this analysis are intended to support strategic decision-making, including new game development, marketing strategy optimization, and improving the user experience on the platform.
| Folder/File | Description |
|---|---|
| /data | Folder that stores the datasets and files used by the analysis, the dashboard, and the ML models. |
| /Notebooks | Folder containing the Jupyter notebooks used for the ETL, EDA, and feature engineering processes. |
| /Images | Folder containing relevant and illustrative images for the analysis project. |
| requirements.txt | File listing dependencies and libraries required to run the project. |
| .gitignore | File specifying the files and folders that version control (Git) should ignore. |
| LICENSE | MIT LICENSE - File specifying the terms under which the source code is shared. |
| functions.py | Python file with the functions used by the main file, `app.py`. |
| app.py | Main Python file and entry point of the application; defines the model configuration and execution. |
| README.md | Main project documentation in English. |
| README_ESP.md | Main project documentation in Spanish. |

| Name | Role | Username | Profile |
|---|---|---|---|
| Leonardo Cortés | Project Manager (PM), Data Engineer, Data Analyst | leocortes85 | Leonardo Cortés Zambrano |
| Beverly Gonzalez | ML Engineer and Data Scientist | licette32 | Beberly Gonzalez |
Technology Stack:
- Utilized Microsoft Fabric, which encompasses the full Azure Data Stack, to develop a complete end-to-end data solution.
Data Architecture:
- Implemented a Medallion Architecture, which organizes data into bronze (raw), silver (cleaned), and gold (business-ready) layers, to optimize data access and keep a continuous workflow, so the data remains accessible, manageable, and ready for downstream processes.
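The Medallion flow can be illustrated with a minimal pandas sketch. It assumes a hypothetical schema (`app_id`, `title`, `price`), hypothetical layer functions (`to_bronze`, `to_silver`, `to_gold`), and an assumed rule that non-numeric prices such as "Free to Play" count as free. In Fabric, each layer would typically be persisted as lakehouse tables rather than held in memory.

```python
import pandas as pd

# Hypothetical raw records as they might land in the bronze layer (schema is assumed).
RAW_RECORDS = [
    {"app_id": "10", "title": "Counter-Strike", "price": "9.99"},
    {"app_id": "10", "title": "Counter-Strike", "price": "9.99"},  # duplicate row
    {"app_id": "20", "title": "Team Fortress Classic", "price": "Free to Play"},
    {"app_id": None, "title": "Broken row", "price": "4.99"},  # missing key
]


def to_bronze(records: list[dict]) -> pd.DataFrame:
    """Bronze: keep the raw data as-is, only adding load metadata."""
    bronze = pd.DataFrame(records)
    bronze["ingested_at"] = pd.Timestamp("2024-01-01")  # fixed timestamp for reproducibility
    return bronze


def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Silver: clean and conform (drop rows without keys, deduplicate, cast types)."""
    silver = bronze.dropna(subset=["app_id"]).drop_duplicates(subset=["app_id"]).copy()
    silver["app_id"] = silver["app_id"].astype(int)
    # Non-numeric prices such as "Free to Play" become 0.0 (an assumed business rule).
    silver["price"] = pd.to_numeric(silver["price"], errors="coerce").fillna(0.0)
    return silver.reset_index(drop=True)


def to_gold(silver: pd.DataFrame) -> pd.DataFrame:
    """Gold: aggregate into a small business-ready table for reporting."""
    return pd.DataFrame(
        {
            "n_games": [len(silver)],
            "n_free_games": [int((silver["price"] == 0).sum())],
            "avg_price": [silver["price"].mean()],
        }
    )


silver = to_silver(to_bronze(RAW_RECORDS))
gold = to_gold(silver)
print(gold)
```

Keeping the untouched bronze copy means the silver and gold layers can be rebuilt whenever the cleaning rules change.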
Data Transformations:
- Performed Extract, Transform, and Load (ETL) operations with the pandas library, automating data loading from client-provided folders.
- Applied strategies to handle nested data structures and removed irrelevant or mostly null columns to optimize the data for further use.
- Loaded additional data incrementally from external APIs, web scraping, and custom functions to complement the dataset.
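Two of the strategies listed above, flattening nested records and dropping mostly null columns, can be sketched with `pandas.json_normalize` and a null-ratio filter. The record layout, the field names, the helper `drop_sparse_columns`, and the 50% threshold are assumptions for illustration, not the project's actual schema or cut-off.

```python
import pandas as pd

# Hypothetical nested review records (field names are assumptions, not the project's schema).
records = [
    {"user_id": "u1", "reviews": [{"item_id": "10", "recommend": True, "extra": None}]},
    {
        "user_id": "u2",
        "reviews": [
            {"item_id": "20", "recommend": False, "extra": None},
            {"item_id": "10", "recommend": True, "extra": "note"},
        ],
    },
]

# Expand the nested "reviews" lists into one row per review, keeping the parent user_id.
flat = pd.json_normalize(records, record_path="reviews", meta=["user_id"])


def drop_sparse_columns(df: pd.DataFrame, max_null_ratio: float = 0.5) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds max_null_ratio."""
    null_ratio = df.isna().mean()
    return df.loc[:, null_ratio <= max_null_ratio]


clean = drop_sparse_columns(flat)  # "extra" is 2/3 null, so it is removed
print(clean)
```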
Feature Engineering:
- Conducted extensive feature engineering to ensure the data was fully consumable, cleaned, and prepped for machine learning processes and data analysis.
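As an illustration of the kind of feature engineering involved, the sketch below multi-hot encodes a list-valued genre column and derives a numeric release year with pandas. The table, its columns, and the chosen features are hypothetical; the project's notebooks may engineer different features.

```python
import pandas as pd

# Hypothetical cleaned game table; column names and values are assumptions.
games = pd.DataFrame(
    {
        "item_id": [10, 20, 30],
        "genres": [["Action"], ["Action", "Indie"], ["Strategy", "Indie"]],
        "release_date": ["2000-11-01", "1999-04-01", "2015-06-15"],
    }
)

# Multi-hot encode the list-valued "genres" column: one 0/1 column per genre.
genre_dummies = games["genres"].str.join("|").str.get_dummies(sep="|").add_prefix("genre_")

# Derive a numeric release year; unparseable dates become NaN instead of raising.
games["release_year"] = pd.to_datetime(games["release_date"], errors="coerce").dt.year

# Replace the raw columns with model-ready numeric features.
features = pd.concat([games.drop(columns=["genres", "release_date"]), genre_dummies], axis=1)
print(features)
```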
Dimensional Structure and Semantic Model:
- Built a dimensional structure stored in a semantic model to enable insightful analysis.
- Developed a Power BI dashboard that provides visual analytics and insights into the video game market.
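A dimensional structure of this kind is commonly a star schema: dimension tables with surrogate keys, plus a fact table of measures that references them. The sketch below builds one dimension and one fact table with pandas from a hypothetical flat playtime table; the table and column names (`dim_game`, `fact_playtime`, `game_key`) are assumptions, not the project's actual model.

```python
import pandas as pd

# Hypothetical flat table of user-game playtime records (schema assumed for illustration).
flat = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u2"],
        "title": ["Portal", "Dota 2", "Portal"],
        "developer": ["Valve", "Valve", "Valve"],
        "playtime_hours": [12.0, 300.0, 5.5],
    }
)

# Dimension table: one row per distinct game, with a surrogate key.
dim_game = flat[["title", "developer"]].drop_duplicates().reset_index(drop=True)
dim_game.insert(0, "game_key", list(range(1, len(dim_game) + 1)))

# Fact table: measures plus a foreign key pointing into the game dimension.
fact_playtime = flat.merge(dim_game, on=["title", "developer"], how="left")[
    ["user_id", "game_key", "playtime_hours"]
]
print(dim_game)
print(fact_playtime)
```

A `dim_user` table would be built the same way, and the semantic model would relate each fact row to its dimensions through the key columns.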
Recommendation Models:
- Developed recommendation models using machine learning techniques, specifically leveraging cosine similarity for user and item recommendations.
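Both recommendation styles can be sketched with cosine similarity in plain NumPy. This is a minimal illustration assuming a binary user-item matrix with made-up users and games; the helper names (`recommend_similar_items`, `recommend_for_user`) are hypothetical, and the project's own functions in `functions.py` may compute similarities differently (for example, with scikit-learn's `cosine_similarity`).

```python
import numpy as np

# Hypothetical user-item matrix (rows = users, columns = items):
# 1 if the user owns or recommends the game, 0 otherwise.
items = ["Portal", "Dota 2", "Terraria", "Stardew Valley"]
interactions = np.array(
    [
        [1, 1, 0, 0],  # user 0
        [1, 0, 1, 1],  # user 1
        [0, 0, 1, 1],  # user 2
        [1, 1, 1, 0],  # user 3
    ],
    dtype=float,
)


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = vectors / norms
    return unit @ unit.T


def recommend_similar_items(item: str, top_n: int = 2) -> list[str]:
    """Item-based: items whose user vectors (matrix columns) are most similar."""
    sim = cosine_similarity_matrix(interactions.T)
    idx = items.index(item)
    ranked = np.argsort(-sim[idx], kind="stable")
    return [items[j] for j in ranked if j != idx][:top_n]


def recommend_for_user(user: int, top_n: int = 2) -> list[str]:
    """User-based: score unseen items by similarity-weighted votes of other users."""
    sim = cosine_similarity_matrix(interactions)
    sim[user, user] = 0.0  # ignore the user's similarity to themselves
    scores = sim[user] @ interactions
    scores[interactions[user] > 0] = -np.inf  # never recommend items the user already has
    ranked = np.argsort(-scores, kind="stable")
    return [items[j] for j in ranked[:top_n] if np.isfinite(scores[j])]


print(recommend_similar_items("Portal"))
print(recommend_for_user(0))
```

The item-based variant answers "players of this game also play", while the user-based variant fills a user's list from the libraries of similar users; masking already-owned items keeps the user recommendations novel.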
Model Testing and Deployment:
- Tested the machine learning models with Azure ML tools.
- Created a `functions.py` file that stores all the functions executed during the deployment phase.
Streamlit Deployment: (You can access the live app HERE)
- Deployed the entire project with Streamlit through the `app.py` file, allowing users to:
  - View the interactive dashboard.
  - Interact with the machine learning models, including the item and user recommendations, showcasing the project's full capabilities.




