This repository contains a Streamlit app, a Power BI project, and an accompanying prediction model, all built from the Stack Overflow Developer Survey 2024 data. The Streamlit app (see the streamlit folder) provides interactive visualizations. The repository also includes the Power BI report file (ProjeST.pbix), an exported Power BI PDF (ProjeST.pdf), a project report PDF, and a Jupyter notebook (PredictionModel.ipynb) with the prediction model and experiments.
- `streamlit` folder: all files for the Streamlit project, including `app.py`.
- `ProjeST.pbix`: Power BI report (Power BI Desktop file).
- `ProjeST.pdf`: exported PDF of the Power BI report (embedded below).
- `PredictionModel.ipynb`: Jupyter notebook with the prediction model and experiments.
- To view the Streamlit visualization, see the README.md inside the streamlit folder.
- PDF: `ProjeST.pdf` contains the pages of the Power BI report.
- Power BI: open `ProjeST.pbix` with Power BI Desktop (Windows).
Figure: Power BI dashboard overview (page 1 of 4)
To view all pages of the Power BI report, see ProjeST.pdf.
The PredictionModel.ipynb notebook contains the data-preparation steps, model training, and evaluation used to predict annual developer compensation from the Stack Overflow survey data. Open it with Jupyter or VS Code's notebook support. Main sections:
- Lib Imports: imports standard data science libraries and configures warnings.
- Data Fetch: (optional) downloads the dataset with `kagglehub` and loads the CSV.
- Basic Data Clean: selects relevant columns and filters salary outliers.
- Feature Preprocessing | Data Augmentation: converts and simplifies features (experience, education, dev type, country, employment) and keeps the top countries.
- Pipe Line: builds preprocessing pipelines (imputation, scaling, one-hot encoding) and a GradientBoostingRegressor; includes evaluation and plotting helpers plus a `main` runner.
- Model Run: calls `main(df)` to execute the full training/evaluation flow.
- You can also run the notebook on Google Colab.
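
To illustrate the Pipe Line and Model Run steps listed above, here is a minimal sketch of how such a flow might look with scikit-learn. Only the general ingredients (imputation, scaling, one-hot encoding, a GradientBoostingRegressor, and a `main(df)` runner) come from the notebook description. The column names, the `build_pipeline` and `make_synthetic_df` helpers, the hyperparameters, and the synthetic data are assumptions made for illustration and may differ from the notebook's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the notebook's real columns may differ.
NUMERIC_COLS = ["years_experience"]
CATEGORICAL_COLS = ["country", "education"]
TARGET_COL = "salary"


def build_pipeline() -> Pipeline:
    """Preprocessing (impute, scale, one-hot encode) followed by a regressor."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Ignore categories at predict time that were not seen during fit.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", GradientBoostingRegressor(random_state=42)),
    ])


def main(df: pd.DataFrame) -> dict:
    """Split the data, fit the pipeline, and report hold-out metrics."""
    X = df[NUMERIC_COLS + CATEGORICAL_COLS]
    y = df[TARGET_COL]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    pipeline = build_pipeline()
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    return {
        "mae": mean_absolute_error(y_test, preds),
        "r2": r2_score(y_test, preds),
        "pipeline": pipeline,
    }


def make_synthetic_df(n: int = 300, seed: int = 0) -> pd.DataFrame:
    """Small synthetic stand-in for the survey data (not real survey values)."""
    rng = np.random.default_rng(seed)
    years = rng.integers(0, 30, size=n).astype(float)
    country = rng.choice(["A", "B", "C"], size=n)
    education = rng.choice(["Bachelor", "Master"], size=n)
    salary = (
        40_000
        + 2_500 * years
        + np.where(country == "A", 20_000, 0)
        + rng.normal(0, 5_000, size=n)
    )
    df = pd.DataFrame({
        "years_experience": years,
        "country": country,
        "education": education,
        "salary": salary,
    })
    # Blank out a few numeric values so the median imputer has work to do.
    df.loc[df.index[::25], "years_experience"] = np.nan
    return df


if __name__ == "__main__":
    result = main(make_synthetic_df())
    print(f"MAE: {result['mae']:.0f}  R2: {result['r2']:.3f}")
```

Wrapping the preprocessing and the regressor in a single `Pipeline` means the imputers, scaler, and encoder are fitted on the training split only, which avoids leaking test-set statistics into the model.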
