
Portfolio of my data science and machine learning projects, including analyses, notebooks, and visualizations.


saziaa/Portfolio


Portfolio

👋 Hi, I'm Sazia

I'm a data analyst and researcher with strong experience in health informatics, pharmaceuticals, and machine learning, specializing in transforming healthcare data into actionable insights. I'm skilled in SQL, Python, R, Tableau, and predictive modeling, with expertise in health and environmental datasets (CIHI, Statistics Canada, NPRI) and advanced reporting.


🛠 Skills

  • Programming & Analysis: Python, R, SQL
  • Machine Learning & AI: Random Forest, XGBoost, SVM, Neural Networks (CNN, RNN, LSTM), Time Series (Prophet, SARIMA), NLP (Bag of Words, LDA, Word Embeddings)
  • Statistical Modeling: Linear/Logistic Regression, GLM, GAM, PCA, ANOVA
  • Visualization & Reporting: Tableau, Plotly, Matplotlib, Seaborn
  • Big Data Technologies: Hadoop, PySpark, Hive
  • Database Management: MySQL, SQLite, MongoDB
  • Domain Knowledge: Public health analytics, environmental analytics, pharmaceutical quality assurance (QA) and quality control (QC)

📚 Selected Projects

  1. Industrial Air Pollution & Lung Cancer in Canada (2002–2023)

    • Integrated NPRI industrial emissions with Statistics Canada lung cancer incidence data.
    • Built predictive models (XGBoost, Random Forest, Prophet) and applied a distributed lag non-linear model within a generalized additive model (DLNM-GAM) to project lung cancer cases by province.
    • Applied the Apriori algorithm, with permutation testing, to analyze pollutant co-occurrence.
    • Developed an interactive Tableau dashboard for data visualization.
  2. Deep Learning Insights into Social Determinants of Chronic Disease and Longevity (Collaborative Project)

    • Modeled global chronic disease mortality trends using WHO and World Bank data.
    • Developed XGBoost, GRU, and LSTM models, evaluated with 3-fold cross-validation.
    • Used SHAP to interpret the contribution of socioeconomic and health predictors such as income inequality and diabetes prevalence.
  3. Lightweight Deep Learning for Alzheimer’s Stage Classification

    • Fine-tuned VGG16 on MRI scans to classify Alzheimer’s stages.
    • Improved accuracy through data augmentation and class balancing, and used ablation studies to assess each component's contribution.
    • Applied Grad-CAM for interpretability.
  4. Energy Consumption Forecasting at East Melbourne Wastewater Plant

    • Analyzed historical plant data and environmental parameters.
    • Compared traditional, ensemble, and deep learning models to optimize energy demand forecasting.
  5. Sales Prediction for Pharmaceutical Distribution Companies

    • Conducted exploratory data analysis on pharmaceutical sales datasets and prepared the data for modeling.
    • Developed time series forecasting models (SARIMA, Prophet, SVR) in Python and compared their performance using RMSE.
    • Predicted short-term sales with high accuracy; the Prophet model outperformed SARIMA and SVR (lowest RMSE), demonstrating strong Python-based data processing skills.
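
The pollutant co-occurrence step in project 1 can be illustrated with a minimal Python sketch. It assumes each facility is represented as a set of reported pollutants, runs a two-level Apriori pass (frequent single pollutants first, then candidate pairs built only from them), and tests one pair's lift with a permutation test that shuffles one pollutant's presence across facilities. The facility records, thresholds, and function names are hypothetical and are not the project's code or NPRI data.

```python
import random
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Two-level Apriori: find frequent single items, then count only the
    candidate pairs built from them (a pair containing an infrequent item
    cannot be frequent, so it is pruned)."""
    n = len(transactions)
    item_counts = {}
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
    frequent_items = sorted(i for i, c in item_counts.items() if c / n >= min_support)

    pair_support = {}
    for a, b in combinations(frequent_items, 2):
        support = sum(1 for t in transactions if a in t and b in t) / n
        if support >= min_support:
            pair_support[(a, b)] = support
    return pair_support


def lift(has_a, has_b):
    """Lift = P(A and B) / (P(A) * P(B)); values above 1 suggest co-occurrence."""
    n = len(has_a)
    p_a = sum(has_a) / n
    p_b = sum(has_b) / n
    p_ab = sum(x and y for x, y in zip(has_a, has_b)) / n
    return p_ab / (p_a * p_b)


def permutation_p_value(transactions, a, b, n_perm=1000, seed=0):
    """One-sided permutation test: shuffle B's presence across facilities
    (keeping its frequency) and count how often lift is at least as large."""
    rng = random.Random(seed)
    has_a = [a in t for t in transactions]
    has_b = [b in t for t in transactions]
    observed = lift(has_a, has_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(has_b)
        if lift(has_a, has_b) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical facility records, for illustration only.
facilities = [
    {"NOx", "SO2", "PM2.5"},
    {"NOx", "SO2"},
    {"NOx", "SO2", "benzene"},
    {"PM2.5"},
    {"NOx", "SO2", "PM2.5"},
    {"benzene"},
]
pairs = frequent_pairs(facilities, min_support=0.3)
print(pairs)
print(permutation_p_value(facilities, "NOx", "SO2"))
```

Shuffling only one pollutant's presence vector preserves each pollutant's overall frequency, so the null distribution reflects chance overlap alone.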
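
For the 3-fold cross-validation in project 2, a minimal sketch of shuffled k-fold splitting follows. The helper names `k_fold_indices` and `cross_validate`, the toy data, and the mean-predictor "model" are hypothetical; the project's actual fold scheme is not stated, and for time-ordered mortality series a forward-chaining split may be more appropriate than shuffling.

```python
import random


def k_fold_indices(n_samples, k=3, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(X, y, fit, score, k=3):
    """Fit a fresh model on each training split and average the validation scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)


# Toy stand-ins for a real model: predict the training mean, score with MSE.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def mse(model, X_val, y_val):
    return sum((model - t) ** 2 for t in y_val) / len(y_val)


X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]
print(cross_validate(X, y, fit_mean, mse, k=3))
```

scikit-learn's `KFold(n_splits=3, shuffle=True, random_state=...)` provides the same kind of split without a hand-written helper.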
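
Project 3 mentions class balancing without specifying the method. One common option, assumed here, is weighting the loss by inverse class frequency; the sketch below uses the same formula as scikit-learn's `class_weight="balanced"`, n_samples / (n_classes × count). The stage labels and counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * count)}; rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, imbalanced stage labels for illustration only.
labels = ["non"] * 60 + ["very_mild"] * 25 + ["mild"] * 10 + ["moderate"] * 5
weights = balanced_class_weights(labels)
print(weights)  # moderate: 5.0, mild: 2.5, very_mild: 1.0, non: ~0.417
```

Keras's `Model.fit` accepts such a dictionary through its `class_weight` argument, keyed by integer class index, so string labels would first be mapped to indices.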
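
Project 5 compares forecasting models by RMSE. The metric and the selection step can be sketched as follows; the sales figures and forecasts are made-up placeholders (the generic names `model_a` and `model_b` stand in for fitted models) and are not the project's results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((y - y_hat)^2)); lower is better."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def best_model(actual, forecasts):
    """Return (model_name, rmse) for the forecast with the lowest RMSE."""
    scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Hypothetical hold-out sales and forecasts, for illustration only.
actual = [120, 135, 128, 140]
forecasts = {
    "model_a": [118, 130, 131, 138],
    "model_b": [110, 140, 120, 150],
}
print(best_model(actual, forecasts))  # model_a has the lower RMSE
```

Because RMSE squares errors before averaging, it penalizes occasional large misses more heavily than mean absolute error would.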

๐Ÿ“ซ Connect

