tarekmasryo/README.md

Tarek Masryo Banner

AI/ML Engineer building production ML services and reliable GenAI apps (RAG + Agents).
From raw data → validated pipelines → deployed APIs → decision-ready dashboards.

Kaggle Datasets Grandmaster • Kaggle Notebooks Master

GitHub • Website • Repos • LinkedIn

Kaggle • Hugging Face • Streamlit


🧭 What I do

| Area | What you can expect |
| --- | --- |
| Production ML delivery | Data validation + leak-safe evaluation → calibrated models → threshold policies → inference-ready artifacts |
| GenAI (RAG & Agents) | Grounded RAG, structured extraction/summarization, tool-calling agents with guardrails & evaluation |
| ML APIs & deployment | Dockerized FastAPI services, strict request/response schemas, versioned artifacts, CI-friendly delivery |
| MLOps & monitoring | MLflow tracking, monitoring signals (latency/errors/drift/cost), reproducibility and quality gates |
| Applied NLP & CV | NLP: classification/extraction/semantic search • CV: classification/detection/segmentation |

🌟 Featured

🧩 Dashboards & Apps

| Project | Focus | Link |
| --- | --- | --- |
| Fraud Detection Dashboard | Streamlit app + ML artifacts + decision policies + cost/threshold analysis | Repo |
| EV Charging Analytics Dashboard | Global geo analytics: clustering, KPIs, filters, fast-DC allocation optimizer | Repo |
| Health Intelligence Platform | Decision-ready wellbeing analytics dashboard with actionable insights | Repo |
| Football Matches 2024/2025 Dashboard | Standings, team explorer, head-to-head, match table (Ag-Grid) | Repo |
| Advanced ML Sentiment Lab (Dashboard) | NLP decision views: ROC/PR, threshold tuning, error analysis | Repo |

🤖 GenAI (RAG & Agentic Workflows)

| Project | Focus | Link |
| --- | --- | --- |
| LLM System Ops — Production Telemetry | LLMOps signals: quality, cost, latency, failure patterns + aligned SFT samples | Repo |
| RAG QA Logs & Corpus | Retrieval + answer evaluation with realistic telemetry and labels | Repo, Dataset |
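
As an illustration of what "retrieval evaluation" can mean in practice, here is a minimal sketch of two common retrieval metrics, hit rate@k and mean reciprocal rank (MRR). It is not taken from the linked repos: the function names, the chunk IDs, and the toy data are all hypothetical, and a real evaluation would read ranked results and relevance labels from the logged telemetry instead.

```python
def hit_rate_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries with at least one relevant chunk in the top k."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant & set(ranked[:k])
    )
    return hits / len(ranked_ids)


def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the first relevant chunk (0 if none is retrieved)."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            if chunk_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)


# Hypothetical toy data: one ranked list and one relevant set per query.
ranked = [["c1", "c7", "c3"], ["c4", "c2", "c9"], ["c5", "c6", "c8"]]
relevant = [{"c1"}, {"c9"}, {"c0"}]

print(hit_rate_at_k(ranked, relevant, k=1))
print(hit_rate_at_k(ranked, relevant, k=3))
print(mean_reciprocal_rank(ranked, relevant))
```

Tracking both metrics side by side is one way to separate "the right chunk was retrieved at all" from "the right chunk was ranked near the top".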

📦 Data products (Kaggle)

| Dataset | What it’s for | Link |
| --- | --- | --- |
| YouTube Shorts & TikTok Trends 2025 | Short-form trends analytics and virality exploration | Dataset |
| Cancer Risk Factors | Clean features for health EDA and risk modeling | Dataset |
| Football Matches 2024/2025 (Top Leagues + UCL) | Standardized match-level data for analytics/modeling | Dataset |
| Digital Lifestyle & Mental Wellness | Behavioral signals for wellbeing analytics and prediction | Dataset |

🧰 Systems & Pipelines

| Project | Focus | Link |
| --- | --- | --- |
| Credit Card Fraud Detection — A Pipeline Journey | End-to-end ML workflow + calibration + threshold policies + exported artifacts | Repo |
| Pima Diabetes Pipeline | Train/evaluate/infer structure with validation + reproducible runs | Repo |
| Text Sentiment Analysis | Strong baselines + calibration + threshold tuning + explainability | Repo |
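
To make "threshold policies" concrete, here is a minimal sketch of cost-aware threshold selection on calibrated probabilities. It is an assumption-laden illustration, not the code in the repos above: the function names, the false-positive/false-negative costs, and the toy labels and scores are all hypothetical.

```python
def total_cost(y_true, y_prob, threshold, cost_fp=1.0, cost_fn=10.0):
    """Sum the cost of false positives and false negatives at one threshold."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if flagged and label == 0:
            cost += cost_fp  # legitimate case flagged by mistake
        elif not flagged and label == 1:
            cost += cost_fn  # positive case missed
    return cost


def best_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=10.0, steps=101):
    """Scan an evenly spaced grid on [0, 1] and return the cheapest threshold."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return min(
        candidates,
        key=lambda t: total_cost(y_true, y_prob, t, cost_fp, cost_fn),
    )


# Hypothetical validation labels and calibrated scores.
y_true = [0, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.05, 0.10, 0.30, 0.35, 0.40, 0.70, 0.90, 0.20]

chosen = best_threshold(y_true, y_prob)
print(chosen, total_cost(y_true, y_prob, chosen))
print(0.5, total_cost(y_true, y_prob, 0.5))
```

Different policies could then be expressed as different cost ratios passed to `best_threshold`, with the scan run on a held-out validation split to stay leak-safe.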

🛠️ Tech stack

| Category | Tools |
| --- | --- |
| Languages & Core | Python SQL Bash Git Linux |
| Data & Analytics | NumPy Pandas Polars DuckDB Jupyter |
| ML / DL | scikit-learn XGBoost LightGBM PyTorch TensorFlow |
| NLP / CV / LLM | Hugging Face Transformers OpenCV |
| Visualization & Apps | Matplotlib Seaborn Plotly Streamlit Gradio |
| APIs & Deployment | FastAPI Pydantic Docker Postgres |
| GenAI / RAG Stack | LangChain LlamaIndex FAISS pgvector |
| Observability & Monitoring | OpenTelemetry Prometheus Grafana Sentry |
| MLOps & Quality | MLflow GitHub Actions pytest Ruff Pandera |

🤝 Collaboration

  • 🚀 Build & ship ML/GenAI products: FastAPI + Docker, clean contracts, production-ready delivery
  • 🧠 RAG/LLM reliability: retrieval evaluation, grounded answers, guardrails & regression suites
  • 🛠️ MLOps: MLflow tracking, CI quality gates, monitoring signals (latency/errors/drift/cost)

Best contact: LinkedIn

If you find the work useful, a ⭐ helps more people discover it.

Pinned

  1. tarekmasryo.github.io (Public)

    Tarek Masryo — AI/ML Engineer Portfolio

    JavaScript 2

  2. fraud-detection-dashboard (Public)

    Production-minded Streamlit + Plotly fraud detection dashboard with decision policies (Strict/Balanced/Lenient), cost-vs-threshold analysis, and calibrated model artifacts.

    Python 5

  3. rag-qa-logs-corpus-data (Public)

    Synthetic multi-table RAG QA telemetry benchmark (corpus→chunks→retrieval→eval): labels for correctness/faithfulness/hallucination + cost/latency for RAG evaluation and dashboards.

    Python 2

  4. llm-system-ops-production-telemetry-sft-data (Public)

    Production-grade synthetic dataset for LLMOps: interaction-level telemetry (latency/cost/tokens), failure RCA, tool-use analytics, user feedback, plus 1:1 aligned SFT samples.

    Python 1

  5. ev-charging-dashboard (Public)

    Streamlit EV Charging Analytics Dashboard (2025): global map clustering + KPIs + country/power-class filters + fast-DC allocation optimizer.

    Python 4

  6. pima-diabetes-pipeline (Public)

    End-to-end diabetes risk prediction pipeline (Pima): EDA → feature engineering → calibration + cost-aware threshold → deployable artifacts.

    Jupyter Notebook 9