🌟
Senior Data Engineer

Creative-Ataraxia/README.md

Hi there, happy to see you!


I'm Roy, a U.S.-based Senior Data Engineer with 5+ years of experience. I architect, modernize, and operate batch and streaming data platforms built on AWS-native tooling for clients in the manufacturing and financial services domains.

I have worked with:

  • Batch ETL (Spark, Redshift)
  • Real-time Streaming (Kafka, Flink, Aurora)
  • Lakehouse Patterns (S3, Hudi, Athena)
  • Data Modeling (PySpark, BigQuery, dbt)
  • Platform & IaC (Airflow, Kubernetes, Terraform)
  • Governance (Lake Formation, KMS, IAM, CloudWatch)

My work combines independent ownership with measurable business impact: cost savings, lower latency, operational scaling, and reduced tech debt.

Outside of work, I enjoy building personal projects (batch ETL, stream processing) and entering LLM-related coding competitions; I recently won a Silver Medal in a featured Kaggle LLM competition.

Here are some of the architectures I've worked with:

  • Automated Data Marketplace
  • Hot/Cold Real-time Streaming
  • Batch ETL for Retail Analytics
  • Stream Processing - NASA's data

Currently open to Data Engineering, DataOps, MLOps roles.


Pinned

  1. GA4-Analytical-Pipeline (Public)

    A fully containerised batch ETL stack that ingests ~5M Google Analytics 4 data, transforms it with Spark, orchestrates the workflow in Airflow, lands data facts/dimensions in Postgres for downstrea…

    Python 1

  2. eonet-realtime-streaming (Public)

    Real-time streaming data engineering project that ingests, transforms, persists, and visualizes data about natural events sourced in real time from NASA's EONET APIs, with Mapbox data visualizations.

    Python

  3. Atomized-Tasks-Dataset (Public)

    A tabular dataset of 6,970 real-world workflows. These workflows are commonly used in SaaS, e-commerce, advertising, marketing, sales, customer support, etc. Each row represents an atomic task: a m…

  4. Kaggle_Solutions (Public)

    My solutions for the featured Kaggle competitions I participated in.

    HTML 2 4

  5. Statistics_and_Probability_Concepts_Cheatsheet (Public)

    Notes for all 4 courses in the MITx SDS program

    HTML 3