A sweeping data acquisition program designed to present current hotspots of activity among publicly traded companies, conglomerates, and ETF members.
- Python
  - Data Processing
  - PyArrow (In-memory Format)
  - PySpark (Data Processing)
  - Delta Lake (Durable Format [Parquet])
  - Hive (Query Layer)
  - Airflow (Orchestration)
  - NetworkX (Graphs)
- Scala
  - Kafka (Real-time Streaming)
- Monitoring
  - Prometheus -> Grafana

Kafka --> Spark (Arrow In-Memory) --> ML Model (Direct Predictions) ↘ Prometheus --> Grafana (Real-Time Monitoring)

Kafka --> Spark (Batch) --> Delta Lake (Parquet + ACID) ↘ Hive Metastore ↘ Batch ML Retraining

Kafka --> Prometheus --> Grafana (Kafka lag, Spark times, Model inference time)

Spark --> Prometheus --> Grafana (Batch processing time, failure rate)

Delta --> Airflow --> Optimize Delta, Retrain Model
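The flows above can be sketched as a toy, standard-library-only simulation. This is not the project's implementation: the `Metrics` class stands in for Prometheus, the `store` dict stands in for Delta Lake, and `predict`, the volume threshold, the event fields, and the metric names are all hypothetical placeholders chosen for illustration.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Metrics:
    """Stand-in for a Prometheus registry: metric name -> list of observations."""
    observations: dict = field(default_factory=lambda: defaultdict(list))

    def observe(self, name, value):
        self.observations[name].append(value)


def predict(event):
    """Hypothetical model: flag an event whose volume exceeds a fixed threshold."""
    return event["volume"] > 1_000_000


def realtime_path(event, metrics):
    """Real-time flow: score one event and record the inference time."""
    start = time.perf_counter()
    is_hotspot = predict(event)
    metrics.observe("model_inference_seconds", time.perf_counter() - start)
    return is_hotspot


def batch_path(events, store, metrics):
    """Batch flow: append events to a durable store and record the batch time."""
    start = time.perf_counter()
    for event in events:
        store[event["ticker"]].append(event)
    metrics.observe("batch_processing_seconds", time.perf_counter() - start)


def run(events):
    """Send the same events down both paths, as the two Kafka flows do."""
    metrics = Metrics()
    store = defaultdict(list)  # stand-in for Delta Lake tables keyed by ticker
    hotspots = [e["ticker"] for e in events if realtime_path(e, metrics)]
    batch_path(events, store, metrics)
    return hotspots, store, metrics


# Made-up sample events; the tickers and volumes are not real data.
sample_events = [
    {"ticker": "AAA", "volume": 2_000_000},
    {"ticker": "BBB", "volume": 10},
]
hotspots, store, metrics = run(sample_events)
```

In a real deployment each stand-in would be replaced by the corresponding component (a Kafka consumer, a Spark job, a Prometheus client), but the split into a low-latency scoring path and a durable batch path stays the same.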
- RL/DL/NLP
Activate the virtual environment (Windows): venv\Scripts\activate