A professional machine learning pipeline for extracting trend insights and forecasting price movements from OHLCV market data.
- Multi-Symbol Data Ingestion: Robust data fetching from MetaTrader 5 (MT5) and CCXT-supported exchanges (MEXC, etc.).
- Statistical Feature Extraction: Automatic calculation of 20+ features including Log Returns, RSI, SMA, Volatility, and Linear Regression Channels.
- Time Series Forecasting: Unified training pipeline using mlforecast and XGBoost, with multi-symbol support.
- Model Registry: Full integration with MLflow for experiment tracking and model versioning.
- Automation: Built-in scheduling and DVC-powered pipeline management.
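As a rough illustration of the statistical features listed above, the sketch below computes log returns, a simple moving average, and rolling volatility from a list of close prices using only the standard library. The function names and window sizes are hypothetical and are not taken from this repository; the actual feature code may differ.

```python
import math


def log_returns(closes):
    """Log return r_t = ln(C_t / C_{t-1}); the first bar has no predecessor."""
    return [None] + [math.log(c1 / c0) for c0, c1 in zip(closes, closes[1:])]


def sma(values, window):
    """Simple moving average over the trailing `window` values (None until full)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def rolling_volatility(returns, window):
    """Sample standard deviation of the trailing `window` returns (window >= 2)."""
    out = []
    for i in range(len(returns)):
        chunk = returns[max(0, i + 1 - window) : i + 1]
        # Skip windows that are incomplete or include the undefined first return.
        if len(chunk) < window or any(r is None for r in chunk):
            out.append(None)
            continue
        mean = sum(chunk) / window
        var = sum((r - mean) ** 2 for r in chunk) / (window - 1)
        out.append(math.sqrt(var))
    return out


# Illustrative close prices only, not real market data.
closes = [100.0, 102.0, 101.0, 103.0, 104.0]
rets = log_returns(closes)
sma3 = sma(closes, 3)
vol3 = rolling_volatility(rets, 3)
```

Every function looks only at the current and earlier bars, so the resulting features can be attached to the bar at which they become known.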
- Dependencies: Managed by uv. Install them with:

      uv sync

- Configuration: Set environment variables and edit params.yaml to configure data sources, features, and training parameters.

      # For Gemini
      export GEMINI_API_KEY="your-key-here"
      # For OpenAI
      export OPENAI_API_KEY="your-key-here"

- Run Pipeline:

      uv run dvc repro

The pipeline behavior is controlled by params.yaml. Key sections include:
- mexc / mt5:
  - symbols: List of assets to trade/analyze (e.g., ["BTCUSD", "ETHUSD"]).
  - timeframes: List of timeframes to fetch (e.g., ["H1", "H4"]).
- target_timeframe: The base timeframe for the final dataset (e.g., "H1").
- target_symbol: (Optional) Filter the final dataset to a specific symbol (e.g., "ETHUSD"). Useful for specialized models.
Enrich the dataset with features from other contexts (preventing lookahead bias):
- enable_multivariate: true/false. Enable merging of context features.
- multivariate_config: Defines the single target and its contexts.
  - target_timeframe: Base timeframe (e.g., "H1").
  - target_symbol: Base symbol (e.g., "ETHUSD").
  - context: Dictionary defining contexts to merge (Symbol: [Timeframes]).
- Example:

      multivariate_config:
        target_timeframe: "H1"
        target_symbol: "ETHUSD"
        context:
          BTCUSD: ["H1", "H4"]  # Merge BTC H1 and H4 context
          ETHUSD: ["H4"]        # Merge ETH H4 context (self-context)

- The pipeline automatically handles suffixing (e.g., _BTCUSD, _H4, _BTCUSD_H4) and lookahead prevention.
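One common way to merge higher-timeframe context without lookahead is a backward as-of join: each base row only receives the latest context bar that had already closed at that row's timestamp. The sketch below illustrates the idea with the standard library. It assumes timestamps are bar open times, and the helper names (context_suffix, merge_context) and the suffix rule are inferred from the examples above; they are hypothetical, not the pipeline's actual implementation.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def context_suffix(symbol, timeframe, target_symbol, target_timeframe):
    """Build a column suffix such as _BTCUSD, _H4, or _BTCUSD_H4 (assumed rule)."""
    parts = []
    if symbol != target_symbol:
        parts.append(symbol)
    if timeframe != target_timeframe:
        parts.append(timeframe)
    return "".join("_" + p for p in parts)


def merge_context(base_times, ctx_open_times, ctx_values, ctx_duration):
    """For each base timestamp, return the value of the latest context bar
    that had fully closed by then, or None if no such bar exists yet.
    `ctx_open_times` must be sorted in ascending order."""
    close_times = [t + ctx_duration for t in ctx_open_times]
    merged = []
    for t in base_times:
        idx = bisect_right(close_times, t) - 1  # last bar with close_time <= t
        merged.append(ctx_values[idx] if idx >= 0 else None)
    return merged


# Illustrative values only: nine H1 timestamps and two H4 context bars.
base_times = [datetime(2024, 1, 1, h) for h in range(9)]  # 00:00 .. 08:00
h4_opens = [datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 4)]
h4_closes = [42000.0, 42500.0]

column = "close" + context_suffix("BTCUSD", "H4", "ETHUSD", "H1")
merged = merge_context(base_times, h4_opens, h4_closes, timedelta(hours=4))
```

Here the H4 bar that opens at 00:00 only becomes visible to H1 rows from 04:00 onward, which is the point at which it has closed. Joining on the context bar's open time instead would leak up to four hours of future information into the base rows.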
- model_name: Name of the model registered in MLflow.
- use_static_features: true/false.
  - Global Model: Set to true when training on multiple symbols (Batch Mode) to let the model distinguish between assets.
  - Multivariate Mode: Automatically disabled (ignored) when enable_multivariate is true, as the model trains on a single target series.
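The interaction between use_static_features and enable_multivariate can be summarized in a few lines. This is a minimal sketch that assumes both keys sit at the top level of an already-loaded params dictionary; the real nesting in params.yaml and the function name resolve_use_static_features are assumptions made for illustration.

```python
def resolve_use_static_features(params):
    """Return the effective flag: static features are ignored in multivariate mode."""
    if params.get("enable_multivariate", False):
        # Single target series, so there is no asset identity to encode.
        return False
    return bool(params.get("use_static_features", False))


# Hypothetical parameter dictionaries for the two modes.
global_mode = {"use_static_features": True, "enable_multivariate": False}
multivariate_mode = {"use_static_features": True, "enable_multivariate": True}

effective_global = resolve_use_static_features(global_mode)
effective_multivariate = resolve_use_static_features(multivariate_mode)
```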
Use the built-in scheduler to run the pipeline at regular intervals defined in params.yaml.
    uv run python -m ml_orderflow.scheduler

Configure interval_minutes in the schedule section of params.yaml.
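Conceptually, an interval scheduler of this kind is a loop that runs the pipeline and then sleeps. The sketch below shows that pattern under stated assumptions: run_pipeline and run_every are hypothetical names, and the actual ml_orderflow.scheduler module may be implemented differently. The job and sleep functions are injectable so the loop can be exercised without launching DVC or waiting.

```python
import subprocess
import time


def run_pipeline():
    """Run the DVC pipeline once and return the process exit code."""
    return subprocess.run(["uv", "run", "dvc", "repro"]).returncode


def run_every(interval_minutes, job=run_pipeline, sleep=time.sleep, max_runs=None):
    """Call `job` repeatedly, pausing `interval_minutes` between runs.
    Loops forever when `max_runs` is None; returns the number of runs otherwise."""
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Do not sleep after the final bounded run.
        if max_runs is None or runs < max_runs:
            sleep(interval_minutes * 60)
    return runs


# Example: run_every(60) would re-run the pipeline hourly until interrupted.
```

A production scheduler would likely also log each exit code and decide how to handle a failed run; that is left out here for brevity.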
To ensure the pipeline runs even after system restarts or without a manual terminal session:
- Open Task Scheduler on Windows.
- Click Create Basic Task.
- Trigger: Choose "Daily" or "When I log on".
- Action: "Start a Program".
  - Program/script: uv
  - Add arguments: run dvc repro (or run python -m ml_orderflow.scheduler)
  - Start in: ml-orderflow
- Data: MT5, CCXT, Pandas
- Features: NumPy, Scipy
- ML: MLForecast, NeuralForecast, XGBoost, Scikit-Learn
- MLOps: DVC, MLflow
- API: FastAPI, Uvicorn