A machine learning-driven investment fund implementing adaptive volatility targeting through regime detection and predictive modeling. This project combines Hidden Markov Models (HMM) and Long Short-Term Memory (LSTM) networks to dynamically adjust portfolio volatility targets based on predicted market regimes.
Course: BMF5360 - Applied Machine Learning in Investments
Authors: Christian Masek, Cedric McKeever, Pratardan Agarwal, Parinistha Narula
Institution: National University of Singapore
The key notebooks are located at:
- General data cleaning: data/notebooks/
- Ridge regression volatility component: src/vol/dev_vol_predictor.ipynb
- LSTM notebook: src/lstm_model.ipynb
- HMM notebook: src/HMM/notebook.ipynb
ML_project/
├── src/ # Source code
│ ├── HMM/ # Hidden Markov Model implementation
│ │ ├── model.py # HMM regime detection model
│ │ ├── features.py # Feature engineering for HMM
│ │ └── README.md # Detailed HMM documentation
│ ├── LSTM/ # LSTM volatility prediction
│ │ ├── main.py # LSTM model implementation
│ │ └── README.md # LSTM feature documentation
│ ├── vol/ # Volatility targeting strategies
│ │ └── targetting.py # Leverage and volatility targeting
│ ├── utils/ # Utility functions
│ │ ├── analytics.py # Performance analytics
│ │ ├── logger.py # Logging configuration
│ │ └── utils.py # Common utilities
│ └── generate.py # Main script to generate results
├── data/ # Data directory
│ ├── raw/ # Raw data files
│ ├── cleaned/ # Processed data
│ └── notebooks/ # Exploratory data analysis notebooks
├── materials/ # Project deliverables
│ ├── final/ # Final report (LaTeX)
│ └── midterm_update/ # Midterm presentation materials
├── pyproject.toml # Poetry dependency management
└── TASK.md # Project guidelines and requirements
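Prerequisites:
- Python 3.12 or higher
- Poetry for dependency management
Installation:
1. Clone the repository:
git clone https://github.com/McKeev/ML_project_submission.git
cd ML_project_submission
2. Install dependencies using Poetry:
poetry install
3. Activate the virtual environment (with Poetry 2.x, poetry env activate prints the activation command rather than running it, so evaluate its output in the current shell):
eval $(poetry env activate)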
Generate all tables and figures for the final report:
poetry run generate
This command will:
- Run the HMM regime detection model
- Execute LSTM volatility predictions
- Perform backtests on volatility-targeted strategies
- Generate LaTeX tables and figures in materials/final/
The project uses a dataset spanning 2003-2025, sourced from:
- Bloomberg Terminal: VIX, 3-month implied correlation
- Refinitiv: SPY historical prices and returns
- Federal Reserve Economic Data (FRED):
  - Treasury yields (2Y, 5Y, 10Y)
  - Overnight rates (SOFR)
  - ICE BofA High Yield spreads
  - Equity Market Volatility Tracker (EMVMACROBUS)
- Yahoo Finance: additional market data
- Derived series computed in-house: GARCH volatility, Parkinson volatility, technical indicators
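Parkinson volatility, one of the derived series, estimates variance from each day's high-low range instead of close-to-close returns. The minimal pandas sketch below implements the standard estimator σ² = (ln(H/L))² / (4 ln 2) per day; the 21-day window, the 252-day annualization and the helper name parkinson_vol are assumptions for illustration, not values taken from this project.

```python
import math

import numpy as np
import pandas as pd


def parkinson_vol(high: pd.Series, low: pd.Series, window: int = 21,
                  periods_per_year: int = 252) -> pd.Series:
    """Rolling annualized Parkinson volatility from daily highs and lows.

    Hypothetical helper; window and annualization factor are assumptions.
    """
    # Per-day variance estimate: (ln(H/L))^2 / (4 ln 2)
    daily_var = np.log(high / low) ** 2 / (4.0 * math.log(2.0))
    # Average over the trailing window, annualize, then take the square root
    return np.sqrt(daily_var.rolling(window).mean() * periods_per_year)


# Synthetic example: every day's high is exactly exp(0.01) times its low
idx = pd.bdate_range("2024-01-01", periods=30)
low = pd.Series(100.0, index=idx)
high = low * math.exp(0.01)
vol = parkinson_vol(high, low)
print(round(vol.iloc[-1], 4))  # prints 0.0953
```

The first window - 1 values are NaN because the rolling mean needs a full window of observations.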
All data is resampled to weekly frequency (Friday close) and stored in data/cleaned/.
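In pandas, the Friday-close convention can be reproduced with a W-FRI resample that keeps the last observation in each week. The sketch below uses made-up daily prices; the column name SPY and the use of log returns are illustrative assumptions, not the project's cleaning code.

```python
import numpy as np
import pandas as pd

# Made-up daily closes on business days (January 2024)
idx = pd.bdate_range("2024-01-01", "2024-01-31")
daily = pd.DataFrame({"SPY": 100.0 + np.arange(len(idx))}, index=idx)

# "W-FRI" bins each week so that it ends (and is labeled) on Friday;
# .last() keeps the final available close in each bin, i.e. the Friday close
# (or the last earlier trading day if Friday is missing from the data)
weekly = daily.resample("W-FRI").last()

# Weekly log returns computed from the Friday closes
weekly["LRET"] = np.log(weekly["SPY"]).diff()
print(weekly)
```

The last bin here is labeled 2024-02-02 although it only contains data through January 31; a partial final week like this may need to be dropped before modeling.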
1. Hidden Markov Model (HMM) Regime Detection
The HMM identifies latent market regimes based on:
- Yield curve slope (2Y-10Y Treasury spread)
- Lagged returns
- Realized volatility
- Distance from moving averages
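The four inputs listed above could be computed from weekly prices and yields roughly as in the sketch below. The exact definitions are not given in this README, so all of them are assumptions: the slope is taken as 10Y minus 2Y, realized volatility uses a 12-week window annualized with √52, DIST_MA3m is measured against a 13-week (about 3-month) moving average, and build_hmm_features is a hypothetical name. Only the column names mirror the feature names passed to features_df in this README.

```python
import numpy as np
import pandas as pd


def build_hmm_features(prices: pd.Series, y2: pd.Series, y10: pd.Series,
                       vol_window: int = 12, ma_window: int = 13) -> pd.DataFrame:
    """Hypothetical weekly feature builder; all definitions are assumptions."""
    lret = np.log(prices).diff()
    feats = pd.DataFrame(index=prices.index)
    # Yield-curve slope, assumed here to be 10Y minus 2Y (in yield points)
    feats["SLOPE_2Y_10Y"] = y10 - y2
    # Previous week's log return (lagged by one period)
    feats["LRET"] = lret.shift(1)
    # Annualized realized volatility from trailing weekly log returns
    feats["RealVol"] = lret.rolling(vol_window).std() * np.sqrt(52)
    # Relative distance of the price from its ~3-month (13-week) moving average
    feats["DIST_MA3m"] = prices / prices.rolling(ma_window).mean() - 1.0
    # Drop warm-up rows where a rolling window is not yet full
    return feats.dropna()


# Usage with synthetic weekly data
idx = pd.date_range("2020-01-03", periods=60, freq="W-FRI")
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, 60))), index=idx)
y2 = pd.Series(4.0, index=idx)
y10 = pd.Series(4.5, index=idx)
feats = build_hmm_features(prices, y2, y10)
print(feats.head())
```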
Key Implementation:
from src.HMM.model import HMMRegimePredictor, rolling_window_predict
from src.HMM.features import features_df
# Load features
features = features_df(['SLOPE_2Y_10Y', 'LRET', 'RealVol', 'DIST_MA3m'])
# Initialize and fit model
model = HMMRegimePredictor(n_regimes=3)
model.prepare_data(features).fit()
# Generate volatility targets
vol_results = model.get_vol_targets(base_vol_target=10)
Volatility targeting adjusts leverage dynamically to hold portfolio volatility near the target level:
from src.vol import backtest_target
# vol_target_series: a time series of target volatilities (e.g. derived from vol_results above)
# Backtest with adaptive volatility targets
results = backtest_target(vol_target_series)
print(results.performance_table())
All models use strict walk-forward analysis:
- Training Window: Rolling historical data
- Test Window: 2020-01-01 to 2025-08-31
- No Look-Ahead Bias: Predictions use only past information
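A rolling walk-forward split of this kind can be sketched as a generator that, for each test date, returns only the observations strictly before it. The 260-week (about five-year) window and the name walk_forward_splits are assumptions for illustration; the project's actual training-window length is not stated here.

```python
from collections.abc import Iterator

import pandas as pd


def walk_forward_splits(index: pd.DatetimeIndex, test_start: str, test_end: str,
                        train_size: int) -> Iterator[tuple[pd.DatetimeIndex, pd.Timestamp]]:
    """Yield (training dates, prediction date) pairs for a rolling window.

    Training for date t uses the `train_size` observations strictly before t,
    so nothing observed at t or later can leak into the fit.
    """
    start, end = pd.Timestamp(test_start), pd.Timestamp(test_end)
    for t in index[(index >= start) & (index <= end)]:
        pos = index.get_loc(t)
        if pos < train_size:
            continue  # skip dates without a full training window
        yield index[pos - train_size:pos], t


# Weekly (Friday) dates covering the history and the 2020-2025 test window
dates = pd.date_range("2015-01-02", "2025-08-29", freq="W-FRI")
splits = list(walk_forward_splits(dates, "2020-01-01", "2025-08-31", train_size=260))
train, t = splits[0]
print(t.date(), train[0].date(), train[-1].date())
```

In a backtest, the model would be refit (or updated) on each training slice before predicting its date; refitting at every step is the most conservative choice but also the slowest.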
The project evaluates strategies using:
- Sharpe Ratio: Annualized excess return per unit of annualized volatility
- Maximum Drawdown: Peak-to-trough decline
- Volatility: Realized annualized volatility
Results are automatically generated in LaTeX format for academic reporting.
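The volatility-targeting step and the three metrics above can be illustrated together. The sketch assumes leverage equals target volatility divided by predicted volatility, capped at 2x and applied with a one-week lag, and computes metrics from weekly returns with 52 periods per year and a zero risk-free rate by default. The functions vol_target_returns and summary_metrics are hypothetical and are not the project's backtest_target or performance_table.

```python
from typing import Union

import numpy as np
import pandas as pd

PERIODS_PER_YEAR = 52  # weekly data


def vol_target_returns(asset_ret: pd.Series, predicted_vol: pd.Series,
                       target_vol: Union[pd.Series, float],
                       max_leverage: float = 2.0) -> pd.Series:
    """Scale asset returns so that expected volatility matches the target."""
    leverage = (target_vol / predicted_vol).clip(upper=max_leverage)
    # Leverage chosen at the end of week t-1 is applied to the return of week t
    return (leverage.shift(1) * asset_ret).dropna()


def summary_metrics(ret: pd.Series, rf_annual: float = 0.0) -> dict[str, float]:
    """Annualized Sharpe ratio, realized volatility and maximum drawdown."""
    ann_ret = ret.mean() * PERIODS_PER_YEAR
    ann_vol = ret.std() * np.sqrt(PERIODS_PER_YEAR)
    wealth = (1.0 + ret).cumprod()
    drawdown = wealth / wealth.cummax() - 1.0  # decline from the running peak
    return {
        "Sharpe": (ann_ret - rf_annual) / ann_vol,
        "Volatility": ann_vol,
        "MaxDrawdown": drawdown.min(),
    }


# Usage with made-up weekly numbers
idx = pd.date_range("2020-01-03", periods=5, freq="W-FRI")
asset_ret = pd.Series([0.02, 0.10, -0.20, 0.05, 0.10], index=idx)
predicted_vol = pd.Series(0.20, index=idx)  # forecast annualized volatility
strategy = vol_target_returns(asset_ret, predicted_vol, target_vol=0.10)
print(summary_metrics(strategy))
```

A time-varying target (for example regime-dependent targets) can be passed as a Series in place of the constant 0.10, since the division aligns on the date index.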
Core libraries (managed via Poetry):
pandas = ">=2.3.3" # Data manipulation
numpy = ">=2.3.3" # Numerical computing
scikit-learn = ">=1.7.2" # Machine learning utilities
hmmlearn = ">=0.3.3" # Hidden Markov Models
matplotlib = ">=3.10.7" # Visualization
yfinance = ">=0.2.66" # Market data
arch = ">=8.0.0" # GARCH models
polars = ">=1.34.0" # High-performance data frames
The project includes automated LaTeX report generation:
1. Navigate to the materials directory:
cd materials/final
2. Build the PDF report:
make
The report includes:
- Comprehensive methodology description
- Auto-generated performance tables
- Publication-quality figures
- Complete bibliography
Format Requirements:
- 8-12 pages (excluding cover, TOC, references)
- Times New Roman, 11pt, 1.5 line spacing
- 1-inch margins
Detailed documentation is available for each component:
- HMM module: src/HMM/README.md (complete HMM API and examples)
- LSTM module: src/LSTM/README.md (feature descriptions and model architecture)
- Data notes: data/data_notes.md (data sources and preprocessing)
- Final report: materials/final/README.md (LaTeX compilation instructions)
Note: This README provides a high-level overview. For detailed technical documentation, please refer to the individual module README files and code comments.