
This is the repo for the term project of my 2025 NUS Applied Machine Learning class.


McKeev/ML_project


ML-Powered Volatility-Targeted Investment Fund


A machine learning-driven investment fund implementing adaptive volatility targeting through regime detection and predictive modeling. This project combines Hidden Markov Models (HMM) and Long Short-Term Memory (LSTM) networks to dynamically adjust portfolio volatility targets based on predicted market regimes.

Course: BMF5360 - Applied Machine Learning in Investments
Authors: Christian Masek, Cedric McKeever, Pratardan Agarwal, Parinistha Narula
Institution: National University of Singapore

Note to the professor

The key notebooks are located at:

  • General data cleaning: data/notebooks/
  • Ridge Regression Volatility component: src/vol/dev_vol_predictor.ipynb
  • LSTM Notebook: src/lstm_model.ipynb
  • HMM Notebook: src/HMM/notebook.ipynb

📁 Repository Structure

ML_project/
├── src/                          # Source code
│   ├── HMM/                      # Hidden Markov Model implementation
│   │   ├── model.py              # HMM regime detection model
│   │   ├── features.py           # Feature engineering for HMM
│   │   └── README.md             # Detailed HMM documentation
│   ├── LSTM/                     # LSTM volatility prediction
│   │   ├── main.py               # LSTM model implementation
│   │   └── README.md             # LSTM feature documentation
│   ├── vol/                      # Volatility targeting strategies
│   │   └── targetting.py         # Leverage and volatility targeting
│   ├── utils/                    # Utility functions
│   │   ├── analytics.py          # Performance analytics
│   │   ├── logger.py             # Logging configuration
│   │   └── utils.py              # Common utilities
│   └── generate.py               # Main script to generate results
├── data/                         # Data directory
│   ├── raw/                      # Raw data files
│   ├── cleaned/                  # Processed data
│   └── notebooks/                # Exploratory data analysis notebooks
├── materials/                    # Project deliverables
│   ├── final/                    # Final report (LaTeX)
│   └── midterm_update/           # Midterm presentation materials
├── pyproject.toml                # Poetry dependency management
└── TASK.md                       # Project guidelines and requirements

🚀 Getting Started

Prerequisites

  • Python 3.12 or higher
  • Poetry for dependency management

Installation

  1. Clone the repository:

    git clone https://github.com/McKeev/ML_project_submission.git ML_project
    cd ML_project
  2. Install dependencies using Poetry:

    poetry install
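  3. Activate the virtual environment (in Poetry 2.x, poetry env activate only prints the activation command, so evaluate its output in bash or zsh):

    eval $(poetry env activate)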

Quick Start

Generate all tables and figures for the final report:

poetry run generate

This command will:

  • Run the HMM regime detection model
  • Execute LSTM volatility predictions
  • Perform backtests on volatility-targeted strategies
  • Generate LaTeX tables and figures in materials/final/

📊 Data Sources

The project uses a dataset spanning 2003-2025, sourced from:

  • Bloomberg Terminal: VIX, Implied Correlations (3m)
  • Refinitiv: SPY historical prices and returns
  • Federal Reserve Economic Data (FRED):
    • Treasury yields (2Y, 5Y, 10Y)
    • Overnight rates (SOFR)
    • ICE BofA High Yield Spreads
    • Economic Market Volatility Index (EMVMACROBUS)
  • Yahoo Finance: Additional market data
  • Proprietary calculations: GARCH volatility, Parkinson's volatility, technical indicators
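
As an illustration of one of the proprietary calculations listed above, here is a minimal sketch of the Parkinson high-low range estimator. The function name, the annualization factor, and the sample prices are assumptions made for this example; the repository's own implementation may differ.

```python
import math


def parkinson_volatility(highs, lows, periods_per_year=252):
    """Annualized Parkinson high-low range volatility estimate.

    highs, lows: equal-length lists of per-period high and low prices.
    """
    if len(highs) != len(lows) or len(highs) == 0:
        raise ValueError("highs and lows must be non-empty and of equal length")
    # Squared log range for each bar.
    sq_ranges = [math.log(h / l) ** 2 for h, l in zip(highs, lows)]
    # The constant 1 / (4 ln 2) converts the mean squared log range
    # into a per-period variance estimate.
    variance = sum(sq_ranges) / (4.0 * math.log(2.0) * len(sq_ranges))
    return math.sqrt(variance * periods_per_year)


# Made-up daily highs and lows, for illustration only.
highs = [101.0, 102.5, 101.8, 103.2, 104.0]
lows = [99.5, 100.9, 100.2, 101.5, 102.6]
vol = parkinson_volatility(highs, lows)
```

For weekly bars, periods_per_year would be 52 rather than 252.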

All data is resampled to weekly frequency (Friday close) and stored in data/cleaned/.
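
The weekly conversion can be sketched with pandas as follows. This is an assumption about how such a step is typically done, not a copy of the project's cleaning notebooks; the synthetic price series and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes on business days (illustrative data only).
dates = pd.bdate_range("2024-01-01", "2024-01-31")
daily_close = pd.Series(
    np.linspace(100.0, 110.0, len(dates)), index=dates, name="close"
)

# "W-FRI" bins end on Friday; .last() keeps the final observation in each
# bin, i.e. the Friday close (or the last trading day of a short week).
# Note that a trailing partial week is still labelled with its Friday date.
weekly_close = daily_close.resample("W-FRI").last()

# Weekly log returns computed from the resampled closes.
weekly_log_ret = np.log(weekly_close).diff().dropna()
```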


🧠 Methodology

1. Hidden Markov Model (HMM) Regime Detection

The HMM identifies latent market regimes based on:

  • Yield curve slope (2Y-10Y Treasury spread)
  • Lagged returns
  • Realized volatility
  • Distance from moving averages
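
Three of the price-based inputs above could be built roughly as in the following sketch. The exact definitions in src/HMM/features.py are not reproduced here, so the function name, the column names, the 13-week windows, and the 52-week annualization are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def build_features(weekly_close, ma_window=13, vol_window=13):
    """Hypothetical price-based regime features from weekly closes."""
    log_ret = np.log(weekly_close).diff()
    feats = pd.DataFrame(
        {
            # Previous week's log return.
            "lagged_ret": log_ret.shift(1),
            # Rolling standard deviation of weekly returns, annualized.
            "realized_vol": log_ret.rolling(vol_window).std() * np.sqrt(52),
            # Relative distance of price from its moving average.
            "dist_ma": weekly_close / weekly_close.rolling(ma_window).mean() - 1.0,
        }
    )
    # Drop the warm-up rows where rolling windows are incomplete.
    return feats.dropna()


# Synthetic weekly prices (random walk), for illustration only.
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-06", periods=60, freq="W-FRI")
weekly_close = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, 60))), index=idx)
feats = build_features(weekly_close)
```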

Key Implementation:

from src.HMM.model import HMMRegimePredictor, rolling_window_predict
from src.HMM.features import features_df

# Load features
features = features_df(['SLOPE_2Y_10Y', 'LRET', 'RealVol', 'DIST_MA3m'])

# Initialize and fit model
model = HMMRegimePredictor(n_regimes=3)
model.prepare_data(features).fit()

# Generate volatility targets
vol_results = model.get_vol_targets(base_vol_target=10)
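
How get_vol_targets maps regimes to targets is not documented in this README. One plausible scheme, shown here purely as an assumption, scales the base target by a probability-weighted mix of per-regime multipliers. The function name and multiplier values below are hypothetical.

```python
import numpy as np


def regime_weighted_vol_target(regime_probs, multipliers, base_vol_target=10.0):
    """Blend per-regime target multipliers by regime probability.

    regime_probs: array of shape (T, K) whose rows sum to 1.
    multipliers:  length-K scaling factors (hypothetical values).
    Returns a length-T array in the units of base_vol_target.
    """
    probs = np.asarray(regime_probs, dtype=float)
    mult = np.asarray(multipliers, dtype=float)
    if probs.ndim != 2 or mult.ndim != 1 or probs.shape[1] != mult.shape[0]:
        raise ValueError("regime_probs must be (T, K) and multipliers length K")
    if not np.allclose(probs.sum(axis=1), 1.0):
        raise ValueError("each row of regime_probs must sum to 1")
    return base_vol_target * (probs @ mult)


# Three weeks of made-up probabilities for calm, neutral, and stressed regimes.
probs = np.array([[0.9, 0.1, 0.0], [0.2, 0.6, 0.2], [0.0, 0.1, 0.9]])
multipliers = [1.2, 1.0, 0.6]  # take more risk when calm, less when stressed
targets = regime_weighted_vol_target(probs, multipliers)
```

Weighting by probabilities rather than by the single most likely regime avoids abrupt jumps in the target when the model is uncertain between two regimes.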

2. Volatility Targeting

Dynamic leverage adjustment to maintain target volatility levels:

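from src.vol import backtest_target

# Backtest with adaptive volatility targets
# (vol_target_series is assumed to hold the volatility targets,
# for example those derived from the HMM step above)
results = backtest_target(vol_target_series)
print(results.performance_table())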
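
The core idea behind volatility targeting can be sketched independently of backtest_target: exposure is the ratio of target to forecast volatility, capped at a maximum leverage and lagged one period. The function name, the 2x cap, and the sample numbers are assumptions for this example, and financing costs are ignored.

```python
import numpy as np
import pandas as pd


def vol_target_weights(forecast_vol, target_vol, max_leverage=2.0):
    """Exposure = target / forecast volatility, capped and lagged one period."""
    raw = target_vol / forecast_vol
    capped = raw.clip(upper=max_leverage)
    # Shift by one period so the weight applied in week t uses only
    # information available at the end of week t-1.
    return capped.shift(1)


# Made-up weekly inputs; volatilities are in percent.
idx = pd.date_range("2024-01-05", periods=5, freq="W-FRI")
forecast_vol = pd.Series([20.0, 10.0, 4.0, 12.5, 10.0], index=idx)
target_vol = pd.Series(10.0, index=idx)
asset_ret = pd.Series([0.01, -0.02, 0.015, 0.0, 0.005], index=idx)

weights = vol_target_weights(forecast_vol, target_vol)
# The first week has no prior forecast, so it is dropped.
strategy_ret = (weights * asset_ret).dropna()
```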

3. Walk-Forward Validation

All models use strict walk-forward analysis:

  • Training Window: Rolling historical data
  • Test Window: 2020-01-01 to 2025-08-31
  • No Look-Ahead Bias: Predictions use only past information
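
A rolling walk-forward scheme of this kind can be sketched as an index generator. This is not the repository's rolling_window_predict; the function name and the window sizes are hypothetical and only illustrate the no-look-ahead constraint.

```python
def rolling_walk_forward(n_obs, train_size, test_start):
    """Yield (train_indices, test_index) pairs for one-step-ahead forecasts."""
    if test_start < train_size:
        raise ValueError("need at least train_size observations before test_start")
    for t in range(test_start, n_obs):
        # Train on the train_size observations strictly before t,
        # then predict observation t only.
        yield range(t - train_size, t), t


# Ten observations, a four-observation training window, testing from index 6.
splits = list(rolling_walk_forward(n_obs=10, train_size=4, test_start=6))
```

Each training range ends strictly before its test index, so a model fitted on the training slice never sees the observation it is asked to predict.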

📈 Performance Metrics

The project evaluates strategies using:

  • Sharpe Ratio: Risk-adjusted returns
  • Maximum Drawdown: Peak-to-trough decline
  • Volatility: Realized annualized volatility
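
For reference, the three metrics can be computed from weekly returns as in the sketch below. It assumes 52 periods per year and simple (not log) returns; the function names are illustrative and are not taken from src/utils/analytics.py.

```python
import numpy as np

WEEKS_PER_YEAR = 52  # assumption: weekly return series


def annualized_volatility(returns, periods=WEEKS_PER_YEAR):
    """Sample standard deviation of returns scaled to annual terms."""
    return np.std(returns, ddof=1) * np.sqrt(periods)


def sharpe_ratio(returns, rf_per_period=0.0, periods=WEEKS_PER_YEAR):
    """Annualized mean excess return divided by its volatility."""
    excess = np.asarray(returns, dtype=float) - rf_per_period
    return np.mean(excess) / np.std(excess, ddof=1) * np.sqrt(periods)


def max_drawdown(returns):
    """Largest peak-to-trough decline of cumulative wealth (negative number)."""
    # Prepend the starting wealth so a loss in the first period is counted.
    growth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    wealth = np.concatenate(([1.0], growth))
    running_peak = np.maximum.accumulate(wealth)
    return np.min(wealth / running_peak - 1.0)


# Made-up returns, for illustration only.
example_mdd = max_drawdown([0.10, -0.50, 0.20])
```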

Results are automatically generated in LaTeX format for academic reporting.
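
As a rough idea of what LaTeX export involves, the sketch below renders a small results table as a tabular environment using only the standard library. The helper name and layout are assumptions, and the numbers are placeholders rather than project results.

```python
def latex_table(headers, rows, float_fmt="{:.2f}"):
    """Render rows as a LaTeX tabular: first column left-aligned, rest right-aligned."""
    col_spec = "l" + "r" * (len(headers) - 1)
    lines = [r"\begin{tabular}{" + col_spec + "}", r"\hline"]
    lines.append(" & ".join(headers) + r" \\")
    lines.append(r"\hline")
    for row in rows:
        # Format floats consistently; leave labels and other values as text.
        cells = [float_fmt.format(c) if isinstance(c, float) else str(c) for c in row]
        lines.append(" & ".join(cells) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


# Placeholder values only, not actual backtest output.
table = latex_table(
    ["Strategy", "Sharpe", "Max DD", "Vol"],
    [("Benchmark", 0.5, -20.0, 15.0), ("Vol target", 0.7, -12.0, 10.0)],
)
```

The resulting string can be written to a .tex file and pulled into the report with \input.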


🔧 Key Dependencies

Core libraries (managed via Poetry):

pandas = ">=2.3.3"           # Data manipulation
numpy = ">=2.3.3"            # Numerical computing
scikit-learn = ">=1.7.2"     # Machine learning utilities
hmmlearn = ">=0.3.3"         # Hidden Markov Models
matplotlib = ">=3.10.7"      # Visualization
yfinance = ">=0.2.66"        # Market data
arch = ">=8.0.0"             # GARCH models
polars = ">=1.34.0"          # High-performance data frames

📝 LaTeX Report Generation

The project includes automated LaTeX report generation:

  1. Navigate to the materials directory:

    cd materials/final
  2. Build the PDF report:

    make

The report includes:

  • Comprehensive methodology description
  • Auto-generated performance tables
  • Publication-quality figures
  • Complete bibliography

Format Requirements:

  • 8-12 pages (excluding cover, TOC, references)
  • Times New Roman, 11pt, 1.5 line spacing
  • 1-inch margins

📚 Documentation

Detailed documentation is available for each component:

  • HMM Module: src/HMM/README.md - Complete HMM API and examples
  • LSTM Module: src/LSTM/README.md - Feature descriptions and model architecture
  • Data Notes: data/data_notes.md - Data sources and preprocessing
  • Final Report: materials/final/README.md - LaTeX compilation instructions

Note: This README provides a high-level overview. For detailed technical documentation, please refer to the individual module README files and code comments.
