🥅 Soccer Prediction Project v2.3

A machine learning system that predicts soccer match draws and goal patterns using ensemble methods and feature engineering, with a focus on high-precision predictions for betting applications.

📋 Table of Contents

Key Features
Project Architecture
Installation
Usage
Development Workflow
Model Pipeline
Configuration
Extending the Model
Troubleshooting
Contributing
License

✨ Key Features

Ensemble Model Architecture: Combines XGBoost, TabNet, LightGBM, Random Forest, and KNN for robust predictions.
Precision-Focused Weighting: Optimized weights based on each model's precision performance.
KNN Model Support: K-Nearest Neighbors can now be used as a base model in the ensemble.
Vectorized Threshold Optimization: Efficient threshold tuning for precision-recall balance.
CPU-Only Optimization: Explicitly configured for deterministic CPU-based training.
Reproducible Results: Comprehensive seed setting and environment variable control.
MLflow Integration: Pre-trained model loading and versioned model registration.
Modern Tooling: Uses uv for package management, hatchling for builds, ruff for linting/formatting, and Makefile for task automation.
src Layout: Follows standard Python project structure.
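The precision-focused weighting idea can be illustrated with a minimal sketch. It assumes each base model has a validation precision score and makes the weights proportional to those scores. The names `precision_weights`, `weighted_probability`, the `floor` parameter, and the example scores are hypothetical and are not taken from src/models/ensemble/weights.py.

```python
def precision_weights(precisions, floor=0.0):
    """Turn per-model precision scores into weights that sum to 1."""
    # Hypothetical rule: precision at or below `floor` earns zero weight.
    adjusted = {name: max(p - floor, 0.0) for name, p in precisions.items()}
    total = sum(adjusted.values())
    if total == 0:
        # Fall back to equal weights when no model clears the floor.
        return {name: 1.0 / len(precisions) for name in precisions}
    return {name: value / total for name, value in adjusted.items()}


def weighted_probability(probabilities, weights):
    """Blend per-model draw probabilities for a single match."""
    return sum(weights[name] * probabilities[name] for name in weights)


# Made-up validation precisions, for illustration only.
scores = {"xgb": 0.50, "lgb": 0.30, "knn": 0.20}
weights = precision_weights(scores)
blended = weighted_probability({"xgb": 0.6, "lgb": 0.4, "knn": 0.2}, weights)
```

Raising `floor` would drop weak models from the blend entirely, which is one possible way to keep a low-precision model from diluting the ensemble.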

🏗 Project Architecture

The system employs a multi-stage ensemble approach. Core Python code resides in the src/ directory. Key components include:

Data Ingestion & Preprocessing (src/utils, data/)
Feature Engineering (src/utils/advanced_goal_features.py)
Base Model Training (src/models/StackedEnsemble/base/ — includes XGBoost, TabNet, LightGBM, Random Forest, KNN)
Ensemble Logic (src/models/ensemble/)
Prediction Service (src/predictors/)
Backend API (src/backend/ - if applicable)
Documentation (docs/, mkdocs.yml)
Development Tools (devtools/, Makefile, pyproject.toml)

(See docs/architecture.md and docs/technical.md for more details).

🚀 Installation

Prerequisites

Python >=3.11,<4.0
Git
uv package manager (See uv installation)
make (Optional, for using the Makefile. See Makefile Setup Note)
Windows 11 (tested platform)

Setup

# Clone the repository (replace with your actual URL)
git clone https://github.com/ronyka77/TheDrawCode.git
cd TheDrawCode

# Create and activate virtual environment using uv
uv venv
# On Windows (cmd/powershell)
.venv\Scripts\activate
# On Linux/macOS
# source .venv/bin/activate

# Install dependencies using the Makefile (recommended)
make install
# OR install directly using uv
# uv sync --all-extras --dev

Environment Setup Flow

Makefile Setup Note

The Makefile provides convenient shortcuts. If make is not installed on your system (common on Windows), you can either install it (e.g., via Chocolatey: choco install make) or run the corresponding commands from the Makefile directly (e.g., run uv sync --all-extras --dev instead of make install).

Verification

Verify your installation by running the tests:

make test
# OR directly with uv
# uv run pytest

📊 Usage

Basic Prediction Example

# Ensure your virtual environment is activated
# Run scripts from the project root directory

# Example assumes prepare_data function exists and loads data appropriately
# Note the 'src.' prefix due to the src-layout

from src.models.ensemble.ensemble_model import EnsembleModel # Adjust filename if needed
from src.utils.logger import ExperimentLogger
import pandas as pd

# Initialize logger
logger = ExperimentLogger(experiment_name="soccer_prediction")

# Load dataset (replace with your own data loading)
# data = pd.read_csv("path/to/matches.csv")
# X_train, y_train, X_test, y_test = prepare_data(data)

# Initialize the ensemble model (example)
# model = EnsembleModel(
#     logger=logger,
#     meta_learner_type='lgb',
#     dynamic_weighting=True,
#     target_precision=0.50,
#     required_recall=0.25,
#     extra_base_model_type='random_forest'
# )

# Train the model (example)
# results = model.train(X_train, y_train, X_test, y_test)

# Make predictions (example)
# predictions = model.predict(X_test)
# probabilities = model.predict_proba(X_test)

# print(f"Optimized threshold: {model.optimal_threshold}")
print("Usage example needs actual data loading and training steps.")
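The example above assumes a `prepare_data` function. One possible shape for it is sketched below, under loud assumptions: it works on a list of dictionaries rather than a pandas DataFrame, and the column names `date` and `is_draw` are hypothetical, not the project's actual schema. It splits chronologically so the test set contains only the most recent matches.

```python
def prepare_data(rows, target="is_draw", date_key="date", test_fraction=0.2):
    """Split match rows chronologically into train and test sets.

    Hypothetical helper: `is_draw` and `date` are assumed column names.
    """
    # ISO-formatted date strings sort correctly as plain strings.
    ordered = sorted(rows, key=lambda row: row[date_key])
    # At least one row goes to the test set; the rest is training data.
    n_test = max(1, int(len(ordered) * test_fraction))
    train, test = ordered[:-n_test], ordered[-n_test:]

    def split(part):
        features = [
            {k: v for k, v in row.items() if k not in (target, date_key)}
            for row in part
        ]
        labels = [row[target] for row in part]
        return features, labels

    X_train, y_train = split(train)
    X_test, y_test = split(test)
    return X_train, y_train, X_test, y_test


# Made-up rows, deliberately out of date order.
matches = [
    {"date": "2024-03-01", "home_form": 0.7, "is_draw": 1},
    {"date": "2024-01-05", "home_form": 0.4, "is_draw": 0},
    {"date": "2024-02-10", "home_form": 0.5, "is_draw": 1},
    {"date": "2024-01-20", "home_form": 0.6, "is_draw": 0},
    {"date": "2024-02-25", "home_form": 0.3, "is_draw": 0},
]
X_train, y_train, X_test, y_test = prepare_data(matches)
```

A chronological split is used here because a random split on match data can let information from later matches leak into training.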

Running Training via Script

# Run from project root
python -m src.models.ensemble.run_ensemble

Viewing Experiments

# Ensure MLflow server is configured or use local tracking
mlflow ui --port 5000 --backend-store-uri sqlite:///mlflow.db # Example using local SQLite

Navigate to http://localhost:5000 in your browser.

🚀 Development Workflow

Use the Makefile for common tasks:

make install: Install dependencies.
make lint: Run ruff check and format.
make test: Run pytest tests.
make clean: Remove cache and build artifacts.
make build: Build the package.

🧪 Model Pipeline

The system follows this workflow:

Data Preparation: Feature engineering and validation.

Base Model Loading: Pre-trained base models are loaded from MLflow by run ID (see src/models/ensemble/ensemble_model.py).

# Example run IDs used by the system
xgb_run_id = '30402608b8dc4c899d675e5b56c48c01'
# ... other model run IDs ...
# knn_run_id = 'your_knn_model_run_id'

Dynamic Weighting: Calculating weights (src/models/ensemble/weights.py).
Meta-Feature Creation.
Meta-Learner Training.
Threshold Optimization (src/models/ensemble/thresholds.py).
Model Registration: Registering the final model.
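The threshold optimization step can be sketched as a search over candidate cut-offs. This is a readable plain-Python loop, not the project's vectorized implementation in src/models/ensemble/thresholds.py. The parameter names `target_precision` and `required_recall` mirror the EnsembleModel arguments shown in Usage, but the selection rule (highest precision among thresholds that keep recall at or above the required level) is an assumption.

```python
def find_threshold(y_true, y_prob, target_precision=0.50, required_recall=0.25):
    """Return (threshold, precision, recall) for the best qualifying cut-off.

    Among thresholds whose recall is at least `required_recall`, pick the
    one with the highest precision. Return None if that precision is still
    below `target_precision`.
    """
    positives = sum(y_true)
    best = None
    for threshold in sorted(set(y_prob)):
        predicted = [p >= threshold for p in y_prob]
        true_pos = sum(1 for hit, label in zip(predicted, y_true) if hit and label)
        n_predicted = sum(predicted)
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / positives if positives else 0.0
        if recall < required_recall:
            continue  # too few draws caught at this cut-off
        if best is None or precision > best[1]:
            best = (threshold, precision, recall)
    if best is None or best[1] < target_precision:
        return None
    return best


# Made-up labels (1 = draw) and predicted draw probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
result = find_threshold(y_true, y_prob)
```

Tightening `required_recall` pushes the chosen threshold lower, trading some precision for more predicted draws.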

🔄 Model Flow Diagram

⚙️ Configuration

Configuration is managed via:

pyproject.toml: Project metadata, dependencies, build settings, tool configurations (ruff, pytest).
Environment Variables: For reproducibility and runtime settings (see Installation section).
Model Parameters: Passed during EnsembleModel initialization or via run_ensemble script arguments.

(See docs/technical.md for details on specific settings like reproducibility seeds and base model parameters).
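As a rough illustration of environment-variable and seed control, the sketch below sets a handful of variables and seeds the standard random number generators. Only `TF_ENABLE_ONEDNN_OPTS=0` is named in this README; the other variables, the function name `configure_reproducibility`, and the default seed and thread count are assumptions. The project's actual settings are described in docs/technical.md.

```python
import os
import random


def configure_reproducibility(seed=42, n_threads=4):
    """Set environment variables and seeds; call before importing ML libraries."""
    settings = {
        # Affects only interpreters started after this point (e.g. subprocesses).
        "PYTHONHASHSEED": str(seed),
        # Named in the Troubleshooting section of this README.
        "TF_ENABLE_ONEDNN_OPTS": "0",
        # Assumption: hide GPUs so frameworks fall back to the CPU.
        "CUDA_VISIBLE_DEVICES": "-1",
        # Assumption: cap OpenMP and MKL thread pools.
        "OMP_NUM_THREADS": str(n_threads),
        "MKL_NUM_THREADS": str(n_threads),
    }
    os.environ.update(settings)
    random.seed(seed)
    try:
        import numpy as np  # optional; skipped if NumPy is not installed
        np.random.seed(seed)
    except ImportError:
        pass
    return settings


applied = configure_reproducibility(seed=42, n_threads=4)
```

Calling this at the top of an entry-point script, before TensorFlow or PyTorch are imported, matters because several of these variables are read only once at library import time.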

🧩 Extending the Model

Adding New Base Models

Implement the model in src/models/StackedEnsemble/base/ (see KNN as an example).
Add the model type to extra_base_model_type options in src/models/ensemble/ensemble_model.py.
Update the load_models_from_mlflow method in src/models/ensemble/ensemble_model.py.
Register a new MLflow run ID for your trained model.
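To make the steps above more concrete, here is a toy sketch of a base model exposing `fit` and `predict_proba`, plus a lookup from a model-type string to a class. Everything in it is hypothetical: the class `SimpleKNN`, the `BASE_MODELS` registry, and `build_base_model` are illustrative stand-ins and do not reflect the actual interfaces in src/models/StackedEnsemble/base/ or ensemble_model.py.

```python
import math


class SimpleKNN:
    """Toy k-nearest-neighbors classifier with a fit/predict_proba interface."""

    def __init__(self, k=3):
        self.k = k
        self._X = []
        self._y = []

    def fit(self, X, y):
        # KNN "training" just stores the data.
        self._X = [list(row) for row in X]
        self._y = list(y)
        return self

    def predict_proba(self, X):
        """Return the share of positive labels among the k nearest rows."""
        probabilities = []
        for row in X:
            distances = sorted(
                (math.dist(row, train_row), label)
                for train_row, label in zip(self._X, self._y)
            )
            nearest = [label for _, label in distances[: self.k]]
            probabilities.append(sum(nearest) / len(nearest))
        return probabilities


# Hypothetical registry mapping an extra_base_model_type string to a class.
BASE_MODELS = {"knn": SimpleKNN}


def build_base_model(model_type, **params):
    if model_type not in BASE_MODELS:
        raise ValueError(f"Unknown base model type: {model_type!r}")
    return BASE_MODELS[model_type](**params)


# Made-up two-cluster data: label 1 marks the far cluster.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
model = build_base_model("knn", k=3).fit(X, y)
probs = model.predict_proba([[0.2, 0.1], [5.5, 5.5]])
```

Keeping every base model behind the same two methods is what would let the ensemble treat a new model type as a drop-in addition.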

🔧 Troubleshooting

Common Issues

Import Errors (ModuleNotFoundError): Ensure you run scripts from the project root directory (TheDrawCode) or have installed the package correctly (make install or uv pip install -e .). Verify the src layout is correct.
make not found: See Makefile Setup Note.
MLflow Model Loading Errors: Check model registration and signatures in MLflow.
TensorFlow Numerical Differences: Ensure TF_ENABLE_ONEDNN_OPTS=0 is set.
Memory Issues: Reduce batch sizes or feature counts.
TabNet CPU Core Configuration: Set the threading environment variables first; if issues persist, also configure the PyTorch thread count explicitly.

(See docs/technical.md for more details on specific configurations).
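When chasing a ModuleNotFoundError, it can help to ask Python directly which top-level packages it can locate. The helper below is a small diagnostic sketch using only the standard library; the name `check_importable` is hypothetical and the script is not part of the repository.

```python
import importlib.util
import sys


def check_importable(module_names, project_root=None):
    """Report which top-level modules Python can locate.

    Pass top-level names only (e.g. "src", not "src.models").
    """
    if project_root is not None and project_root not in sys.path:
        # Mimic running from the project root by adding it to the search path.
        sys.path.insert(0, project_root)
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Run from the project root: "src" should be found if the layout is intact.
report = check_importable(["src", "mlflow"])
for name, found in report.items():
    print(f"{name}: {'ok' if found else 'NOT FOUND'}")
```

If `src` is reported as not found, the script is most likely being run from outside the project root or the package has not been installed into the active virtual environment.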

👥 Contributing

Contributions are welcome!

Fork the repository.
Create your feature branch (git checkout -b feature/your-feature).
Make your changes.
Ensure code quality: Run make lint and make test.
Commit your changes (git commit -am 'Add some feature').
Push to the branch (git push origin feature/your-feature).
Open a Pull Request.

Please adhere to PEP 8, include docstrings/comments, add tests, and use type hints.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

For more detailed documentation, run mkdocs serve and view the site locally, or refer to the files in the docs/ directory.
