feat: Validation Split & MLflow Tracking (fixes #22) #78

Open
verdhanyash wants to merge 1 commit into etsi-ai:main from verdhanyash:feat/validation-split-mlflow-tracking

Conversation

@verdhanyash
Contributor

@verdhanyash verdhanyash commented Feb 17, 2026

Fixes

Closes: #22

Type of Change

  • Bug fix
  • New feature
  • Documentation / Refactor
  • Math / Logic correction

Description

Implemented validation split and MLflow tracking for monitoring model generalization performance.

Changes:

  • Added validation_split: float = 0.2 parameter to Model.train() method
  • Split data into training and validation sets after preprocessing (shuffled, seed-reproducible)
  • Compute validation loss per epoch inside the existing Rust progress_callback; the training loop stays entirely in Rust, preserving optimizer state and avoiding FFI overhead
  • Added _calculate_validation_loss() method supporting both classification (cross-entropy) and regression (MSE)
  • Extended Rust bindings with forward() method to expose raw model outputs for validation loss calculation
  • Updated save_model() to log both loss and val_loss metrics to MLflow with epoch steps
  • Added input validation for validation_split parameter range
  • Stored val_loss_history on model instance, initialized in __init__() and load()
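As an illustration of the loss computation described above, here is a minimal NumPy sketch of a validation loss that covers both task types. The body of _calculate_validation_loss() is not shown in this description, so the function name, signature, and the assumption that classification outputs are unnormalized logits are hypothetical.

```python
import numpy as np


def validation_loss(outputs: np.ndarray, targets: np.ndarray, task: str) -> float:
    """Mean loss over a held-out set, given raw model outputs.

    Hypothetical helper for illustration only: it assumes `outputs` has shape
    (n_samples, n_classes) of logits for classification, or one value per
    sample for regression.
    """
    if task == "regression":
        # Mean squared error; reshape so (n, 1) outputs line up with (n,) targets.
        diff = outputs.reshape(targets.shape) - targets
        return float(np.mean(diff ** 2))
    if task == "classification":
        # Numerically stable log-softmax: subtract each row's max before exp.
        shifted = outputs - outputs.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        # Cross-entropy: negative log-probability of the true class, averaged.
        rows = np.arange(targets.shape[0])
        return float(-log_probs[rows, targets.astype(int)].mean())
    raise ValueError(f"unknown task: {task!r}")
```

With uniform logits over three classes the classification branch returns ln(3), which is a quick sanity check for an untrained model.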

Key architectural decisions (addressing PR #62 feedback):

  • Training loop remains 100% in Rust: rust_model.train() is called exactly once
  • Optimizer state (Adam moments, SGD) is never reset between epochs
  • Validation loss computed in the existing progress_callback (no extra FFI overhead)
  • Data shuffled before split using np.random.default_rng(seed) for reproducibility
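The shuffle-then-split step can be sketched roughly as follows. This is not the PR's code: the helper name split_train_val, the accepted range [0, 1) for validation_split, and the rounding rule for the validation set size are all assumptions made for the example; only np.random.default_rng(seed) comes from the description.

```python
import numpy as np


def split_train_val(X, y, validation_split=0.2, seed=None):
    """Shuffle X and y together, then hold out a fraction for validation.

    Hypothetical helper; the accepted range [0, 1) is an assumption, since
    the PR only says the parameter range is validated.
    """
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    n_samples = len(X)
    n_val = int(n_samples * validation_split)
    # A seeded Generator yields the same permutation on every run.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

Calling the helper twice with the same seed returns identical partitions, which is the property the seed-reproducibility tests would rely on.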

Result: MLflow dashboard now displays overlapping train/validation loss curves for monitoring overfitting and generalization.
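For reference, overlapping curves in MLflow come from logging each metric with an explicit step. The sketch below assumes the two histories are plain sequences of per-epoch floats; the helper log_loss_curves and its injectable log_metric argument are hypothetical and are not part of save_model(). The metric names loss and val_loss are the ones named in this description, and mlflow.log_metric(key, value, step=...) is the standard MLflow call.

```python
from typing import Callable, Optional, Sequence


def log_loss_curves(
    train_losses: Sequence[float],
    val_losses: Sequence[float],
    log_metric: Optional[Callable[..., None]] = None,
) -> None:
    """Log per-epoch train and validation loss so both share the epoch axis.

    Hypothetical helper: `log_metric` is injectable so the function can be
    exercised without an MLflow installation or tracking server.
    """
    if log_metric is None:
        # Imported lazily so the sketch can be loaded without MLflow installed.
        import mlflow

        log_metric = mlflow.log_metric
    for epoch, value in enumerate(train_losses):
        log_metric("loss", float(value), step=epoch)
    for epoch, value in enumerate(val_losses):
        log_metric("val_loss", float(value), step=epoch)
```

Because both series use the epoch index as the step, the MLflow UI can plot them on one chart, where a widening gap between val_loss and loss suggests overfitting.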

How Has This Been Tested?

  • Unit Tests: Created and ran test_validation_split.py with 8 tests covering classification, regression, edge cases, seed reproducibility, and MLflow logging
  • Smoke Testing: Verified end-to-end with mock datasets — progress bar shows both Train Loss and Val Loss
  • Integration: Confirmed Rust core forward() method works correctly with Python API
  • Existing Tests: All 46 pytest tests pass (38 existing + 8 new, zero regressions)

Screenshots / Logs

(3 images attached)

Contribution Context

  • I am contributing through the SWOC program.

@github-actions

Thank you for opening this PR! Our automated system is currently verifying the PR requirements.
Internal Discussion: Discord

@github-actions github-actions bot added the SWoC26 Contributions specifically for the Social Winter of Code program. label Feb 17, 2026
@github-actions

Validation Successful!

This pull request has been verified and linked to issue #22. The system is now synchronizing metadata from the referenced issue. Kindly await maintainer review of your changes.

@github-actions github-actions bot added area: python-api The user-facing Python library, Model class, and top-level CLI. feature New ML layers or API functionality. Medium Requires good codebase knowledge. labels Feb 17, 2026

@Aamod007
Collaborator

Good Work @verdhanyash

Labels

area: python-api The user-facing Python library, Model class, and top-level CLI. feature New ML layers or API functionality. Medium Requires good codebase knowledge. SWoC26 Contributions specifically for the Social Winter of Code program.

Development

Successfully merging this pull request may close these issues.

[Python] Validation Split & MLflow Tracking

2 participants