feat: Validation Split & MLflow Tracking #62
Conversation
Manual Action Required! This pull request is not yet linked to an issue. Please update the description to include 'Closes: #issue_number' in the appropriate section. This is required to pass validation and initiate metadata synchronization.

Thank you for opening this PR! Our automated system is currently verifying the PR requirements.

Validation Successful! This pull request has been verified and linked to issue #22. The system is now synchronizing metadata from the referenced issue. Kindly await maintainer review of your changes.
Hey, CI is failing on `test_model_save_load_preserves_preprocessing`.
Thanks @Satyamgupta2365 for the review. Will keep this in mind; the work is currently in progress.
Hi @Satyamgupta2365, I've finished the PR from my side and it is ready for review. Thanks.
@SK8-infi Thank you for the effort! While validation tracking is a great addition, this PR introduces critical architectural issues that must be addressed:
- Optimizer State Reset: Moving the training loop to Python causes a new Adam/SGD instance to be created every epoch. This resets Adam's moment estimates and time-step to zero, breaking its adaptive logic.
- Performance Overhead: Calling the Rust backend once per epoch significantly increases FFI marshaling and context-switching overhead.
- Regressions: The code lacks the `batch_size` parameter recently merged in PR #63, which will cause build and runtime failures.
Requested Changes:
- Revert the Python Loop: Move the validation logic into the Rust `train` method so the optimizer remains persistent.
- Sync with Main: Update signatures to include the mandatory `batch_size` parameter.
- Preserve `forward()`: Keep this new method in `lib.rs` as it is a valuable utility.
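The optimizer-state concern above can be made concrete with a small sketch. This is not the project's code: it uses a toy single-parameter Adam (illustrative only) to show that a freshly constructed instance restarts its moment estimates `m`, `v` and time step `t` at zero, which is exactly what happens if the per-epoch Python loop re-creates the optimizer each epoch.

```python
# Minimal toy Adam to illustrate why re-creating the optimizer each
# epoch breaks its adaptive logic. Illustrative only; the real project
# keeps the optimizer inside the Rust `train` method.

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment (mean of gradients) estimate
        self.v = 0.0  # second-moment (uncentered variance) estimate
        self.t = 0    # time step used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (v_hat ** 0.5 + self.eps)

# Persistent optimizer: state accumulates across epochs.
opt = Adam()
p = 1.0
for _epoch in range(3):
    p = opt.step(p, grad=0.5)
assert opt.t == 3  # time step advanced across epochs

# Anti-pattern: a new instance per epoch resets m, v, and t every time.
p = 1.0
for _epoch in range(3):
    opt = Adam()               # fresh instance -> state wiped
    p = opt.step(p, grad=0.5)
assert opt.t == 1  # every epoch looks like the very first step
```

With the per-epoch construction, bias correction and the moment estimates never warm up, so the update rule degenerates toward a fixed-size first step on every epoch.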
@SK8-infi Please update us on the status. You have 24 hours to address the issues or we’ll close this PR and reassign the task to keep development on track.
… into feat/val-split-track
Had to make some changes to the tests, as the new return signature is `(train_losses, val_losses)`.
@SK8-infi Build tests are failing.
Seems to be an issue that happened while resolving conflicts. Looking into it.
@SK8-infi There are currently no merge conflicts. However, all build tests must pass prior to review. Please address the issues.
Okay. I'll ensure it's completed by EOD.
@SK8-infi, any updates on the PR?
Hi
Aamod007 left a comment:
@SK8-infi Critical Fixes (MUST DO):
- Fix parameter order in Python API call
- Fix hidden layer type mismatch
- Add integration tests with real Rust extension
Recommended Improvements (SHOULD DO):
- Add data shuffling before split
- Add random seed parameter for reproducibility
- Add stratification option for classification
Future Enhancements (NICE TO HAVE):
- K-fold cross-validation support
- Early stopping based on validation loss
- Validation metrics beyond loss
- Learning rate scheduling
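The shuffling, seeding, and stratification suggestions above could look something like the sketch below. This is a hedged illustration, not this repo's API: the function name `train_val_split` and its parameters are hypothetical, and a real implementation would operate on the project's actual data structures.

```python
import random

def train_val_split(X, y, val_fraction=0.2, seed=None, stratify=False):
    """Shuffled train/validation index split (illustrative sketch).

    seed     -> makes the split reproducible
    stratify -> keeps each class's proportion roughly equal in both splits
    """
    rng = random.Random(seed)
    if stratify:
        # Group sample indices by label, then take a fraction of each group
        # so every class is represented in the validation set.
        by_label = {}
        for i, label in enumerate(y):
            by_label.setdefault(label, []).append(i)
        val_idx = []
        for idxs in by_label.values():
            rng.shuffle(idxs)
            val_idx.extend(idxs[: max(1, int(len(idxs) * val_fraction))])
    else:
        idxs = list(range(len(X)))
        rng.shuffle(idxs)
        val_idx = idxs[: int(len(X) * val_fraction)]
    val_set = set(val_idx)
    train_idx = [i for i in range(len(X)) if i not in val_set]
    return train_idx, sorted(val_idx)
```

Passing a fixed `seed` makes CI comparisons deterministic, which also addresses the reproducibility point above.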
@SK8-infi, please update us on the status. You have 24 hours to address the issues or we’ll close this PR and reassign the task to keep development on track.
Hi @debug-soham
Fixes
Closes: #22
Type of Change
Description
Implemented validation split and MLflow tracking for monitoring model generalization performance.
Changes:
- `validation_split: float = 0.2` parameter added to the `Model.train()` method
- `_calculate_validation_loss()` method supporting both classification (cross-entropy) and regression (MSE)
- `forward()` method to expose raw model outputs for validation loss calculation
- `save_model()` updated to log both `loss` and `val_loss` metrics to MLflow with epoch steps

Result: The MLflow dashboard now displays overlapping train/validation loss curves for monitoring overfitting and generalization.
How Has This Been Tested?
- `test_validation_split.py` covering classification, regression, and edge cases
- `forward()` method works correctly with the Python API

Screenshots / Logs
Contribution Context