Feat/update gradientboosting #24
Merged
This pull request introduces significant improvements and refactoring to the gradient boosted trees implementation, focusing on correctness, maintainability, and extensibility. The key changes include a refactor of the loss and gradient calculation logic for both regression and classification, the addition of optimal step size (line search) for each boosting iteration, and the introduction of comprehensive unit tests for the new utility functions. There are also improvements to data validation and utility function usage.
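As an illustration of the loss and gradient logic mentioned above, here is a minimal sketch of starting estimates and pseudo-residuals for squared error (regression) and log loss on log odds (binary classification). The function names and signatures below are assumptions chosen for this example; they are not taken from the PR's code and may differ from its actual helpers.

```python
import numpy as np


def start_estimate_mse(y):
    """Constant that minimises squared error: the mean of the targets."""
    return float(np.mean(y))


def pseudo_residual_mse(y, current_estimate):
    """Negative gradient of 0.5 * (y - F)^2 with respect to F."""
    return y - current_estimate


def start_estimate_log_odds(y):
    """Constant log odds that minimises log loss: log(p / (1 - p)), p = mean(y).

    Assumes y holds 0.0/1.0 labels with both classes present; otherwise
    the result is infinite.
    """
    p = float(np.mean(y))
    return float(np.log(p / (1.0 - p)))


def pseudo_residual_log_odds(y, current_log_odds):
    """Negative gradient of log loss with respect to the log odds: y - sigmoid(F)."""
    probabilities = 1.0 / (1.0 + np.exp(-current_log_odds))
    return y - probabilities


# Regression: start from the mean, residuals are plain differences.
y_reg = np.array([1.0, 2.0, 6.0])
f0_reg = start_estimate_mse(y_reg)
residuals_reg = pseudo_residual_mse(y_reg, f0_reg)

# Classification: start from the log odds of the positive-class frequency.
y_clf = np.array([1.0, 0.0, 1.0, 1.0])
f0_clf = start_estimate_log_odds(y_clf)
residuals_clf = pseudo_residual_log_odds(y_clf, np.full_like(y_clf, f0_clf))
```

In both cases the pseudo-residuals sum to zero at the starting estimate, which is one way to check that the constant really is the minimiser of the corresponding loss.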
Gradient Boosted Trees Algorithm Refactor and Enhancements:
- Refactored the loss and gradient calculation logic into `get_pseudo_residual_mse`, `get_pseudo_residual_log_odds`, `get_start_estimate_mse`, and `get_start_estimate_log_odds`, improving code clarity and reusability. [1] [2]
- Added an optimal step size (line search) for each boosting iteration via the `find_step_size` function, replacing the previous fixed factor approach, and now storing per-tree step sizes in `step_sizes_`. [1] [2]

Utility and API Improvements:
- Replaced the `bool_to_float` utility with `vectorize_bool_to_float` for efficient label mapping in classification. [1] [2]
- Ensured that `ensure_all_finite` is consistently respected in the `fit` and `predict` methods. [1] [2]
- Used `get_probabilities_from_mapped_bools` for consistent probability output in classification. [1] [2]

Testing and Dependency Updates:
- Added unit tests in `tests/models/test_gradientboostedtrees.py`, covering edge cases and ensuring correctness of pseudo-residual and starting estimate calculations. [1] [2]
- Added `scipy` as a dependency for the new optimization routines.
- Used `minimize_scalar` from `scipy.optimize` to support line search in boosting.

These changes collectively improve the flexibility, correctness, and maintainability of the gradient boosted trees models, and provide a solid foundation for further enhancements.
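For readers unfamiliar with line search in boosting, the following is a rough sketch of how a per-iteration step size could be found with `scipy.optimize.minimize_scalar`. The helper `line_search_step_size`, its signature, and the stand-in tree output are hypothetical and only illustrate the idea; the PR's `find_step_size` may be implemented differently.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def squared_error(y, prediction):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y - prediction) ** 2))


def line_search_step_size(y, current_estimate, tree_output, loss=squared_error):
    """Hypothetical helper: find gamma minimising loss(y, F + gamma * h).

    F is the current ensemble prediction and h is the new tree's output.
    """
    result = minimize_scalar(
        lambda gamma: loss(y, current_estimate + gamma * tree_output)
    )
    return float(result.x)


y = np.array([1.0, 2.0, 6.0])
current_estimate = np.full_like(y, y.mean())

# Stand-in for a fitted tree's prediction of the residuals, deliberately at
# half scale, so the exact minimiser of the squared error is gamma = 2.
tree_output = 0.5 * (y - current_estimate)

step_size = line_search_step_size(y, current_estimate, tree_output)
updated_estimate = current_estimate + step_size * tree_output
```

Compared with a fixed factor, the searched step size adapts to how well each tree's output is scaled relative to the remaining error; here it compensates for the half-scale stand-in by choosing a step close to 2.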