Feat/update gradientboosting #24

eschmidt42 · 2025-08-18T11:08:16Z

This pull request refactors the gradient boosted trees implementation for correctness, maintainability, and extensibility. The key changes are: the loss and gradient calculations for regression and classification now live in standalone functions; each boosting iteration uses an optimal step size found by line search instead of a fixed factor; and new unit tests cover the new utility functions. Data validation and the use of utility functions are also improved.

Gradient Boosted Trees Algorithm Refactor and Enhancements:

Refactored the loss and gradient (pseudo-residual) calculations for regression and classification into standalone functions (get_pseudo_residual_mse, get_pseudo_residual_log_odds, get_start_estimate_mse, and get_start_estimate_log_odds), which makes them easier to read, reuse, and test.
Implemented an optimal step size (line search) for each boosting iteration via the new find_step_size function, replacing the previous fixed self.factor; the per-tree step sizes are stored in step_sizes_.
Updated fit and predict in both the regressor and the classifier to use the new pseudo-residual and step-size logic, bringing them in line with the standard gradient boosting algorithm.
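The PR names these helpers but does not show their bodies. As a rough sketch, assuming the regressor uses squared error 0.5 * (y - f)**2 and the classifier uses binary log loss on log-odds f with labels mapped to 0.0/1.0, the four functions could look like the following. The signatures and the eps clipping are assumptions, not the repository's code.

```python
import numpy as np


def get_start_estimate_mse(y: np.ndarray) -> float:
    # Under squared error, the constant that minimises the loss is the mean.
    return float(np.mean(y))


def get_pseudo_residual_mse(y: np.ndarray, f: np.ndarray) -> np.ndarray:
    # Negative gradient of 0.5 * (y - f)**2 with respect to f.
    return y - f


def get_start_estimate_log_odds(y: np.ndarray, eps: float = 1e-12) -> float:
    # For labels in {0, 1}, the optimal constant log-odds is log(p / (1 - p)).
    # p is clipped (assumption) so all-0 or all-1 labels stay finite.
    p = np.clip(np.mean(y), eps, 1.0 - eps)
    return float(np.log(p / (1.0 - p)))


def get_pseudo_residual_log_odds(y: np.ndarray, f: np.ndarray) -> np.ndarray:
    # Negative gradient of the binary log loss with respect to the log-odds f:
    # y - sigmoid(f).
    return y - 1.0 / (1.0 + np.exp(-f))
```

For example, three positives out of four labels give a start estimate of log(0.75 / 0.25) = log 3 ≈ 1.0986, and at f = 0 (probability 0.5) the log-odds pseudo-residual is +0.5 for a positive label and -0.5 for a negative one.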

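A minimal sketch of how find_step_size and the boosting loop could fit together for the regressor, assuming 1-D inputs and a hypothetical fit_stump helper (a single-split tree) standing in for the repository's tree implementation. The step size is the gamma that minimises loss(y, f + gamma * h) along the new tree's predictions h, found with scipy.optimize.minimize_scalar; the per-tree values are collected the way step_sizes_ stores them. The signature of find_step_size and all other function names here are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def mse_loss(y: np.ndarray, f: np.ndarray) -> float:
    # Mean of 0.5 * (y - f)**2 over the training samples.
    return float(0.5 * np.mean((y - f) ** 2))


def find_step_size(y, f, h, loss=mse_loss) -> float:
    # Line search: the gamma minimising loss(y, f + gamma * h).
    if not np.any(h):
        return 0.0  # the tree predicts zero everywhere, so no step helps
    result = minimize_scalar(lambda gamma: loss(y, f + gamma * h))
    return float(result.x)


def fit_stump(x, r):
    # Hypothetical weak learner: one threshold split on 1-D x whose two
    # leaves predict the mean pseudo-residual on their side.
    best = None
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        err = np.sum((r - np.where(left, lv, rv)) ** 2)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all x values equal: fall back to a constant leaf
        c = r.mean()
        return lambda z: np.full(len(z), c)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def fit_boosted_regressor(x, y, n_trees=10):
    f0 = float(np.mean(y))  # start estimate for squared error
    f = np.full(len(y), f0)
    trees, step_sizes = [], []
    for _ in range(n_trees):
        r = y - f  # pseudo-residuals for squared error
        tree = fit_stump(x, r)
        h = tree(x)
        gamma = find_step_size(y, f, h)
        f = f + gamma * h
        trees.append(tree)
        step_sizes.append(gamma)
    return f0, trees, step_sizes


def predict_boosted(x, f0, trees, step_sizes):
    # Start estimate plus the step-size-weighted sum of tree predictions.
    return f0 + sum(g * tree(x) for tree, g in zip(trees, step_sizes))
```

With squared error and leaves that predict the mean residual, h is the least-squares projection of the residuals onto the tree's leaves, so the line search returns a step size of (numerically) 1. The benefit of the line search therefore shows up mainly with other losses, such as the log-odds loss used by the classifier.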
Utility and API Improvements:

Replaced the old bool_to_float utility with vectorize_bool_to_float, which maps the boolean class labels to floats for the whole label array in a single call.
Improved data validation: the ensure_all_finite setting is now respected consistently in both fit and predict.
Added get_probabilities_from_mapped_bools so that the classifier computes its probability output in one consistent place.
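The two classification helpers are only named in this PR. Assuming True is the positive class mapped to 1.0 and the model's raw output is log-odds, they could be sketched as follows; the np.vectorize-based mapping and the [P(False), P(True)] column order are assumptions.

```python
import numpy as np

# Hypothetical mapping: True -> 1.0, False -> 0.0, applied element-wise.
vectorize_bool_to_float = np.vectorize(lambda b: 1.0 if b else 0.0, otypes=[float])


def get_probabilities_from_mapped_bools(f: np.ndarray) -> np.ndarray:
    # f holds log-odds for the positive (True -> 1.0) class.
    p_true = 1.0 / (1.0 + np.exp(-f))
    # Columns ordered as [P(False), P(True)], matching classes [False, True].
    return np.column_stack([1.0 - p_true, p_true])
```

For a plain boolean NumPy array, y.astype(float) gives the same result as the vectorized mapping; np.vectorize is a convenience wrapper around a Python-level loop rather than a speed optimization.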

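One way to keep ensure_all_finite consistent is to route fit and predict through a single validation helper. The sketch below uses a hypothetical validate_X function and a toy MeanRegressor with plain NumPy; recent scikit-learn releases expose the same flag as the ensure_all_finite parameter of sklearn.utils.check_array, which the repository may use instead.

```python
import numpy as np


def validate_X(X, ensure_all_finite: bool = True) -> np.ndarray:
    # Hypothetical helper shared by fit and predict, so training and
    # inference apply the same finiteness policy.
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {X.ndim}-D")
    if ensure_all_finite and not np.all(np.isfinite(X)):
        raise ValueError("X contains NaN or infinity")
    return X


class MeanRegressor:
    # Toy estimator (hypothetical) showing fit and predict sharing one policy.
    def __init__(self, ensure_all_finite: bool = True):
        self.ensure_all_finite = ensure_all_finite

    def fit(self, X, y):
        validate_X(X, self.ensure_all_finite)
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        X = validate_X(X, self.ensure_all_finite)
        return np.full(X.shape[0], self.mean_)
```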
Testing and Dependency Updates:

Added unit tests for the new utility functions in tests/models/test_gradientboostedtrees.py, covering edge cases and checking the pseudo-residual and starting-estimate calculations.
Added scipy as a dependency for the new optimization routines.
Imported minimize_scalar from scipy.optimize to perform the line search for the step size in each boosting iteration.
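The tests themselves are not reproduced in the PR. A complementary check that could sit alongside them is a finite-difference test that each pseudo-residual equals the negative gradient of its loss with respect to the current prediction. In the sketch below, pseudo_residual_mse and pseudo_residual_log_odds are illustrative stand-ins for get_pseudo_residual_mse and get_pseudo_residual_log_odds, and the loss definitions (0.5 * squared error; binary log loss written as log(1 + e^f) - y * f in terms of log-odds f) are assumptions about the losses the models use.

```python
import numpy as np


def pseudo_residual_mse(y, f):
    # Stand-in for get_pseudo_residual_mse.
    return y - f


def pseudo_residual_log_odds(y, f):
    # Stand-in for get_pseudo_residual_log_odds.
    return y - 1.0 / (1.0 + np.exp(-f))


def log_loss(y, f):
    # Per-sample binary log loss in terms of log-odds: log(1 + e^f) - y * f.
    return np.logaddexp(0.0, f) - y * f


def test_pseudo_residual_mse_is_negative_gradient():
    y = np.array([1.0, 2.0, 3.0])
    f = np.array([0.5, 2.0, 4.0])
    eps = 1e-6
    loss = lambda f_: 0.5 * (y - f_) ** 2
    # Central finite difference of the per-sample loss with respect to f.
    numeric_grad = (loss(f + eps) - loss(f - eps)) / (2 * eps)
    assert np.allclose(pseudo_residual_mse(y, f), -numeric_grad, atol=1e-6)


def test_pseudo_residual_log_odds_is_negative_gradient():
    y = np.array([0.0, 1.0, 1.0, 0.0])
    f = np.array([-2.0, -0.5, 0.3, 1.7])
    eps = 1e-6
    numeric_grad = (log_loss(y, f + eps) - log_loss(y, f - eps)) / (2 * eps)
    assert np.allclose(pseudo_residual_log_odds(y, f), -numeric_grad, atol=1e-6)
```

Under pytest, both functions are collected automatically by their test_ prefix.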

These changes collectively improve the flexibility, correctness, and maintainability of the gradient boosted trees models, and provide a solid foundation for further enhancements.

eschmidt42 added 4 commits August 18, 2025 11:26

feat: minor refactor

85d9cbc

feat: added scipy as a dependency to do line search for gradient boosting step sizes

a13bc2c

feat: updated the gradient boosting trees regression algorithm to improve readability and usage of line search for the step size instead of self.factor

033c464

feat: adjusted gradient boosted classifier to the same pattern as the regressor and added unit tests

afb26aa

eschmidt42 merged commit 9dee0cd into main Aug 18, 2025
4 checks passed

2 participants

Feat/update gradientboosting #24

Feat/update gradientboosting #24

Uh oh!

Conversation

eschmidt42 commented Aug 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants