feat: polished xgboost.py #25
Merged
This pull request refactors and centralizes key gradient and transformation utilities for gradient boosting and XGBoost models, improving code reuse, modularity, and maintainability. It moves gradient calculation functions and transformation utilities into dedicated modules, updates imports throughout the codebase, and removes redundant or duplicate code. Additionally, it updates the fitting and prediction logic in model classes to use the new shared utilities, and cleans up related tests.
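As a rough illustration of what a centralized `gradient.py` module like this could contain, here is a minimal sketch of the named utilities. The function names come from the PR summary, but their bodies are assumptions based on standard gradient-boosting math (negative gradients of squared error and log loss), not code taken from this repository:

```python
import numpy as np

def check_y_float(y: np.ndarray) -> np.ndarray:
    """Validate targets and cast to float (sketch; actual checks may differ)."""
    y = np.asarray(y, dtype=float)
    if not np.all(np.isfinite(y)):
        raise ValueError("y contains non-finite values")
    return y

def get_start_estimate_mse(y: np.ndarray) -> float:
    # Minimizer of squared error over a constant prediction: the target mean.
    return float(np.mean(y))

def get_start_estimate_log_odds(y: np.ndarray) -> float:
    # Log-odds of the positive-class frequency, clipped for numerical safety.
    p = np.clip(np.mean(y), 1e-12, 1 - 1e-12)
    return float(np.log(p / (1 - p)))

def get_pseudo_residual_mse(y: np.ndarray, pred: np.ndarray) -> np.ndarray:
    # Negative gradient of 0.5 * (y - pred)^2 with respect to pred.
    return y - pred

def get_pseudo_residual_log_odds(y: np.ndarray, raw_pred: np.ndarray) -> np.ndarray:
    # Negative gradient of log loss with respect to raw (log-odds) predictions.
    return y - 1.0 / (1.0 + np.exp(-raw_pred))
```

For example, with targets `[1, 0]` and raw predictions of zero, the log-odds pseudo-residuals are `y - 0.5`, i.e. `[0.5, -0.5]`.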
Refactoring and code organization:

- Extracted the gradient utilities (`get_pseudo_residual_mse`, `get_pseudo_residual_log_odds`, `get_start_estimate_mse`, `get_start_estimate_log_odds`, and `check_y_float`) from `gradientboostedtrees.py` and `xgboost.py` into a new module, `gradient.py`, and updated all model classes to import and use these functions.
- Moved the transformation utilities (`vectorize_bool_to_float`, `get_probabilities_from_mapped_bools`) into a new `transform.py` module and updated all relevant imports.
- Removed the now-redundant duplicate code from `gradientboostedtrees.py`, `xgboost.py`, and `utils.py`.

Model logic improvements:

- Updated `GradientBoostedTreesRegressor`, `GradientBoostedTreesClassifier`, `XGBoostRegressor`, and `XGBoostClassifier` to use the new centralized gradient and transformation utilities, including proper handling of first and second derivatives where appropriate.
- Standardized use of the `ensure_all_finite` parameter in data validation across all relevant model methods.

Test cleanup:

- Removed duplicated gradient tests from `test_gradientboostedtrees.py`, as these cases are now covered by the new centralized implementations.

These changes improve code clarity, reduce duplication, and make the gradient boosting and XGBoost implementations easier to maintain and extend.
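The "first and second derivatives" mentioned above are what distinguish the XGBoost-style update from plain gradient boosting: leaf values are computed from the gradient and Hessian of the loss rather than from pseudo-residuals alone. A minimal sketch of that idea for log loss follows; the function names `logloss_grad_hess` and `newton_leaf_weight` are illustrative, not taken from this PR:

```python
import numpy as np

def logloss_grad_hess(y: np.ndarray, raw_pred: np.ndarray):
    # First and second derivatives of log loss w.r.t. raw (log-odds) scores,
    # as used in second-order (XGBoost-style) boosting.
    p = 1.0 / (1.0 + np.exp(-raw_pred))
    grad = p - y           # first derivative
    hess = p * (1.0 - p)   # second derivative
    return grad, hess

def newton_leaf_weight(grad: np.ndarray, hess: np.ndarray, reg_lambda: float = 1.0) -> float:
    # Optimal leaf weight w* = -G / (H + lambda) for the samples in a leaf,
    # where G and H are the summed gradients and Hessians.
    return float(-np.sum(grad) / (np.sum(hess) + reg_lambda))
```

For two positive samples at raw score 0, the gradients are `[-0.5, -0.5]` and the Hessians `[0.25, 0.25]`, so with `reg_lambda=1.0` the leaf weight is `1 / 1.5 = 2/3`.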