@eschmidt42
Owner

This pull request refactors and centralizes key gradient and transformation utilities for gradient boosting and XGBoost models, improving code reuse, modularity, and maintainability. It moves gradient calculation functions and transformation utilities into dedicated modules, updates imports throughout the codebase, and removes redundant or duplicate code. Additionally, it updates the fitting and prediction logic in model classes to use the new shared utilities, and cleans up related tests.

Refactoring and code organization:

  • Moved gradient-related functions (get_pseudo_residual_mse, get_pseudo_residual_log_odds, get_start_estimate_mse, get_start_estimate_log_odds, and check_y_float) from gradientboostedtrees.py and xgboost.py into a new module gradient.py, updating all model classes to import and use these functions. [1] [2] [3]
  • Moved transformation utilities (vectorize_bool_to_float, get_probabilities_from_mapped_bools) into a new transform.py module, and updated all relevant imports. [1] [2] [3]
  • Removed now-redundant implementations of gradient and transformation functions from gradientboostedtrees.py, xgboost.py, and utils.py. [1] [2] [3] [4] [5]
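To make the relocated helpers concrete, here is a minimal, dependency-free sketch of what functions with these names might compute. Only the function names come from this PR; the signatures, the use of plain lists, the mapping of booleans to -1.0/+1.0, and the choice of Friedman's binomial-deviance formulas for the log-odds case are all assumptions for illustration and may differ from the actual contents of gradient.py and transform.py.

```python
import math


def check_y_float(y):
    # Assumed behavior: reject targets that are not plain floats.
    if not all(isinstance(v, float) for v in y):
        raise ValueError("y must contain only floats")


def vectorize_bool_to_float(y):
    # Assumed mapping: True -> +1.0, False -> -1.0.
    return [1.0 if v else -1.0 for v in y]


def get_start_estimate_mse(y):
    # The constant minimizing squared error is the mean of the targets.
    return sum(y) / len(y)


def get_pseudo_residual_mse(y, current_estimates):
    # Negative gradient of 0.5 * (y - f)^2 with respect to f.
    return [yi - fi for yi, fi in zip(y, current_estimates)]


def get_start_estimate_log_odds(y):
    # Half log-odds for labels in {-1.0, +1.0}.
    # Undefined if all labels are identical (mean is exactly -1 or +1).
    y_mean = sum(y) / len(y)
    return 0.5 * math.log((1.0 + y_mean) / (1.0 - y_mean))


def get_pseudo_residual_log_odds(y, current_estimates):
    # Negative gradient of log(1 + exp(-2 * y * f)) with respect to f.
    return [
        2.0 * yi / (1.0 + math.exp(2.0 * yi * fi))
        for yi, fi in zip(y, current_estimates)
    ]


def get_probabilities_from_mapped_bools(current_estimates):
    # Inverse of the half log-odds link: P(y = +1) = 1 / (1 + exp(-2 f)).
    return [1.0 / (1.0 + math.exp(-2.0 * fi)) for fi in current_estimates]
```

Under these assumptions, three positive labels and one negative label give a start estimate of 0.5 * log(3), which maps back to a probability of 0.75, the observed fraction of positives.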

Model logic improvements:

  • Updated the fit and predict methods in GradientBoostedTreesRegressor, GradientBoostedTreesClassifier, XGBoostRegressor, and XGBoostClassifier to use the new centralized gradient and transformation utilities, including proper handling of first and second derivatives where appropriate. [1] [2] [3] [4] [5] [6]
  • Ensured consistent use of the ensure_all_finite parameter in data validation across all relevant model methods. [1] [2] [3] [4]
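As a rough illustration of where first and second derivatives enter a second-order boosting step, the sketch below computes per-sample gradients and Hessians and the resulting Newton-style leaf value, -sum(g) / (sum(h) + lambda), which is the standard form in the XGBoost formulation. The function names, the `lam` regularization parameter, and the -1.0/+1.0 label encoding are hypothetical and are not taken from this PR's code.

```python
import math


def grad_hess_squared_error(y, f):
    # Loss 0.5 * (y - f)^2: gradient is f - y, Hessian is constant 1.
    g = [fi - yi for yi, fi in zip(y, f)]
    h = [1.0] * len(y)
    return g, h


def grad_hess_log_loss(y, f):
    # Loss log(1 + exp(-2 * y * f)) with labels y in {-1.0, +1.0} (assumed).
    g = [-2.0 * yi / (1.0 + math.exp(2.0 * yi * fi)) for yi, fi in zip(y, f)]
    # Second derivative simplifies to |g| * (2 - |g|) because y^2 = 1.
    h = [abs(gi) * (2.0 - abs(gi)) for gi in g]
    return g, h


def newton_leaf_value(g, h, lam=0.0):
    # Optimal constant update for the samples in one leaf under a
    # second-order approximation of the loss, with L2 penalty lam.
    return -sum(g) / (sum(h) + lam)


# Usage: one leaf containing all samples, starting from f = 0.
y_reg = [1.0, 2.0, 3.0]
g, h = grad_hess_squared_error(y_reg, [0.0, 0.0, 0.0])
leaf_unregularized = newton_leaf_value(g, h)         # mean of y_reg
leaf_regularized = newton_leaf_value(g, h, lam=1.0)  # shrunk toward 0
```

For squared error without regularization the leaf value reduces to the mean residual, which is why a first-order gradient boosting regressor and a second-order one agree in that case; the two only diverge once the Hessian is non-constant or `lam` is positive.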

Test cleanup:

  • Removed redundant or now-unnecessary tests for gradient functions from test_gradientboostedtrees.py, since those functions now live in the new centralized modules.

These changes improve code clarity, reduce duplication, and make it easier to maintain and extend the gradient boosting and XGBoost implementations.

…tedtrees.py, also got rid of some redundant code that both modules should share, re-located re-used functions to gradient.py and transform.py, added tests
@eschmidt42 eschmidt42 merged commit 9ff0496 into main Aug 18, 2025
4 checks passed