PRF: Optimize GB line search regression (RFC: is it worth it?) #4

cakedev0 · 2026-01-15T21:51:13Z

Reference Issues/PRs

Follow-up from scikit-learn#32911

What does this implement/fix? Explain your changes.

Implemented loss.fit_intercept_only_by_idx for the AE (absolute error), Pinball and Huber losses.
All of these methods build on _weighted_quantile_by_idx, which I added to sklearn/utils/stats.py alongside _weighted_percentile. For now it lacks the average=True/False option.
Replaced the per-leaf loop in _update_terminal_regions, which is O(n * n_leaves) and so O(n^2) in the worst case, with loss.fit_intercept_only_by_idx for those losses.
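
To illustrate the idea behind a "by index" quantile, here is a minimal NumPy sketch. It is not the code from this PR: both function names are hypothetical, and it assumes a lower (inverted-CDF) weighted quantile with no averaging option. The first function rescans every sample for each leaf; the second sorts once by (leaf, value) and then reads each leaf's quantile from its contiguous slice.

```python
import numpy as np


def weighted_quantile_per_leaf_loop(values, weights, leaf_idx, n_leaves, q):
    """Baseline (hypothetical name): one boolean mask per leaf, O(n * n_leaves)."""
    out = np.full(n_leaves, np.nan)
    for leaf in range(n_leaves):
        mask = leaf_idx == leaf  # scans all n samples for every leaf
        if not mask.any():
            continue
        v, w = values[mask], weights[mask]
        order = np.argsort(v, kind="stable")
        cum_w = np.cumsum(w[order])
        # First sorted value whose cumulative weight reaches q * total weight.
        pos = np.searchsorted(cum_w, q * cum_w[-1], side="left")
        out[leaf] = v[order][min(pos, len(v) - 1)]
    return out


def weighted_quantile_by_idx_sketch(values, weights, leaf_idx, n_leaves, q):
    """Grouped version (hypothetical name): one sort, then one pass over groups."""
    # Sort by leaf first, then by value within each leaf (lexsort is stable).
    order = np.lexsort((values, leaf_idx))
    v, w = values[order], weights[order]
    # Each leaf now occupies the contiguous slice [starts[leaf], ends[leaf]).
    counts = np.bincount(leaf_idx, minlength=n_leaves)
    ends = np.cumsum(counts)
    starts = ends - counts
    out = np.full(n_leaves, np.nan)
    for leaf in range(n_leaves):
        s, e = starts[leaf], ends[leaf]
        if s == e:  # empty leaf: keep NaN
            continue
        cum_w = np.cumsum(w[s:e])
        pos = np.searchsorted(cum_w, q * cum_w[-1], side="left")
        out[leaf] = v[s + min(pos, e - s - 1)]
    return out
```

In this sketch the grouped version costs one O(n log n) sort plus work proportional to n + n_leaves for the group pass, so the cost no longer multiplies n by the number of leaves.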

I looked into using this in HGB too, but I don't think it'll be useful.

AI usage disclosure

Almost fully generated by Copilot. At first glance, it looks fairly good honestly.

I used AI assistance for:

Code generation (e.g., when writing an implementation or fixing a bug)
Test/benchmark generation
Documentation (including examples)
Research and understanding

Speed-up

Good: 140s -> 4s on this "highest-speedup" example:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

n, d = 100_000, 4
X = np.random.geometric(0.05, size=(n, d))
y = np.random.rand(n)
# max_depth=None lets each tree grow until its leaves are pure or too small
# to split, so each tree ends up with many leaves.
gb = GradientBoostingRegressor(n_estimators=10, max_depth=None, loss='huber')
gb.fit(X, y)

For AE, the speed-up is 60s -> 4s.

Is the increased code complexity worth the speed-up (for what is mostly an edge-case)?

I don't know.

The new code from this PR is a relatively nice piece of code, and could maybe be used in other places. But it's quite a lot of code too...


github-actions · 2026-01-15T21:52:52Z

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`mypy`

mypy detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed mypy version is mypy=1.15.0.

Details


sklearn/utils/tests/test_stats.py:15: error: Module "sklearn.utils.stats" has no attribute "weighted_quantile_by_idx"; maybe "_weighted_quantile_by_idx"?  [attr-defined]
Found 1 error in 1 file (checked 569 source files)

Generated for commit: 490036e. Link to the linter CI: here

cakedev0 added 4 commits January 15, 2026 22:00

PRF: optim GB linesearch with by idx calculations for regression losses

40a656d

Merge remote-tracking branch 'upstream/main' into optim/gbt_line_search_regression

2e37135

renaming

9787150

Merge branch 'optim/gbt_update_terminal_regions' into optim/gbt_line_search_regression

490036e

github-actions bot added the CI:Linter failure label Jan 15, 2026

cakedev0 mentioned this pull request Jan 15, 2026

PERF: GradientBoosting* optimize line-search from O(n * n_leaves) to O(n) for classification losses scikit-learn/scikit-learn#32911

Open
