

@cakedev0
Owner

Reference Issues/PRs

Follow-up from scikit-learn#32911

What does this implement/fix? Explain your changes.

  • Implemented loss.fit_intercept_only_by_idx for AE, Pinball and Huber losses.
  • All of these methods rely on _weighted_quantile_by_idx, which I implemented in sklearn/utils/stats.py next to _weighted_percentile. For now, it does not support the average=True/False option.
  • Used loss.fit_intercept_only_by_idx for those losses in _update_terminal_regions, replacing the O(n^2) per-leaf loop pattern.
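To make the grouped idea concrete, here is a minimal NumPy sketch. It is not the PR's code: the names `weighted_quantile_by_idx` and `huber_intercept_by_idx` are hypothetical, and it assumes the "inverted CDF" definition of a weighted quantile (the smallest value whose cumulative weight reaches q times the group's total weight). It sorts once by (group, value) and then finds every group's quantile with one vectorized `np.searchsorted`, instead of building one boolean mask per group. AE would correspond to q=0.5 and Pinball to q=alpha. The Huber helper assumes the per-leaf update is the same one-step formula as scikit-learn's `HuberLoss.fit_intercept_only` (weighted median plus the weighted mean of deviations clipped at delta), applied group-wise.

```python
import numpy as np


def weighted_quantile_by_idx(values, weights, idx, q, n_groups):
    """Weighted q-quantile of `values` within each group `idx == k`.

    Hypothetical sketch. Inverted-CDF definition: smallest value whose
    cumulative weight reaches q * (group total). Assumes weights >= 0 and
    0 <= q <= 1. Empty groups get NaN.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    idx = np.asarray(idx, dtype=np.intp)

    # One O(n log n) sort: primary key = group index, secondary key = value.
    order = np.lexsort((values, idx))
    v, g = values[order], idx[order]
    cum_w = np.cumsum(weights[order])

    # Sorted rows starts[k]:ends[k] belong to group k.
    groups = np.arange(n_groups)
    starts = np.searchsorted(g, groups, side="left")
    ends = np.searchsorted(g, groups, side="right")

    out = np.full(n_groups, np.nan)
    ok = ends > starts
    s, e = starts[ok], ends[ok]
    base = np.where(s > 0, cum_w[s - 1], 0.0)  # weight of all earlier groups
    total = cum_w[e - 1] - base
    # cum_w is non-decreasing, so one searchsorted finds, for every group,
    # the first position whose cumulative weight reaches base + q * total.
    pos = np.searchsorted(cum_w, base + q * total, side="left")
    out[ok] = v[np.clip(pos, s, e - 1)]  # clip guards q == 0 and rounding
    return out


def huber_intercept_by_idx(y, weights, idx, delta, n_groups):
    """Hypothetical per-group one-step Huber intercept (assumes no empty group)."""
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    idx = np.asarray(idx, dtype=np.intp)
    median = weighted_quantile_by_idx(y, weights, idx, 0.5, n_groups)
    diff = y - median[idx]
    term = np.sign(diff) * np.minimum(delta, np.abs(diff))
    # np.bincount with weights yields per-group weighted sums in O(n).
    num = np.bincount(idx, weights=weights * term, minlength=n_groups)
    den = np.bincount(idx, weights=weights, minlength=n_groups)
    return median + num / den


values = np.array([3.0, 1.0, 2.0, 10.0, 20.0, 5.0])
weights = np.array([1.0, 1.0, 1.0, 1.0, 3.0, 1.0])
idx = np.array([0, 0, 0, 1, 1, 2])  # e.g. the leaf id of each sample
# Weighted medians: 2.0, 20.0, 5.0, and NaN for the empty group 3.
print(weighted_quantile_by_idx(values, weights, idx, 0.5, n_groups=4))
```

Sorting once costs O(n log n) regardless of how many groups there are, which is what matters when a tree has roughly one leaf per handful of samples.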

I looked into using this in HGB (HistGradientBoosting) too, but I don't think it would be useful there.

AI usage disclosure

Almost fully generated by Copilot. At first glance, honestly, it looks fairly good.

I used AI assistance for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Speed-up

Good: 140s -> 4s on this "highest-speedup" example:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

n, d = 100_000, 4
# Integer features with many distinct values and a pure-noise target:
# unrestricted trees (max_depth=None) end up with a very large number of leaves.
X = np.random.geometric(0.05, size=(n, d))
y = np.random.rand(n)
gb = GradientBoostingRegressor(n_estimators=10, max_depth=None, loss='huber')
gb.fit(X, y)
```

For AE (loss='absolute_error'), the speed-up is 60s -> 4s.
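A rough way to see why this setting gives the largest speed-up, assuming the old code builds one length-n boolean mask per leaf: with max_depth=None and a pure-noise target, trees keep splitting until leaves hold very few samples, so the number of leaves grows roughly like n and the loop does on the order of n * n_leaves work per tree. The sketch below uses a single `DecisionTreeRegressor` fit directly on y as a proxy (boosting fits trees to gradients instead, so exact leaf counts differ) and a smaller n, chosen only so it runs quickly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same data-generating process as the benchmark, with a smaller n (assumption).
rng = np.random.default_rng(0)
n, d = 5_000, 4
X = rng.geometric(0.05, size=(n, d))
y = rng.random(n)

# Proxy: one unrestricted tree fit on y rather than on boosting gradients.
tree = DecisionTreeRegressor(max_depth=None, random_state=0).fit(X, y)
n_leaves = tree.get_n_leaves()

# A per-leaf loop compares all n leaf ids once per leaf: about n * n_leaves
# comparisons per tree. Grouping by leaf id (sort once, then slice) is O(n log n).
mask_comparisons = n * n_leaves
print(f"{n_leaves} leaves for {n} samples -> {mask_comparisons:,} mask comparisons")
```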

Is the increased code complexity worth the speed-up (for what is mostly an edge case)?

I don't know.

The new code from this PR is a relatively nice piece of code and could perhaps be reused elsewhere, but it is also quite a lot of code...

@github-actions

github-actions bot commented Jan 15, 2026

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


mypy

mypy detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed mypy version is mypy=1.15.0.

Details

sklearn/utils/tests/test_stats.py:15: error: Module "sklearn.utils.stats" has no attribute "weighted_quantile_by_idx"; maybe "_weighted_quantile_by_idx"?  [attr-defined]
Found 1 error in 1 file (checked 569 source files)

Generated for commit: 490036e. Link to the linter CI: here
