PRF: Optimize GB line search regression (RFC: is it worth it?) #4
Reference Issues/PRs
Follow-up from scikit-learn#32911
What does this implement/fix? Explain your changes.
- Add `loss.fit_intercept_only_by_idx` for AE, Pinball and Huber losses.
- Add `_weighted_quantile_by_idx`, which I implemented in `sklearn/utils/stats.py` alongside `_weighted_percentile`; for now it lacks the `average=True/False` option.
- Use `loss.fit_intercept_only_by_idx` instead of the O(n^2) loop pattern in `_update_terminal_regions` for those losses (a sketch of the idea follows this list).

I looked into using this in HGB too, but I don't think it'll be useful.
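The diff itself isn't shown here, so below is a minimal sketch of the grouped-quantile idea, with a hypothetical `weighted_quantile_by_idx` standing in for the PR's `_weighted_quantile_by_idx` (the real helper's signature and details may differ): instead of calling `_weighted_percentile` once per leaf on a boolean mask, which costs roughly O(n * n_leaves), sort by leaf index once and compute each leaf's weighted quantile on a contiguous slice.

```python
import numpy as np

def weighted_quantile_by_idx(values, weights, idx, q):
    """Weighted q-quantile of ``values`` grouped by integer ``idx``.

    Hypothetical stand-in for the PR's ``_weighted_quantile_by_idx``;
    returns one quantile per distinct group, in sorted group order.
    """
    order = np.argsort(idx, kind="stable")
    values, weights, idx = values[order], weights[order], idx[order]
    # Start/stop offsets of each contiguous run of equal idx values.
    starts = np.flatnonzero(np.r_[True, idx[1:] != idx[:-1]])
    stops = np.r_[starts[1:], idx.size]
    out = np.empty(starts.size)
    for k, (lo, hi) in enumerate(zip(starts, stops)):
        v, w = values[lo:hi], weights[lo:hi]
        o = np.argsort(v)
        cdf = np.cumsum(w[o])
        # Lower weighted quantile; no `average=True/False` option, as noted above.
        out[k] = v[o][np.searchsorted(cdf, q * cdf[-1])]
    return out

# Tiny check: unweighted per-group lower medians.
vals = np.array([3.0, 1.0, 2.0, 10.0, 20.0])
w = np.ones(5)
leaf = np.array([0, 0, 0, 1, 1])
print(weighted_quantile_by_idx(vals, w, leaf, 0.5))  # -> [ 2. 10.]
```

For the GB line search, a single call with the leaf ids from something like `tree.apply(X)` would then replace the per-leaf masked percentile computations.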
AI usage disclosure
I used AI assistance for: almost everything. The PR was almost fully generated by Copilot, and at first glance it honestly looks fairly good.
Speed-up
Good: 140s -> 4s on the "highest-speedup" example (the benchmark snippet itself is not shown here; see the sketch below).
For AE, the speed-up is 60s -> 4s.
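The PR's actual benchmark isn't reproduced above, so here is a minimal sketch of the kind of setup that stresses this code path, assuming the Huber loss and deep trees (many leaves) on a large dataset; the names and sizes are illustrative, not the PR's:

```python
# Illustrative benchmark sketch (not the PR's actual snippet): deep trees
# produce many leaves, so the per-leaf line search in
# _update_terminal_regions dominates fit time for huber/quantile/AE losses.
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((200_000, 10))
y = X[:, 0] + rng.standard_normal(200_000)

model = GradientBoostingRegressor(
    loss="huber", n_estimators=10, max_depth=8, random_state=0
)
tic = time.perf_counter()
model.fit(X, y)
print(f"fit time: {time.perf_counter() - tic:.1f}s")
```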
Is the increased code complexity worth the speed-up (for what is mostly an edge case)?
I don't know.
The new code from this PR is a relatively nice piece of code and could maybe be used in other places. But it's quite a lot of code too...