Routine to identify the generalised Cauchy point #184
johannahaffner merged 11 commits into patrick-kidger:dev from
Tagging @cjchristopher - this is the routine to identify the generalised Cauchy point, required for
You're right about the strong Wolfe condition in the line search proposed in the Byrd et al. paper. I haven't checked how they modify the line search to ensure that it stays in the feasible region, though. Right now I'm just trying to divide and conquer the pile of code I already wrote to make constrained optimisation happen in Optimistix :)
I haven't had a chance to look at this yet - but I've had eyes on the jaxopt L-BFGS-B and Zoom again and I think I've identified the problem(s) (with a verifying assist from GPT-5 Codex):
So as you suggest here (#143 (comment)), the care that should be taken, at least for L-BFGS-B, is to have a line search (even a zoom-y one!) that does respect the bounds.
No pressure! Whenever you have the time.
Thank you! This is very helpful. A missing clip in one branch of the line search implementation is exactly the type of bug that is very likely to occur in such an implementation. Since you write that our Zoom line search does all the things theirs does, except for ensuring that the bounds are respected, it sounds like we have fairly easy modifications to make on our end.
I think a principle I would implement across our code base is that constrained descents only ever return a feasible step to begin with, and that searches that may return step sizes greater than
The other type of easily introduced bug concerns the Cauchy point routine itself, which has a number of edge cases that need to be handled correctly. These depend on where we are when invoking the routine (at the boundary or in the interior), where the gradient points, and whether it is zero. For that reason, I made this a separate PR and added a bunch of test cases. Bugs of this nature might also be present in their implementation, but there is no way of knowing without testing their routine separately.
Ah, to clarify, that missing clip is in the L-BFGS-B implementation, after zoom is called and has returned. I assume an eventual L-BFGS-B implementation for Optimistix will correctly enforce bounds both in L-BFGS-B itself and in whatever line search it happens to use :) At some point I'll have a closer look at Moré & Thuente and see if the Zoom here can simply accept bounds optionally and enforce them without drastically changing the rest of the routine.
Ah I see! In that case I would prefer to truncate the step length: clipping can alter the direction quite substantially, depending on where one is with respect to the bounds. I don't see that playing well with solvers that iteratively build a Hessian approximation.
This may mean that for some steps the Wolfe condition does not hold, especially if we are close to the boundary. But I guess in that case that is simply a price to be paid. With respect to the concrete implementation - in my current development branch for constrained solves, I write the bounds into the
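The difference between truncating and clipping a step can be illustrated with a minimal sketch in plain Python. The helper names (`max_feasible_step`, `truncate`, `clip`) are hypothetical and not part of Optimistix or jaxopt; the sketch assumes the starting point is feasible and a full step corresponds to a step length of 1.

```python
def max_feasible_step(x, d, lower, upper):
    """Largest t in [0, 1] such that x + t * d stays inside the box."""
    t_max = 1.0
    for xi, di, lo, hi in zip(x, d, lower, upper):
        if di > 0:
            # Moving towards the upper bound of this coordinate.
            t_max = min(t_max, (hi - xi) / di)
        elif di < 0:
            # Moving towards the lower bound of this coordinate.
            t_max = min(t_max, (lo - xi) / di)
    return max(t_max, 0.0)


def truncate(x, d, lower, upper):
    """Shorten the step so that it ends on the boundary; direction is kept."""
    t = max_feasible_step(x, d, lower, upper)
    return [xi + t * di for xi, di in zip(x, d)]


def clip(x, d, lower, upper):
    """Take the full step, then project each coordinate back into the box."""
    return [min(max(xi + di, lo), hi) for xi, di, lo, hi in zip(x, d, lower, upper)]


x = [0.0, 0.0]
d = [2.0, 1.0]
lower = [-1.0, -1.0]
upper = [1.0, 1.0]

truncated = truncate(x, d, lower, upper)  # [1.0, 0.5], still parallel to d
clipped = clip(x, d, lower, upper)        # [1.0, 1.0], direction has changed
```

In this example the truncated step keeps the ratio 2:1 between the coordinates, while the clipped step points along the diagonal, which is the kind of change in direction the comment above is concerned about.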
82c5077 to 32a4de1
I've added explanations to the test cases and added an extra one. This does the right thing, so I will merge it.
* first pass at refactoring cauchy_point function, so far untested
* add a pretty drawing
* minor tweaks
* refactored cauchy point finding function
* add clarifying comment
* adding test cases
* bugfix: pick correct next intercept in the presence of infinite values
* limit step length to a full gradient step
* add expected results to cauchy point test cases
* add clarifying comment: Hessian operator is assumed to be positive definite.
* add explanations for test cases

---------

Co-authored-by: Johanna Haffner <johanna.haffner@bsse.ethz.ch>
This adds a small module with a routine that identifies the generalised Cauchy point: the first local minimiser of the quadratic model along the piecewise linear path obtained by projecting the steepest-descent direction onto the box defined by the bound constraints.
This is an essential ingredient in BFGS-B and L-BFGS-B, and is implemented as described in the original publication: https://epubs.siam.org/doi/10.1137/0916069

As with many constrained methods, there are lots of edge cases to think of and get right here, so I added a bunch of tests to cycle through the ones I could think of.
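For readers unfamiliar with the procedure, here is a minimal sketch of a generalised Cauchy point search in plain Python. It is not the code added in this PR: the function name `cauchy_point_sketch` is hypothetical, the Hessian approximation `B` is a dense nested list rather than an operator, the starting point is assumed to be feasible, and `B` is assumed to be positive definite. The search walks along the projected steepest-descent path segment by segment and stops at the first local minimiser of the quadratic model.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(B, v):
    return [dot(row, v) for row in B]


def cauchy_point_sketch(x0, g, B, lower, upper):
    """First local minimiser of g.z + 0.5 z.B.z along the projected path."""
    n = len(x0)
    # Breakpoint of each coordinate: the step length at which it hits a bound.
    t = []
    for xi, gi, lo, hi in zip(x0, g, lower, upper):
        if gi < 0:
            t.append((xi - hi) / gi)  # moving up, towards the upper bound
        elif gi > 0:
            t.append((xi - lo) / gi)  # moving down, towards the lower bound
        else:
            t.append(math.inf)        # zero gradient: coordinate never moves
    # Coordinates already at a bound with an outward gradient stay fixed.
    d = [-gi if ti > 0 else 0.0 for gi, ti in zip(g, t)]
    x = list(x0)
    t_old = 0.0
    # Distinct finite breakpoints, followed by a final unbounded segment.
    breakpoints = sorted({ti for ti in t if 0 < ti < math.inf}) + [math.inf]
    for t_next in breakpoints:
        z = [xi - x0i for xi, x0i in zip(x, x0)]
        slope = dot(g, d) + dot(d, matvec(B, z))  # model derivative along d
        if slope >= 0:
            return x  # the model does not decrease on this segment
        curvature = dot(d, matvec(B, d))  # positive, since d != 0 and B is PD
        dt = -slope / curvature
        if t_old + dt <= t_next:
            return [xi + dt * di for xi, di in zip(x, d)]
        # Otherwise advance to the breakpoint and freeze the coordinates there.
        for i in range(n):
            if t[i] == t_next:
                x[i] = upper[i] if g[i] < 0 else lower[i]
                d[i] = 0.0
            else:
                x[i] += (t_next - t_old) * d[i]
        t_old = t_next
    return x


identity = [[1.0, 0.0], [0.0, 1.0]]
gcp = cauchy_point_sketch([0.0, 0.0], [-4.0, -1.0], identity, [-1.0, -1.0], [1.0, 3.0])
# gcp == [1.0, 1.0]: the first coordinate stops at its upper bound,
# the second continues to the minimiser of the model along the path.
```

The edge cases mentioned above show up directly in the breakpoint computation: a zero gradient component and an infinite bound both give an infinite breakpoint, and a point on the boundary with an outward-pointing gradient gives a breakpoint of zero, so that coordinate is never moved.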
I think that the cauchy_point module can be merged into main: I would not add it to the public API, and the change is purely additive.