Routine to identify the generalised Cauchy point by johannahaffner · Pull Request #184 · patrick-kidger/optimistix

johannahaffner · 2025-11-01T23:50:43Z

This adds a small module with a routine that identifies the generalised Cauchy point: the first local minimiser of the quadratic model along the piecewise linear path obtained by projecting the steepest-descent direction onto the box defined by the bound constraints.

This is an essential ingredient in BFGS-B and L-BFGS-B, and it is implemented as described in the original publication: https://epubs.siam.org/doi/10.1137/0916069
As with many constrained methods, there are lots of edge cases to think of + get right here, so I added a bunch of tests to cycle through the ones I could think of.
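
For readers who have not seen the construction, here is a minimal sketch of the breakpoint search, following the general scheme in the Byrd et al. paper. It is not the code in this PR: it is plain Python with a dense matrix `B` that is assumed to be positive definite, and the function name and argument layout are made up for illustration.

```python
import math


def generalised_cauchy_point(x, g, B, lower, upper):
    """First local minimiser of the quadratic model along the projected path.

    Hypothetical, dense, pure-Python illustration. Assumes x is feasible and
    B (list of rows) is positive definite.
    """
    n = len(x)
    # Breakpoint t_i: step length at which coordinate i of x - t * g hits a bound.
    t = []
    for xi, gi, lo, hi in zip(x, g, lower, upper):
        if gi < 0:
            t.append((xi - hi) / gi)
        elif gi > 0:
            t.append((xi - lo) / gi)
        else:
            t.append(math.inf)
    # Coordinates that sit on a bound with the gradient pointing outward stay fixed.
    d = [-gi if ti > 0 else 0.0 for gi, ti in zip(g, t)]
    xc = list(x)
    z = [0.0] * n  # displacement xc - x accumulated so far
    t_old = 0.0
    breakpoints = sorted({ti for ti in t if 0 < ti < math.inf}) + [math.inf]
    for t_next in breakpoints:
        Bz = [sum(B[i][j] * z[j] for j in range(n)) for i in range(n)]
        Bd = [sum(B[i][j] * d[j] for j in range(n)) for i in range(n)]
        slope = sum(d[i] * (g[i] + Bz[i]) for i in range(n))
        curvature = sum(d[i] * Bd[i] for i in range(n))
        if slope >= 0:
            # The model no longer decreases along the path: xc is the minimiser.
            return xc
        # slope < 0 implies d != 0, so curvature > 0 for positive definite B.
        dt_min = -slope / curvature
        dt = t_next - t_old
        if dt_min < dt:
            # The minimiser lies strictly inside the current segment.
            return [xc[i] + dt_min * d[i] for i in range(n)]
        # Otherwise walk to the breakpoint and freeze the coordinates that hit a bound.
        for i in range(n):
            xc[i] += dt * d[i]
            z[i] += dt * d[i]
            if t[i] == t_next:
                xc[i] = upper[i] if g[i] < 0 else lower[i]
                z[i] = xc[i] - x[i]
                d[i] = 0.0
        t_old = t_next
    return xc


identity = [[1.0, 0.0], [0.0, 1.0]]
# The first coordinate hits its upper bound at t = 0.5, the second keeps moving.
print(generalised_cauchy_point([0.0, 0.0], [-1.0, -1.0], identity, [0.0, 0.0], [0.5, 2.0]))
# [0.5, 1.0]
```

The edge cases mentioned above show up directly in this sketch: zero gradient entries give infinite breakpoints, and a coordinate that starts on a bound with the gradient pointing outward gets a zero search direction from the outset.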

I think that the cauchy_point module can be merged into main: I would not add it to the public API, and the change is purely additive.

johannahaffner · 2025-11-02T00:06:09Z

Tagging @cjchristopher - this is the routine to identify the generalised Cauchy point, required for BFGS-B and L-BFGS-B. I'm pretty confident that it does the right thing, but another set of eyes would be great!

You're right about the strong Wolfe condition in the line search proposed in the Byrd et al. paper. I haven't checked how they modify the line search to ensure that it stays in the feasible region, though. Right now just trying to divide and conquer the pile of code I already wrote to make constrained optimisation happen in Optimistix :)

cjchristopher · 2025-11-07T08:18:25Z

I haven't had a chance to look at this yet - but I've had eyes on the jaxopt L-BFGS-B and Zoom again and I think I've identified the problem(s) (with a verifying assist from GPT-5 Codex):

Here, in the linesearch branch of the L-BFGS-B update routine: https://github.com/google/jaxopt/blob/cf28b4563f5ad9354b76433622dbb9ee32af5f09/jaxopt/_src/lbfgsb.py#L469-L493 , the parameters are not clipped back into bounds the way they are in the else branch. This would be fine if the linesearch respected the bounds, but the zoom linesearch is not implemented to respect the bounds provided to L-BFGS-B, so I think this is where the problems occur.
I am not sure why the jaxopt team opted to use that implementation of Zoom for a bounded optimisation routine. Looking again at the linesearch suggested in the original L-BFGS-B (Moré & Thuente), that search does indeed take the bounds as parameters (https://github.com/antoinecollet5/lbfgsb/blob/master/lbfgsb/linesearch.py - note the ub and lb parameters) and uses them to ensure that feasibility is respected.

So as you suggest here (#143 (comment)), the care that should be taken, at least for L-BFGS-B, is to have a linesearch (even a zoom-y one!) that does respect the bounds.
Edit: Relevantly, I can't see what the Zoom linesearch we have (and subsequently the ones in optax and jaxopt) is doing that the Moré & Thuente algorithm doesn't do; the only difference I see is that Moré & Thuente respects additional bounds when provided.

johannahaffner · 2025-11-07T13:19:32Z

> I haven't had a chance to look at this yet -

No pressure! Whenever you have the time.

> but I've had eyes on the jaxopt L-BFGS-B and Zoom again and I think I've identified the problem(s)

Thank you! This is very helpful.

A missing clip in one branch of the line search implementation is exactly the type of bug that is very likely to occur in such an implementation. Since you write that our Zoom line search does all the things theirs does, except for ensuring that the bounds are respected, it sounds like we have fairly easy modifications to make on our end.

I think a principle I would implement across our code base is that constrained descents only ever return a feasible step to begin with, and that searches that may return step sizes greater than 1.0 optionally support truncation to feasible step sizes if bounds are present. The truncated step size can be obtained with tree_min(feasible_step_length), both functions that are already part of our misc module.
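
To make the truncation idea concrete, here is a minimal sketch in plain Python. The helper names `max_feasible_step_size` and `truncate_step_size` are hypothetical and are not the functions in our misc module; the sketch assumes that `x` is already feasible and that bounds are given per coordinate.

```python
import math


def max_feasible_step_size(x, direction, lower, upper):
    """Largest alpha >= 0 such that x + alpha * direction stays inside the box.

    Hypothetical helper for illustration; assumes lower <= x <= upper.
    """
    alpha = math.inf
    for xi, di, lo, hi in zip(x, direction, lower, upper):
        if di > 0:
            alpha = min(alpha, (hi - xi) / di)
        elif di < 0:
            alpha = min(alpha, (lo - xi) / di)
        # di == 0: this coordinate does not move and imposes no limit.
    return alpha


def truncate_step_size(step_size, x, direction, lower, upper):
    """Shrink a proposed step size so that the resulting iterate is feasible."""
    return min(step_size, max_feasible_step_size(x, direction, lower, upper))


x = [0.0, 0.0]
direction = [1.0, -2.0]
lower = [-1.0, -1.0]
upper = [4.0, 1.0]
# The second coordinate reaches its lower bound first, at alpha = 0.5.
print(truncate_step_size(2.0, x, direction, lower, upper))   # 0.5
print(truncate_step_size(0.25, x, direction, lower, upper))  # 0.25
```

A search that can propose step sizes above 1.0 would apply the truncation to each candidate before evaluating it, so every trial point it evaluates is feasible.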

The other type of easily introduced bug concerns the Cauchy point routine itself, which has a number of edge cases that need to be handled correctly, depending on where we are when invoking the routine (at the boundary or in the interior), where the gradient points, and whether it is zero. For that reason, I made this a separate PR and added a bunch of test cases.

Bugs of this nature might also be present in their implementation, but there is no way of knowing without testing their routine separately.

cjchristopher · 2025-11-07T15:39:04Z

Ah to clarify, that missing clip is in the L-BFGS-B implementation, after zoom is called and returned. I assume an eventual L-BFGS-B implementation for Optimistix will correctly enforce bounds both in L-BFGS-B and in whatever linesearch it happens to use :)

At some point I'll have a closer look at Moré & Thuente and see if the Zoom here can simply accept bounds optionally and enforce them without drastically changing the rest of the routine.

johannahaffner · 2025-11-07T17:30:08Z

> Ah to clarify, that missing clip is in the L-BFGS-B implementation, after zoom is called and returned. I assume an eventual L-BFGS-B implementation for Optimistix will correctly enforce bounds both in L-BFGS-B and in whatever linesearch it happens to use :)

Ah I see! In that case I would prefer to truncate the step length. Clipping can alter the direction quite substantially, depending on where one is with respect to the bounds, and I don't see that playing well with solvers that iteratively build a Hessian approximation.
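
A small two-dimensional illustration of the difference, in plain Python with made-up numbers: clipping the trial point changes the direction of the move, while truncating the step length keeps it parallel to the proposed step.

```python
def clip(point, lower, upper):
    """Project a point onto the box, coordinate by coordinate."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(point, lower, upper)]


def cross2(a, b):
    """2D cross product; zero exactly when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]


x = [0.0, 0.0]
step = [2.0, 1.0]
lower = [-10.0, -10.0]
upper = [1.0, 10.0]

# Option 1: take the full step, then clip the result back into the box.
clipped = clip([xi + si for xi, si in zip(x, step)], lower, upper)
clipped_move = [ci - xi for ci, xi in zip(clipped, x)]

# Option 2: shrink the step length until the whole step is feasible.
alpha = min(
    1.0,
    min(
        (hi - xi) / si if si > 0 else (lo - xi) / si
        for xi, si, lo, hi in zip(x, step, lower, upper)
        if si != 0
    ),
)
truncated_move = [alpha * si for si in step]

print(clipped_move, cross2(step, clipped_move))      # [1.0, 1.0] 1.0
print(truncated_move, cross2(step, truncated_move))  # [1.0, 0.5] 0.0
```

In this example the clipped move points along (1, 1) rather than (2, 1), so a quasi-Newton update built from it would describe a direction the descent never proposed.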

> At some point I'll have a closer look at Moré & Thuente and see if the Zoom here can simply accept bounds optionally and enforce them without drastically changing the rest of the routine.

This may mean that for some steps the Wolfe condition does not hold, especially if we are close to the boundary. But I guess in that case that is simply a price to be paid.
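
A one-dimensional toy example of what that can look like, in plain Python. The objective, the bound, and the constants `c1 = 1e-4` and `c2 = 0.9` are assumptions chosen for illustration (the constants are commonly used defaults, not values taken from our code).

```python
def strong_wolfe(phi0, dphi0, phi_a, dphi_a, alpha, c1=1e-4, c2=0.9):
    """Return (sufficient_decrease, curvature) for the strong Wolfe conditions."""
    sufficient_decrease = phi_a <= phi0 + c1 * alpha * dphi0
    curvature = abs(dphi_a) <= c2 * abs(dphi0)
    return sufficient_decrease, curvature


def f(x):
    return (x - 2.0) ** 2


def df(x):
    return 2.0 * (x - 2.0)


x0 = 0.0
d = -df(x0)        # steepest descent direction, here 4.0
phi0 = f(x0)
dphi0 = df(x0) * d  # directional derivative at alpha = 0


def check(alpha):
    x = x0 + alpha * d
    return strong_wolfe(phi0, dphi0, f(x), df(x) * d, alpha)


# Unconstrained: alpha = 0.5 lands on the minimiser and both conditions hold.
print(check(0.5))  # (True, True)

# With an upper bound of 0.1 on x, the largest feasible alpha is 0.1 / 4 = 0.025.
# The function still decreases, but the slope has barely changed.
print(check(0.025))  # (True, False)
```

Here the truncated step still satisfies the sufficient decrease condition, and only the curvature condition fails, because the bound stops the search before the slope along the direction has flattened out.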

With respect to the concrete implementation - in my current development branch for constrained solves, I write the bounds into the FunctionInfo object. This is one way to do it.

johannahaffner · 2025-11-29T16:53:58Z

I've added explanations to the test cases and added an extra one - this does the right thing + I will merge it.

* first pass at refactoring cauchy_point function, so far untested
* add a pretty drawing
* minor tweaks
* refactored cauchy point finding function
* add clarifying comment
* adding test cases
* bugfix: pick correct next intercept in the presence of infinite values
* limit step length to a full gradient step
* add expected results to cauchy point test cases
* add clarifying comment: Hessian operator is assumed to be positive definite.
* add explanations for test cases

---------

Co-authored-by: Johanna Haffner <johanna.haffner@bsse.ethz.ch>

Johanna Haffner added 10 commits November 29, 2025 17:01

first pass at refactoring cauchy_point function, so far untested

386c920

add a pretty drawing

6d51775

minor tweaks

786b1cb

refactored cauchy point finding function

a8ca729

add clarifying comment

60eeade

adding test cases

30f06af

bugfix: pick correct next intercept in the presence of infinite values

59d1288

limit step length to a full gradient step

43631d4

add expected results to cauchy point test cases

26dd82e

add clarifying comment: Hessian operator is assumed to be positive definite.

32a4de1

johannahaffner force-pushed the cauchy-point branch from 82c5077 to 32a4de1 November 29, 2025 16:01

add explanations for test cases

1304b79

johannahaffner changed the base branch from main to dev November 29, 2025 16:48

johannahaffner changed the base branch from dev to main November 29, 2025 17:13

johannahaffner changed the base branch from main to dev November 29, 2025 17:14

johannahaffner merged commit 04ad0d6 into patrick-kidger:dev Nov 29, 2025
2 checks passed

johannahaffner deleted the cauchy-point branch November 29, 2025 17:16

johannahaffner merged 11 commits into patrick-kidger:dev from johannahaffner:cauchy-point
