Skip to content

Routine to identify the generalised Cauchy point#184

Merged
johannahaffner merged 11 commits intopatrick-kidger:devfrom
johannahaffner:cauchy-point
Nov 29, 2025
Merged

Routine to identify the generalised Cauchy point#184
johannahaffner merged 11 commits intopatrick-kidger:devfrom
johannahaffner:cauchy-point

Conversation

@johannahaffner
Copy link
Collaborator

@johannahaffner johannahaffner commented Nov 1, 2025

This adds a small module with a routine that identifies the generalised Cauchy point, the first local minimiser along the piecewise linear path defined by the gradient, projected on the box defined by bound constraints.

This is an essential ingredient in BFGS-B and L-BFGS-B, and implemented as described in the original publication: https://epubs.siam.org/doi/10.1137/0916069
As with many constrained methods, there are lots of edge cases to think of + get right here, so I added a bunch of tests to cycle through the ones I could think of.

I think that the cauchy_point module can be merged into main: I would not add it to the public API, and the change is purely additive.

@johannahaffner
Copy link
Collaborator Author

Tagging @cjchristopher - this is the routine to identify the generalised Cauchy point, required for BFGS-B and L-BFGS-B. I'm pretty confident that it does the right thing, but another set of eyes would be great!

You're right about the strong Wolfe condition in the line search proposed in the Byrd et al. paper. I haven't checked how they modify the line search to ensure that it stays in the feasible region, though. Right now just trying to divide and conquer the pile of code I already wrote to make constrained optimisation happen in Optimistix :)

@cjchristopher
Copy link

cjchristopher commented Nov 7, 2025

Tagging @cjchristopher - this is the routine to identify the generalised Cauchy point, required for BFGS-B and L-BFGS-B. I'm pretty confident that it does the right thing, but another set of eyes would be great!

You're right about the strong Wolfe condition in the line search proposed in the Byrd et al. paper. I haven't checked how they modify the line search to ensure that it stays in the feasible region, though. Right now just trying to divide and conquer the pile of code I already wrote to make constrained optimisation happen in Optimistix :)

I haven't had a chance to look at this yet - but I've had eyes on the jaxopt L-BFGS-B and Zoom again and I think I've identified the problem(s) (with an verifying assist from GPT-5 Codex):

  1. Here, in the linesearch branch of the l-bfgs-b update routine: https://github.com/google/jaxopt/blob/cf28b4563f5ad9354b76433622dbb9ee32af5f09/jaxopt/_src/lbfgsb.py#L469-L493 , there is no clipping of the parameters back in bound like there is in the else branch... this would be fine if the linesearch respected the bounds, but of course, the zoom linesearch is not implemented to respect the bounds provided to L-BFGS-B, and so I think this is where the problems occur.....because:
  2. I am not sure why the jaxopt team opted to use that implementation of Zoom for a bounded optimisation routine, actually - but looking again the linesearch suggested in the original L-BFGS-B (More & Thuente), that search does indeed take the bounds as parameters (https://github.com/antoinecollet5/lbfgsb/blob/master/lbfgsb/linesearch.py - note the ub and lb parameters) and uses them to ensure that feasibility is respected.

So as you suggest here (#143 (comment)), the care that should be taken for, at least L-BFGS-B in particular, is to have a linesearch (even a zoom-y one!) that does respect the bounds.
Edit: Relevantly, I can't see what the Zoom linesearch we have (and subsequently the one is optax and jaxopt) is doing that the More & Thuente algorithm doesn't do, except for respect additional bounds when provided.

@johannahaffner
Copy link
Collaborator Author

I haven't had a chance to look at this yet -

No pressure! Whenever you have the time.

but I've had eyes on the jaxopt L-BFGS-B and Zoom again and I think I've identified the problem(s)

Thank you! This is very helpful.

A missing clip in one branch of the line search implementation is exactly the type of bug that is very likely to occur in such an implementation. Since you write that our Zoom line search does all the things theirs does, except for ensuring that the bounds are respected, it sounds like we have fairly easy modifications to make on our end.

I think a principle I would implement across our code base is that constrained descents only ever return a feasible step to begin with, and that searches that may return step sizes greater than 1.0 optionally support truncation to feasible step sizes if bounds are present. The truncated step size can be obtained with tree_min(feasible_step_length), both functions that are already part of our misc module.


The other type of easily introduced bugs concerns the Cauchy point routine itself, which has a number of edge cases that need to be handled correctly, depending on where we are when invoking the routine (at the boundary or in the interior), and where the gradient points, or whether it is zero. For that reason, I made this a separate PR and added a bunch of test cases.

Bugs of this nature might also be present in their implementation, but there is no way of knowing without testing their routine separately.

@cjchristopher
Copy link

Ah to clarify, that missing clip is in the l-bfgs-b implementation, after zoom is called and returned. I assume an eventual l-bfgs-b implementing for optimistix will correctly enforce bounds both in l-bfgs-b, and whatever linesearch it happens to use :)

At some point I'll have a closer look at More & Thuente and see if the Zoom here can simply accept bounds optionally and enforce them without drastically changing the rest of the routine.

@johannahaffner
Copy link
Collaborator Author

Ah to clarify, that missing clip is in the l-bfgs-b implementation, after zoom is called and returned. I assume an eventual l-bfgs-b implementing for optimistix will correctly enforce bounds both in l-bfgs-b, and whatever linesearch it happens to use :)

Ah I see! In that case I would prefer to truncate the step length, clipping can alter the direction quite substantially depending on where one is with respect to the bounds. I don't see that playing well with solvers that iteratively build a Hessian approximation.

At some point I'll have a closer look at More & Thuente and see if the Zoom here can simply accept bounds optionally and enforce them without drastically changing the rest of the routine.

This may mean that for some steps the Wolfe condition does not hold, especially if we are close to the boundary. But I guess in that case that is simply a price to be paid.

With respect to the concrete implementation - in my current development branch for constrained solves, I write the bounds into the FunctionInfo object. This is one way to do it.

@johannahaffner johannahaffner changed the base branch from main to dev November 29, 2025 16:48
@johannahaffner
Copy link
Collaborator Author

johannahaffner commented Nov 29, 2025

I've added explanations to the test cases and added an extra one - this does the right thing + I will merge it.

@johannahaffner johannahaffner changed the base branch from dev to main November 29, 2025 17:13
@johannahaffner johannahaffner changed the base branch from main to dev November 29, 2025 17:14
@johannahaffner johannahaffner merged commit 04ad0d6 into patrick-kidger:dev Nov 29, 2025
2 checks passed
@johannahaffner johannahaffner deleted the cauchy-point branch November 29, 2025 17:16
patrick-kidger pushed a commit that referenced this pull request Feb 8, 2026
* first pass at refactoring cauchy_point function, so far untested

* add a pretty drawing

* minor tweaks

* refactored cauchy point finding function

* add clarifying comment

* adding test cases

* bugfix: pick correct next intercept in the presence of infinite values

* limit step length to a full gradient step

* add expected results to cauchy point test cases

* add clarifying comment: Hessian operator is assumed to be positive definite.

* add explanations for test cases

---------

Co-authored-by: Johanna Haffner <johanna.haffner@bsse.ethz.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants