Conversation
MichaelClerx left a comment:
Thanks @alisterde
This looks BIG! I've added some initial comments on coding and style, but I'll need some more explanation before I can review what the code actually does.
    @@ -0,0 +1,899 @@
    #
    # Fixed learning-rate gradient descent.
Suggested change:
    - # Fixed learning-rate gradient descent.
    + # Hager-Zhang line search.
    from numpy.linalg import norm

    class HagerZhang(pints.Optimiser):
Best to have this consistent with the file name, so either HagerZhangLineSearch or _hager_zhang.py
    class HagerZhang(pints.Optimiser):
        """
        Gradient-descent method with a fixed learning rate.
Needs a docstring with references etc.
Pseudocode too would I think be really useful. For pseudocode in the other methods, we tend not to incorporate ask/tell but rather lay out the algorithm as given in the paper (but made tidy and understandable for us).
    from numpy.linalg import norm

    class HagerZhang(pints.Optimiser):
Do we need an interface for line search methods? E.g. class HagerZhang(pints.LineSearch) with class LineSearch(pints.Optimiser)?
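For illustration, one possible shape for such an interface. This is a sketch only: pints.Optimiser and needs_sensitivities are existing PINTS names, while LineSearch and set_search_direction are hypothetical and not part of the current PR.

```python
import pints


class LineSearch(pints.Optimiser):
    """
    Hypothetical shared base class for line search methods.

    Holds behaviour common to line searches, such as storing the search
    direction, so that concrete classes like HagerZhang only implement
    the bracketing and sectioning logic.
    """

    def set_search_direction(self, px):
        """ Sets the direction along which the 1-d search is performed. """
        self._px = px

    def needs_sensitivities(self):
        """ Line searches use gradients as well as function values. """
        return True


class HagerZhang(LineSearch):
    """ Hager-Zhang line search built on top of the shared interface. """
    pass
```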
    def __init__(self, x0, sigma0=None, boundaries=None):
        super(HagerZhang, self).__init__(x0, sigma0, boundaries)

        # Set optimiser state
I take it this method only works for 1d problems. Should there be a check in the constructor for this? Or in a LineSearch class that this one extends?
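If that restriction does hold, a hedged sketch of such a guard, added to the constructor quoted above. The placement and error message are illustrative only.

```python
def __init__(self, x0, sigma0=None, boundaries=None):
    super(HagerZhang, self).__init__(x0, sigma0, boundaries)

    # Hypothetical guard; only needed if the method really is limited to
    # 1-d problems, as questioned above.
    if self._n_parameters != 1:
        raise ValueError(
            'HagerZhang line search requires a 1-dimensional problem.')
```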
    # the opposite slope condition (see function definition)
    self.theta = 0.5

    self.__gamma = 0.66
We're not using __ anywhere in PINTS, just _ to mark this as private
    def __initialising(self, k, alpha_k0):
        '''
        This function is part of the Hager-Zhang line search method [1].
This line's not really needed!
    self._current_f = np.asarray(proposed_f)
    self._current_dfdx = np.asarray(proposed_dfdx)

    elif self.wolfe_check is True:
Suggested change:
    - elif self.wolfe_check is True:
    + elif self.wolfe_check:
| """ | ||
| Gradient-descent method with a fixed learning rate. | ||
| """ | ||
|
|
I think we should add a big long # comment here explaining the structure of this class and how the yield stuff is supposed to work
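Something like the following could work. This is an illustrative sketch only: it assumes (but does not know) that the implementation drives the Hager-Zhang steps from a generator, as the mention of yield suggests.

```python
# Structure of this class (illustrative sketch only):
#
# * The Hager-Zhang procedure (initial bracket, secant steps, bisection,
#   Wolfe checks) is written as one generator method. Every time the
#   algorithm needs the objective and gradient at a trial step size, the
#   generator ``yield``s that trial point and pauses.
#
# * ask() resumes the generator until the next ``yield`` and hands the
#   trial point to the OptimisationController.
#
# * tell() records the returned f and dfdx so that the generator sees the
#   new values the next time ask() resumes it.
#
# Written this way the algorithm reads as one sequential procedure, while
# still fitting PINTS' ask/tell interface.
```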
    self.__gamma = 0.66

    # range (0, 1) small factor used in initial guess of step size
There are quite a lot of properties declared down here. Seems this method has a lot of "state"!
- Are they all strictly necessary?
- And if so, would there be any sense in bundling them per subtask, e.g. some object for wolfe-related properties, something like that?
I agree re: bundling. Could we perhaps bundle according to first and second Wolfe and approximate? Or would that not work?
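One hedged sketch of what that bundling could look like. The class and attribute names are made up for illustration; the defaults echo the c1 and c2 values quoted later in this review.

```python
from dataclasses import dataclass


@dataclass
class WolfeSettings:
    """ Hypothetical bundle of the Wolfe-condition parameters. """
    c1: float = 1e-4          # sufficient-decrease (Armijo) parameter
    c2: float = 0.9           # curvature parameter
    approximate: bool = True  # whether the approximate Wolfe test is used


# The optimiser would then hold one attribute, e.g.
#     self._wolfe = WolfeSettings()
# and read self._wolfe.c1, self._wolfe.c2, ... where needed.
```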
    r, x, s, b, px = self.problem()
    opt = pints.OptimisationController(r, x, method=method)
    m = opt.optimiser()
    _B = np.identity(m._n_parameters)

    b = 1.0

    def temp(a=a, b=b, m=m):
        generator = m._HagerZhang__bisect_or_secant(a=a, b=b)
Not sure what the best way to test these is. As a rule of thumb you don't test the private API. So we need to figure out if this is an exception or if there's something we're not seeing that would break this down into smaller testable blocks with public methods
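For comparison, a test that stays on the public side could drive the whole optimiser and only assert on observable behaviour. A rough sketch follows; the toy problem and iteration budget are arbitrary, and the method would still need whatever set-up the line search requires.

```python
import numpy as np
import pints
import pints.toy


def check_runs_and_improves(method):
    """ Sketch: run the optimiser via the public controller interface. """
    r = pints.toy.RosenbrockError()
    x0 = np.array([1.1, 1.1])

    opt = pints.OptimisationController(r, x0, method=method)
    opt.set_max_iterations(50)
    opt.set_log_to_screen(False)

    x_best, f_best = opt.run()

    # Only public behaviour is asserted: the result is no worse than the
    # starting point.
    assert f_best <= r(x0)
    return x_best, f_best
```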
|
Maybe we can link to this somewhere in the file. I can never find it :D
ben18785 left a comment:
Looks really good @alisterde -- it's a beast of a method! My points are mainly surface-level things, as I'm finding the logic tricky to follow. As Michael says, having some description in the docstrings that explains the ask/tell flow would, I think, really help.
    # As c1 approaches 0 and c2 approaches 1, the line search
    # terminates more quickly.
    self._c1 = 1E-4  # Parameter for Armijo condition rule, 0 < c1 < 0.5
I couldn't find the Armijo rule in the paper. Could you point to where this is from? I'm guessing this is "delta" on page 184 of Hager and Zhang?
Also, where was the default value taken from?
    # As c1 approaches 0 and c2 approaches 1, the line search
    # terminates more quickly.
    self._c1 = 1E-4  # Parameter for Armijo condition rule, 0 < c1 < 0.5
    self._c2 = 0.9  # Parameter for curvature condition rule, c1 < c2 < 1.0
I'm guessing this is sigma on page 184 of the paper? Of course, we don't need to use the same parameter names as them but it might help in places to cross-check.
Same as above regarding the default value.
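For cross-checking against the paper's notation, the conditions involved are usually written as below. This is a hedged sketch in the PR's parameter names (c1 standing in for the paper's delta, c2 for sigma, epsilon_k for the approximate-Wolfe tolerance), not the PR's actual code.

```python
import numpy as np


def wolfe_conditions(f0, f1, g0, g1, px, alpha, c1, c2, epsilon_k):
    """
    Sketch of the exact and approximate Wolfe tests, with c1 ~ delta and
    c2 ~ sigma in Hager and Zhang's notation.
    """
    d0 = np.dot(g0, px)  # directional derivative at the current point
    d1 = np.dot(g1, px)  # directional derivative at the trial point

    # Exact Wolfe: sufficient decrease (Armijo) plus curvature.
    exact = (f1 <= f0 + c1 * alpha * d0) and (d1 >= c2 * d0)

    # Approximate Wolfe (Hager and Zhang), applied when f1 <= f0 + epsilon_k.
    approximate = ((2.0 * c1 - 1.0) * d0 >= d1 >= c2 * d0) and (
        f1 <= f0 + epsilon_k)

    return exact or approximate
```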
    # range (0, 1), used in the ``self.__update()`` and
    # ``self.__initial_bracket()`` when the potential intervals violate
    # the opposite slope condition (see function definition)
    self.theta = 0.5
Again, it would be good to know where the default value comes from here.
    proposed_grad = np.matmul(np.transpose(proposed_dfdx), self.__px)

    wolfe_curvature = (proposed_grad >= self._c2 *
                       np.matmul(np.transpose(self._current_dfdx),
                                 self.__px))

    exact_wolfe_suff_dec = (self._c1 *
                            np.matmul(np.transpose(self._current_dfdx),
                                      self.__px)
                            >= proposed_f - self._current_f)
np.matmul(np.transpose(self._current_dfdx), self.__px) seems to be repeated three times here? I'd precalculate it and then reuse it.
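A hedged sketch of that refactor of the block quoted above. The final line completes the truncated apprx_wolfe_suff_dec expression in the way the paper's approximate Wolfe condition suggests, so treat it as an assumption.

```python
# Directional derivatives, each computed once and reused.
current_grad = np.matmul(np.transpose(self._current_dfdx), self.__px)
proposed_grad = np.matmul(np.transpose(proposed_dfdx), self.__px)

wolfe_curvature = proposed_grad >= self._c2 * current_grad

exact_wolfe_suff_dec = (self._c1 * current_grad
                        >= proposed_f - self._current_f)

apprx_wolfe_suff_dec = ((2.0 * self._c1 - 1.0) * current_grad
                        >= proposed_grad)
```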
    # Checking if approximate wolfe conditions are meet.
    apprx_wolfe_suff_dec = ((2.0 * self._c1 - 1.0) *
                            np.matmul(np.transpose(self._current_dfdx),
    approximate_wolfe = (apprx_wolfe_suff_dec and wolfe_curvature
                         and apprx_wolfe_applies)

    # If wolfe conditions are meet the line search is stopped.
    # is only decreased in subsequent boundary manipulation.
    return ps_2 * alpha_k0

    def __very_close(self, x, y):
is_very_close? That makes it clear that it returns a Boolean.
    suitable interval is found that contains the minimum.
    '''

    # (steps B0 from [1])
Perhaps I'm missing it but I can't see "B0" in the paper?
@MichaelClerx @ben18785
I've got an initial draft of the Hager-Zhang line search algorithm in #1160. It is passing all of the tests I've written and appears to be functional.
However, I want to write more tests to check specific conditions in the logic before I'll be confident in the algorithm. I also think I need to add a few more convergence checks, but I'll address that tomorrow.
I would appreciate any initial thoughts you have about it.