Hi,
In VeLO (https://arxiv.org/pdf/2211.09760.pdf), Section B.3 states that mixing is done as F0(x) + max(σ(F1(σ(F2(x)))), axis=0, keep_dims=True).
However, the hyper_v2 implementation (https://github.com/google/learned_optimization/blob/main/learned_optimization/research/general_lopt/hyper_v2.py#L330-L335) essentially uses only one linear layer instead of two: the input to the second linear layer is x rather than mix_layer (L332).
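For concreteness, here is a minimal sketch of the two readings in plain jax.numpy. The weight matrices w0/w1/w2 are hypothetical stand-ins for F0/F1/F2 (biases omitted); this is my interpretation of the formula and of the code path, not the actual hyper_v2 code.

```python
import jax
import jax.numpy as jnp


def mix_paper(x, w0, w1, w2):
  """Mixing as I read Section B.3:
  F0(x) + max(sigmoid(F1(sigmoid(F2(x)))), axis=0, keep_dims=True)."""
  f0 = x @ w0                        # F0(x)
  mix_layer = jax.nn.sigmoid(x @ w2) # sigma(F2(x))
  f1 = jax.nn.sigmoid(mix_layer @ w1)  # sigma(F1(sigma(F2(x))))
  return f0 + jnp.max(f1, axis=0, keepdims=True)


def mix_code(x, w0, w1, w2):
  """What hyper_v2.py (L330-L335) appears to do: the second projection
  takes x directly, so mix_layer never feeds into F1."""
  f0 = x @ w0
  mix_layer = jax.nn.sigmoid(x @ w2)   # computed but not consumed downstream
  f1 = jax.nn.sigmoid(x @ w1)          # input is x, not mix_layer
  return f0 + jnp.max(f1, axis=0, keepdims=True)


# Quick check with square weights so both variants are shape-compatible.
key = jax.random.PRNGKey(0)
x, w0, w1, w2 = (jax.random.normal(k, (4, 4)) for k in jax.random.split(key, 4))
print(jnp.allclose(mix_paper(x, w0, w1, w2), mix_code(x, w0, w1, w2)))  # False in general
```

The two variants generally produce different outputs, which is why I am asking whether the single-layer path in the code is intentional or a divergence from the paper.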