Keras implementation of "data_based_init()" assumes Activation functions are put in a separate layer #3

@redst4r

Description

Hi,

as far as I understand weightnorm initialization, you calculate the mean/std of the 'preactivations' (i.e. before the nonlinearity is applied) and use these to initialize the weights/biases.

In weightnorm.data_based_init(), it is implicitly assumed that the nonlinearity is applied in a separate layer, since we collect all layers which have a W and b attribute, and use their output to calculate the mean/std of the preactivations:

    layer_output_weight_bias = []
    for l in model.layers:
        if hasattr(l, 'W') and hasattr(l, 'b'):
            assert(l.built)
            layer_output_weight_bias.append((l.name, l.get_output_at(0), l.W, l.b))
...
    for l, o, W, b in layer_output_weight_bias:
        print('Performing data dependent initialization for layer ' + l)
        m, v = tf.nn.moments(o, [i for i in range(len(o.get_shape()) - 1)])
        s = tf.sqrt(v + 1e-10)
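For a single unit, the rescaling these moments imply can be sketched in plain Python (a hypothetical helper for illustration, not part of the repo; `eps` mirrors the `1e-10` above):

```python
import math

def data_dependent_scale_shift(preacts, eps=1e-10):
    """Given one unit's preactivations over a data batch, return the
    (scale, shift) that makes the unit's output zero-mean, unit-variance:
    W_new = scale * W, b_new = scale * b + shift."""
    m = sum(preacts) / len(preacts)
    v = sum((p - m) ** 2 for p in preacts) / len(preacts)
    s = math.sqrt(v + eps)
    return 1.0 / s, -m / s

scale, shift = data_dependent_scale_shift([1.0, 3.0])
# mean 2.0, std ~1.0  ->  scale ~1.0, shift ~-2.0
```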

However, if the layer has a nonlinearity built into it, e.g via

fc = Dense(output_dim=50, activation='relu')

the above approach will pick the 'postactivations' after the ReLU in .get_output_at(0) and calculate the mean/std of those postactivations to rescale W and b, which I think is technically incorrect.
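A tiny numeric example (made-up values, just to show the moments really do differ) makes the problem concrete: applying ReLU shifts the mean, so the scale/shift computed from postactivations would not normalize the preactivations.

```python
def relu(x):
    return max(x, 0.0)

pre = [-2.0, -1.0, 1.0, 2.0]       # example preactivations of one unit
post = [relu(x) for x in pre]      # what get_output_at(0) returns for a fused layer

mean_pre = sum(pre) / len(pre)     # 0.0
mean_post = sum(post) / len(post)  # 0.75 -- the ReLU has shifted the mean
```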

Unfortunately, I don't know a straightforward workaround (other than forcing nonlinearities into separate layers); I have no idea how to get the preactivations out of a layer that applies the nonlinearity internally.
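One partial mitigation (a sketch under assumptions, not a tested patch) would be to skip, or at least warn about, layers whose built-in activation is not the identity; in Keras the layer's `activation` attribute could be compared against `keras.activations.linear`. Illustrated here with stand-in layer objects rather than real Keras layers:

```python
class FakeLayer:
    """Stand-in for a Keras layer; attribute names mirror the snippet above.
    Real Keras stores a function in .activation; a string suffices here."""
    def __init__(self, name, activation='linear'):
        self.name = name
        self.activation = activation
        self.W, self.b = 'W', 'b'
        self.built = True

def safe_init_candidates(layers):
    # Keep only layers whose output really is the preactivation,
    # i.e. weighted layers with an identity ('linear') activation.
    return [l.name for l in layers
            if hasattr(l, 'W') and hasattr(l, 'b')
            and getattr(l, 'activation', 'linear') == 'linear']

layers = [FakeLayer('fc1'), FakeLayer('fc2', activation='relu')]
print(safe_init_candidates(layers))  # only 'fc1' qualifies
```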
