# Baseline: a trainable estimate of the "expected reward" (the
# discriminator's typical score) at each timestep. It is initialized
# to zeros but trained below, so the subtraction is only a no-op on
# the very first step.
expected_reward = tf.Variable(tf.zeros((SEQUENCE_MAXLEN,)))
reward = d_preds - expected_reward[:tf.shape(d_preds)[1]]
mean_reward = tf.reduce_mean(reward)

# Train the baseline to track the discriminator score by minimizing
# the mean absolute centered reward. Subtracting a learned baseline
# centers the reward around zero, so only sequences that score
# surprisingly well (or badly) send a strong signal to the generator
# (the standard variance-reduction trick from REINFORCE). The update
# is therefore not irrelevant: it is what makes the baseline useful.
exp_reward_loss = tf.reduce_mean(tf.abs(reward))
exp_op = reward_opt.minimize(
    exp_reward_loss, var_list=[expected_reward])
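
# ---------------------------------------------------------------------
# Why this works (a minimal standalone sketch, not part of the original
# file): minimizing mean(|d_preds - b|) with respect to a baseline b
# pulls b toward the discriminator's typical score, so the centered
# reward hovers around zero. The NumPy demo below applies the same
# update with a plain subgradient step; `d_preds` here is random
# stand-in data, not the repo's real discriminator output.
import numpy as np

rng = np.random.default_rng(0)
b = np.zeros(8)                       # baseline, one entry per timestep
for _ in range(500):
    # Fake batch of discriminator scores, shape (batch, timesteps).
    d_preds = rng.normal(loc=0.7, scale=0.1, size=(32, 8))
    reward = d_preds - b              # centered reward
    # The subgradient of mean(|reward|) w.r.t. b is -mean(sign(reward)),
    # so this descent step moves b toward the per-timestep median of
    # d_preds; once b matches it, the signs cancel and b stops moving.
    b += 0.01 * np.sign(reward).mean(axis=0)

print(b.round(2))  # ~0.7 at every timestep: b now tracks the scores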