# Baseline: a trainable estimate of the "expected reward" (the
# discriminator's typical score) at each timestep. It is initialized
# to zeros but trained below, so the subtraction is only a no-op on
# the very first step.
expected_reward = tf.Variable(tf.zeros((SEQUENCE_MAXLEN,)))
reward = d_preds - expected_reward[:tf.shape(d_preds)[1]]
mean_reward = tf.reduce_mean(reward)

# Train the baseline to track the discriminator score by minimizing
# the mean absolute centered reward. Subtracting a learned baseline
# centers the reward around zero, so only sequences that score
# surprisingly well (or badly) send a strong signal to the generator
# (the standard variance-reduction trick from REINFORCE). The update
# is therefore not irrelevant: it is what makes the baseline useful.
exp_reward_loss = tf.reduce_mean(tf.abs(reward))
exp_op = reward_opt.minimize(
    exp_reward_loss, var_list=[expected_reward])
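
# ---------------------------------------------------------------------
# Why this works (a minimal standalone sketch, not part of the original
# file): minimizing mean(|d_preds - b|) with respect to a baseline b
# pulls b toward the discriminator's typical score, so the centered
# reward hovers around zero. The NumPy demo below applies the same
# update with a plain subgradient step; `d_preds` here is random
# stand-in data, not the repo's real discriminator output.
import numpy as np

rng = np.random.default_rng(0)
b = np.zeros(8)                       # baseline, one entry per timestep
for _ in range(500):
    # Fake batch of discriminator scores, shape (batch, timesteps).
    d_preds = rng.normal(loc=0.7, scale=0.1, size=(32, 8))
    reward = d_preds - b              # centered reward
    # The subgradient of mean(|reward|) w.r.t. b is -mean(sign(reward)),
    # so this descent step moves b toward the per-timestep median of
    # d_preds; once b matches it, the signs cancel and b stops moving.
    b += 0.01 * np.sign(reward).mean(axis=0)

print(b.round(2))  # ~0.7 at every timestep: b now tracks the scores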