If I run the TensorFlow version of this code (tf_train.py) with #8 applied, I get a NaN within the first few iterations and training stops. If I remove that change, training proceeds fine. @pukkapies, were you ever able to get the model to train properly with your changes applied? If so, what hyperparameter settings were you using?