I'm not familiar with Lua, but I'm assuming updateGradInput() simply zeroes gradOutput wherever the input falls outside the [-1, 1] range. Shouldn't it also multiply gradOutput by the input values inside that range?
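For reference, here's my reading of what the backward pass does, sketched in NumPy (the function names and shapes are my own, not from the repo):

```python
import numpy as np

def bin_active_forward(x):
    # Forward: binarize activations with sign()
    return np.sign(x)

def bin_active_backward(x, grad_output):
    # Straight-through estimator: pass the gradient through unchanged
    # where |input| <= 1, and zero it elsewhere.
    grad_input = grad_output.copy()
    grad_input[np.abs(x) > 1] = 0.0
    return grad_input

x = np.array([-2.0, -0.5, 0.3, 1.5])
g = np.ones_like(x)
print(bin_active_backward(x, g))  # -> [0. 1. 1. 0.]
```

My question is whether the in-range gradient should be scaled by the input (or some function of it) rather than passed through as-is.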
I can see that BinActive doesn't bother computing the K tensor described in the paper, presumably because it didn't make a massive difference. But are you also using a different estimator for the derivative of sign() for the binary activations than the one used for the weights (updateBinaryGradWeight)?