I am studying how you achieve walking and standing in a single policy. While reading your code, I noticed that in some cases the velocity command can be zero because of the "set small commands to zero" step, while the standing mask is still false. However, the standing mask is not part of the observations. This means that when the robot receives a zero velocity command, its reward can differ substantially depending on whether the standing mask is true or false, even for the same actions. Is this a problem when training the robot? Thanks.
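To make the concern concrete, here is a minimal Python sketch of the situation I mean. It is not taken from your code. The function names (`zero_small_command`, `observe`, `reward`), the 0.2 threshold, and the standing-only penalty on joint motion are all hypothetical stand-ins I chose for illustration.

```python
import math

# All names and constants here are hypothetical, for illustration only.
# ZERO_THRESHOLD mimics the "set small commands to zero" step (assumed 0.2 m/s).
ZERO_THRESHOLD = 0.2


def zero_small_command(vx: float, vy: float, threshold: float = ZERO_THRESHOLD):
    """Zero out planar velocity commands whose norm is at or below the threshold."""
    if math.hypot(vx, vy) <= threshold:
        return 0.0, 0.0
    return vx, vy


def observe(vx: float, vy: float, standing_mask: bool):
    """Policy observation: only the command; the standing mask is NOT included."""
    return (vx, vy)


def reward(vx, vy, base_vx, base_vy, joint_motion, standing_mask, sigma=0.25):
    """Velocity-tracking reward plus a hypothetical standing-only penalty."""
    err = (vx - base_vx) ** 2 + (vy - base_vy) ** 2
    tracking = math.exp(-err / sigma)
    # Only applied when the (unobserved) standing mask is true
    stand_penalty = -joint_motion if standing_mask else 0.0
    return tracking + stand_penalty


if __name__ == "__main__":
    cmd = zero_small_command(0.1, 0.05)  # small command -> (0.0, 0.0)
    # Same command, same observation, same resulting motion...
    obs_true = observe(*cmd, standing_mask=True)
    obs_false = observe(*cmd, standing_mask=False)
    r_true = reward(*cmd, 0.0, 0.0, joint_motion=0.5, standing_mask=True)
    r_false = reward(*cmd, 0.0, 0.0, joint_motion=0.5, standing_mask=False)
    print(obs_true == obs_false, r_true, r_false)  # True 0.5 1.0
```

In this sketch the two cases produce identical observations, so from the policy's point of view they are indistinguishable, yet the same observation-action pair receives a reward of 0.5 in one case and 1.0 in the other.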