I am just trying to learn reinforcement learning.
The initial state is a random position in the open interval (0, 10), i.e. anywhere from just above 0 to just below 10.
My state consists of the car's distance from the left wall and its angle of movement. The action is a change to the car's angle.
The linear model outputs the mean and log(std) of the distribution from which I sample the action.
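For concreteness, here is roughly what that sampling step looks like. This is only a minimal sketch: it assumes the distribution is Gaussian, and the names `linear`, `sample_action`, and the `params` dictionary (with its placeholder values) are hypothetical, not my actual code.

```python
import math
import random

def linear(weights, bias, state):
    """Affine map: dot(weights, state) + bias."""
    return sum(w * s for w, s in zip(weights, state)) + bias

def sample_action(state, params, rng=random):
    """Sample an angle change from N(mean, std^2); return it with its log-probability."""
    mean = linear(params["w_mean"], params["b_mean"], state)
    log_std = linear(params["w_log_std"], params["b_log_std"], state)
    std = math.exp(log_std)  # exponentiating keeps std strictly positive
    action = rng.gauss(mean, std)
    # Log-density of a Gaussian, as needed by policy-gradient updates.
    log_prob = (-0.5 * ((action - mean) / std) ** 2
                - log_std
                - 0.5 * math.log(2 * math.pi))
    return action, log_prob

# Hypothetical parameters for the 2-dimensional state (position, angle).
params = {
    "w_mean": [0.0, 0.0], "b_mean": 0.0,
    "w_log_std": [0.0, 0.0], "b_log_std": -1.0,
}
action, log_prob = sample_action([5.0, 0.3], params)
```

Predicting log(std) rather than std means the model output can be any real number while the standard deviation stays positive.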
The episode ends after 100 successful steps.
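To make the setup easier to follow, here is a rough sketch of the environment as described. Several details are assumptions made only for illustration: the walls sit at 0 and 10, the car moves a fixed distance `SPEED` per step, an angle of 0 means driving parallel to the walls, the initial angle is 0, and a step counts as successful when the car stays strictly between the walls. The function names are hypothetical as well.

```python
import math
import random

LEFT_WALL, RIGHT_WALL = 0.0, 10.0   # position is measured from the left wall
MAX_STEPS = 100                     # episode ends after 100 successful steps
SPEED = 0.5                         # assumed distance travelled per step

def reset(rng=random):
    """Start at a random position strictly inside (0, 10) with an assumed angle of 0."""
    x = LEFT_WALL
    while not (LEFT_WALL < x < RIGHT_WALL):   # reject the endpoints
        x = rng.uniform(LEFT_WALL, RIGHT_WALL)
    return (x, 0.0)

def step(state, action):
    """Apply an angle change; return (next_state, crashed)."""
    x, angle = state
    angle += action
    x += SPEED * math.sin(angle)   # assumed: angle 0 is parallel to the walls
    crashed = not (LEFT_WALL < x < RIGHT_WALL)
    return (x, angle), crashed

def run_episode(policy, rng=random):
    """Roll out until a crash or MAX_STEPS successful steps; return the step count."""
    state = reset(rng)
    for t in range(MAX_STEPS):
        state, crashed = step(state, policy(state))
        if crashed:
            return t
    return MAX_STEPS

# Usage: a policy that never turns keeps the car parallel to the walls.
steps_survived = run_episode(lambda state: 0.0)
```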