A PyTorch implementation of Deep Deterministic Policy Gradient (DDPG) for simple continuous control tasks.
More info: Continuous control with deep reinforcement learning
Watching a pretrained agent on Pendulum-v0:
python run.py --env Pendulum-v0 --agent saves/pretrained_pendulum --episodes 10
or on MountainCarContinuous-v0:
python run.py --env MountainCarContinuous-v0 --agent saves/pretrained_mountaincar --episodes 10
To train an agent:
python ddpg.py
There are many command-line flags. See the bottom of ddpg.py for a full list, but here are the important ones:
--env is the gym environment id. Options are MountainCarContinuous-v0 and Pendulum-v0.
--num_episodes is how many episodes of experience to collect during training. Defaults to 500.
--batch_size is how many sample transitions are passed through the networks at once during training. Defaults to 128. This may need to be reduced when running on CPUs.
--render is either 1 or 0. 1 lets you watch the agent as it learns. This slows the process down.
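For orientation, the flags described above could be declared with argparse roughly as follows. This is only a sketch: the actual definitions are at the bottom of ddpg.py and may differ. The helper name build_parser is hypothetical, and the defaults shown for --env and --render are assumptions, since only the defaults for --num_episodes and --batch_size are stated here.

```python
import argparse


def build_parser():
    """Hypothetical helper; the real parser lives at the bottom of ddpg.py."""
    parser = argparse.ArgumentParser(
        description="Train a DDPG agent (illustrative sketch)"
    )
    # Default environment is an assumption, not taken from ddpg.py.
    parser.add_argument(
        "--env",
        type=str,
        default="Pendulum-v0",
        choices=["MountainCarContinuous-v0", "Pendulum-v0"],
        help="gym environment id",
    )
    parser.add_argument(
        "--num_episodes",
        type=int,
        default=500,
        help="episodes of experience to collect during training",
    )
    parser.add_argument(
        "--batch_size",
        type=int,
        default=128,
        help="transitions passed through the networks per training step",
    )
    # Default of 0 (no rendering) is an assumption.
    parser.add_argument(
        "--render",
        type=int,
        choices=[0, 1],
        default=0,
        help="1 to watch the agent as it learns (slower), 0 to disable",
    )
    return parser


# Parse an explicit argument list instead of sys.argv to keep the sketch self-contained.
args = build_parser().parse_args(["--batch_size", "64", "--render", "1"])
print(args.env, args.num_episodes, args.batch_size, args.render)
```

With the documented flags, a CPU-friendly training run might look like: python ddpg.py --env Pendulum-v0 --batch_size 64 --render 0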
