Deep Deterministic Policy Gradient || PyTorch || OpenAI Gym
Lorenzo Soligo, Ca' Foscari University of Venice. Project for the Artificial Intelligence: Machine Learning and Pattern Recognition course.
- Create the Conda environment:
conda env create -f environment.yml - Activate the environment:
conda activate deeprl - Install the requirements from pip:
pip install -r requirements.txt
python main.py
You can use the following flags:
--eval: will run an episode using an already saved model of the actor. Don't use this if you want to train the model.--env: name of the OpenAI Gym environment to use. The default isLunarLanderContinuous-v2. Notice that DDPG is developed to be used with continuous action spaces.
- Copy the
modelsfolder fromresults/lunarlanderinto the root of the project. - Run
python main.py --evalto test LunarLander, orpython main.py --eval --env "AnotherEnv"to test another environment- beware that only
LunarLanderis provided.
- beware that only
- This implementation does not precisely follow the one presented in the paper. As a matter of fact, I noticed that not using batch normalization and adding the actions in the critic's input layer drastically improved performance.
- The
resultsfolder contains videos, Tensorboard logs and working models for LunarLander