Asynchronous Advantage Actor-Critic (A3C) implementation using TensorFlow for OpenAI-Gym (ATARI) environments.
Based on the A3C paper, "Asynchronous Methods for Deep Reinforcement Learning" (Mnih et al., 2016).
- Python 3
- OpenAI-Gym
- Tensorflow
- scipy
- numpy
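If you are missing any of these, a pip command along the following lines should install the Python dependencies (versions aren't pinned in this repo, and the code was written against an older TensorFlow release, so you will likely want a compatible 1.x version):
pip install gym[atari] tensorflow scipy numpy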
You can run the code simply by using:
python a3c.py
To change the environment and the number of threads, just edit the following lines:
def main():
    a3c("Breakout-v0", num_threads=8)
This implementation takes image observations as input and outputs a policy over a discrete action space, so all ATARI 2600 games from OpenAI-Gym should work. If you'd like to use a continuous (non-image) observation instead, you'd have to change the first two layers of the model in agent.py and the observation preprocessing in custom_gym.py, roughly as sketched below.
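A rough sketch of what that change could look like, assuming a TensorFlow 1.x-style graph (the function and variable names here are illustrative, not the ones used in agent.py):
import tensorflow as tf

# Hypothetical replacement for the convolutional input layers when the
# observation is a flat vector (e.g. Box(n,)) instead of an image.
def build_input_layers(obs_dim, hidden_units=256):
    # batch of low-dimensional observations instead of preprocessed frames
    observations = tf.placeholder(tf.float32, [None, obs_dim], name="observations")
    # two fully connected layers take the place of the two convolutional layers
    h1 = tf.layers.dense(observations, hidden_units, activation=tf.nn.relu)
    h2 = tf.layers.dense(h1, hidden_units, activation=tf.nn.relu)
    return observations, h2

In custom_gym.py, the image preprocessing would then simply be bypassed for such observations.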
To change the learning parameters, edit these lines:
T_MAX = 10000000          # maximum number of training steps
LEARNING_RATE = 1e-4      # optimizer learning rate
DISCOUNT_FACTOR = 0.99    # reward discount factor (gamma)
TEST_EVERY = 30000        # how often (in steps) the agent is evaluated
I_ASYNC_UPDATE = 5        # horizon (number of steps) for an update
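I_ASYNC_UPDATE is the n-step rollout length: each worker collects up to that many transitions and then computes discounted returns for its update. A minimal sketch of that computation (the helper below is illustrative, not copied from a3c.py):
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    # rewards: up to I_ASYNC_UPDATE rewards collected by one worker thread
    # bootstrap_value: critic's estimate V(s_{t+n}), or 0 if the episode ended
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns

# e.g. a 5-step rollout: n_step_returns([1, 0, 0, 1, 0], 0.5)
# gives the targets used for the value loss and the advantage estimates.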
For learning purposes, I encourage you to experiment with different parameters and compare the resulting performance.
Unfortunately, most comments are in Brazilian Portuguese; I'll be working on translating them as soon as possible.
- Translate comments
- Continuous action
- LSTM
Theoretical explanation of Policy Gradients
Pieter Abbeel's lecture on policy gradients
Chris Nicholls' very simple A3C tutorial - A lot of my code is based on this.
Morvan Zhou's implementation of A3C - I suggest you have a basic understanding of A3C before diving into his code.