
Reinforcement Learning

Jorge MF edited this page May 14, 2019 · 7 revisions

InfoBot: Transfer and exploration via the information bottleneck (Apr 2019)
Uses an information bottleneck (a variational encoder) inside the agent to improve generalization and exploration of states.

Exploration by Random Network Distillation [code] (Oct 2018)
RND incentivizes visiting unfamiliar states by measuring how hard it is to predict the output of a fixed random neural network on visited states.
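
A minimal sketch of the RND idea described above, not the paper's implementation: a fixed random network produces target features, a second network is trained to predict them on visited states, and the prediction error serves as the exploration bonus. The single-layer `tanh` networks, the class name `RND`, and all hyperparameters here are assumptions chosen for brevity.

```python
import math
import random


def make_net(in_dim, out_dim, rng):
    """Random single-layer network, stored as a list of weight rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(net, state):
    """Compute tanh(W @ state) row by row."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in net]


class RND:
    """Toy random network distillation bonus (illustrative assumption)."""

    def __init__(self, state_dim, feat_dim=8, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.target = make_net(state_dim, feat_dim, rng)     # fixed, never trained
        self.predictor = make_net(state_dim, feat_dim, rng)  # trained to match target
        self.lr = lr

    def intrinsic_reward(self, state):
        """Mean squared error between predictor and target outputs."""
        t = forward(self.target, state)
        p = forward(self.predictor, state)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, state):
        """One gradient step on the predictor for a visited state."""
        t = forward(self.target, state)
        p = forward(self.predictor, state)
        n = len(t)
        for row, pi, ti in zip(self.predictor, p, t):
            # Gradient of (tanh(w.s) - t)^2 / n with respect to the pre-activation.
            grad_pre = 2.0 * (pi - ti) * (1.0 - pi * pi) / n
            for j, sj in enumerate(state):
                row[j] -= self.lr * grad_pre * sj


rnd = RND(state_dim=4)
familiar = [0.5, -0.2, 0.1, 0.9]
bonus_before = rnd.intrinsic_reward(familiar)
for _ in range(200):
    rnd.update(familiar)  # the agent keeps visiting the same state
bonus_after = rnd.intrinsic_reward(familiar)
print(bonus_before, bonus_after)
```

The bonus for a state shrinks as the predictor is trained on it, so frequently visited states stop being rewarding while states the predictor has not been trained on keep a larger error.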

Self-Imitation Learning (Jun 2018)
Self-imitation learning trains an actor-critic (A2C) agent to reproduce its own past good decisions, which drives deeper exploration.
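
A hedged sketch of the per-transition self-imitation loss: the agent replays its own past transitions and imitates only those whose observed return exceeded its current value estimate. The function names and the scalar interface are assumptions for illustration; a real agent would apply this to batches sampled from a replay buffer and add it to the A2C loss.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Discounted return from each step of one finished episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def sil_losses(log_prob, value, ret):
    """Self-imitation policy and value losses for one stored transition.

    log_prob: log pi(a|s) of the stored action under the current policy
    value:    current value estimate V(s)
    ret:      discounted return actually observed from this step
    """
    # Only transitions that turned out better than expected contribute.
    advantage = max(ret - value, 0.0)
    policy_loss = -log_prob * advantage
    value_loss = 0.5 * advantage ** 2
    return policy_loss, value_loss


returns = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
print(returns)
print(sil_losses(log_prob=math.log(0.5), value=1.0, ret=3.0))
print(sil_losses(log_prob=math.log(0.5), value=3.0, ret=1.0))
```

Clipping the advantage at zero means a transition whose return fell below the value estimate yields zero loss, so the agent never imitates decisions that did worse than it expected.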

Imagination-Augmented Agents for Deep Reinforcement Learning (Feb 2018)
Imagination-Augmented Agents (I2As) learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, using the predictions as additional context in deep policy networks.

Proximal Policy Optimization Algorithms [code] (Jul 2017)
