Experimenting with DQN using an adversarial approach
May 1, 2017
We recently turned our attention to adversarial networks for solving reinforcement learning tasks. We have been following DeepMind's innovative combination of convolutional neural networks with experience replay, in which past transitions are stored in a replay memory and revisited during training. In particular, we used an adversarial network formulation to solve the reinforcement learning problem.
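Experience replay can be illustrated as a fixed-capacity buffer of past transitions that is sampled uniformly at training time. The sketch below is a minimal illustration of the idea, not DeepMind's exact implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between consecutive frames
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling decorrelated mini-batches from such a buffer is one of the ingredients that made DQN's training stable.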
First and foremost, the aim of reinforcement learning is to train an agent to discover the policy that maximizes its performance with respect to a discounted future reward while interacting with an environment. At each step, the agent selects an action from a set of possible actions according to a policy that maps states to actions.
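Concretely, the discounted future reward from a step onward is the sum of subsequent rewards, each weighted by an increasing power of a discount factor γ in (0, 1]. A small sketch (the reward values and γ below are illustrative):

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards discounted per time step: G = sum(gamma**k * r_k)."""
    g = 0.0
    # Iterate backwards so each step folds in the discounted tail: g = r + gamma * g
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# With gamma = 0.5, three unit rewards give 1 + 0.5 * (1 + 0.5 * 1) = 1.75
discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A γ below 1 makes the agent value near-term reward over distant reward, which is exactly the trade-off the learned policy must optimize.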
DeepMind showed that an agent can efficiently learn to play a game without any game-specific engineering by using convolutional neural networks to solve the reinforcement learning task. These networks are traditionally used for image recognition because they can extract dense information from fine-grained pixel data. DeepMind's network learns features that yield valuable information to the agent, such as the locations of different game elements.
Neural networks have shown their ability to solve reinforcement learning problems; however, the learning process remains slow. Our scientists here at InstaDeep have been working on reformulating the reinforcement learning task within an adversarial network framework, inspired by recent advances in generative adversarial networks. To our surprise, we discovered that introducing an adversarial network can greatly benefit the learning process: preliminary validation tests show that our approach significantly accelerates learning compared with the baseline presented in DeepMind's seminal paper.
As the figures above show, introducing an adversarial network helps both to accelerate convergence and to stabilize the learning process. Moreover, adversarial networks could prove fruitful in a wide range of future reinforcement learning tasks. We have only started scratching the surface of this area of research, and we anticipate many more exciting projects and ideas in the near future.