Research

InstaDeep’s in-house research team contributes to the latest advancements in AI – from the fundamentals of machine learning to robotics and deep reinforcement learning. Our published research has been presented at world-leading conferences such as NeurIPS 2018.

Towards Compositionality in Deep Reinforcement Learning

Authors:

T. Pierrot, G. Ligner, S. Reed, O. Sigaud, N. Perrin, A. Laterre, D. Kas, K. Beguir, N. de Freitas

DATE:

June 18, 2019

Learning a wide variety of skills that can be reused to learn more complex skills or solve new problems is one of the central challenges of AI. In this research project, carried out in collaboration with Scott Reed and Nando de Freitas, we worked on answering the following question: how can knowledge acquired in one setting be transferred to other settings? A powerful way to achieve this is through modularity. Imagine you want to teach a robot how to cook your favourite meal. First, you have to teach it how to chop vegetables, how to fry them, and so on. Second, to chop vegetables, the robot must learn how to use a knife, how to use a chopping board, and how to properly clean your vegetables. This paper exploits modularity to solve complex problems with neural networks by decomposing them into simpler ones that are solved first.
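As a toy illustration of this modular idea (a minimal sketch, not the paper's actual architecture), the code below composes a more complex skill out of previously learned sub-skills. All class names, skill names and the selection rule are hypothetical:

```python
# Hypothetical illustration of modular skill reuse: a higher-level skill
# delegates to previously learned sub-skills instead of learning from
# scratch. Structure and names are illustrative only.

class Skill:
    """A primitive skill maps an observation to an action."""
    def __init__(self, name, act_fn):
        self.name = name
        self.act_fn = act_fn

    def act(self, obs):
        return self.act_fn(obs)

class CompositeSkill:
    """A higher-level skill that delegates to one of its sub-skills."""
    def __init__(self, name, sub_skills, select_fn):
        self.name = name
        self.sub_skills = sub_skills
        self.select_fn = select_fn  # chooses which sub-skill to run

    def act(self, obs):
        sub = self.sub_skills[self.select_fn(obs)]
        return sub.act(obs)

# Primitive skills learned first (e.g. "use knife", "clean vegetables").
use_knife = Skill("use_knife", lambda obs: "cut")
clean_vegetables = Skill("clean_vegetables", lambda obs: "rinse")

# A more complex skill reuses them rather than relearning everything.
chop_vegetables = CompositeSkill(
    "chop_vegetables",
    [clean_vegetables, use_knife],
    select_fn=lambda obs: 0 if obs.get("dirty", True) else 1,
)

print(chop_vegetables.act({"dirty": True}))   # -> "rinse"
print(chop_vegetables.act({"dirty": False}))  # -> "cut"
```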


Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Authors:

A. Laterre, Y. Fu, M. K. Jabri, A-S. Cohen, D. Kas, K. Hajjar, T. S. Dahl, A. Kerkeni, K. Beguir

DATE:

March 17, 2019

Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimisation problems, such as the travelling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm, which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of two-dimensional and three-dimensional bin packing problems show that it outperforms generic Monte Carlo tree search, heuristic algorithms and integer programming solvers. We also present an analysis of the ranked reward mechanism, in particular the effects of problem instances with varying difficulty and of different ranking thresholds.
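The ranking mechanism described above can be sketched in a few lines: each new episode reward is compared against a percentile of the agent's own recent rewards and mapped to a binary win/loss signal. The buffer size, percentile threshold and tie-breaking rule below are illustrative assumptions, not the paper's exact hyperparameters:

```python
import collections
import random
import numpy as np

# Sketch of the ranked-reward idea: raw episode rewards are ranked
# against a buffer of the agent's own recent rewards, turning a
# single-player score into a relative win/loss signal.

class RankedReward:
    def __init__(self, buffer_size=250, percentile=75):
        # Buffer size and percentile are assumed values for illustration.
        self.buffer = collections.deque(maxlen=buffer_size)
        self.percentile = percentile

    def __call__(self, reward):
        self.buffer.append(reward)
        threshold = np.percentile(self.buffer, self.percentile)
        if reward > threshold:
            return 1.0   # "win": beat the agent's own recent performance
        if reward < threshold:
            return -1.0  # "loss"
        return random.choice([1.0, -1.0])  # assumed random tie-breaking

r2 = RankedReward()
for raw in [3.0, 5.0, 4.0, 6.0, 2.0]:
    print(raw, "->", r2(raw))
```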


Explicit Sequence Proximity Models for Hidden State Identification

Authors:

Anil Kota, Sharath Chandra, Parag Khanna, Torbjørn S. Dahl

DATE:

March 17, 2019

Sequence similarity is a critical concept for comparing short- and long-term memory in order to identify hidden states in partially observable Markov decision processes. While connectionist algorithms can learn a range of ad-hoc proximity functions, they do not reveal insights or generic principles that could improve overall algorithm efficiency.

Our work uses the instance-based Nearest Sequence Memory (NSM) algorithm as a basis for exploring different explicit sequence proximity models, including the original NSM proximity model and two new models: temporally discounted proximity and Laplacian proximity. The models were compared on three benchmark problems: two discrete grid world problems and one continuous-space navigation problem. The results show that more forgiving proximity models perform better than stricter models, and that the difference between the models is more pronounced in the continuous navigation problem than in the discrete grid world problems.
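As a rough illustration of suffix-based sequence proximity in the spirit of NSM, the sketch below scores two experience histories by the length of their common suffix, plus a temporally discounted variant. The discounting form is an assumed interpretation for illustration, and the paper's Laplacian model is not reproduced here:

```python
# Illustrative sketch of suffix-based sequence proximity: two histories
# are considered close when their most recent experiences match.

def matching_suffix(seq_a, seq_b):
    """Length of the common suffix of two experience sequences."""
    n = 0
    for a, b in zip(reversed(seq_a), reversed(seq_b)):
        if a != b:
            break
        n += 1
    return n

def nsm_proximity(seq_a, seq_b):
    """NSM-style proximity: the raw match length."""
    return matching_suffix(seq_a, seq_b)

def discounted_proximity(seq_a, seq_b, gamma=0.9):
    """Temporally discounted variant: older matches count for less
    (an assumed form of the discounting, for illustration)."""
    n = matching_suffix(seq_a, seq_b)
    return sum(gamma ** k for k in range(n))

h1 = ["obs1", "obs2", "obs3", "obs4"]
h2 = ["obs0", "obs2", "obs3", "obs4"]
print(nsm_proximity(h1, h2))         # -> 3
print(discounted_proximity(h1, h2))  # -> 2.71
```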
