Each year, the Genetic and Evolutionary Computation Conference (GECCO) gathers the leading global experts in the domain of genetic and evolutionary computing. This year, InstaDeep’s research team presented two main conference papers and one workshop paper at the event, which took place in Boston, MA, from 9 to 13 July. The three works result from a close and prolific collaboration with Imperial College London’s Adaptive & Intelligent Robotics Lab (AIRL).
Quality Diversity & Neuro-Evolution
Quality-Diversity algorithms historically rely on random mutations to explore small search spaces but struggle when facing higher-dimensional problems. As a result, they often scale poorly to problems where neural networks with many parameters provide state-of-the-art results.
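To make the role of random mutations concrete, here is a minimal sketch of a MAP-Elites-style loop in plain Python. The objective `fitness`, the behavior `descriptor`, the grid resolution and the mutation scale are all hypothetical choices made for illustration; they are not taken from the papers.

```python
import random

def fitness(x):
    # Hypothetical objective: higher is better, maximum at the origin.
    return -sum(v * v for v in x)

def descriptor(x):
    # Hypothetical behavior descriptor: first two coordinates, clipped to [0, 1].
    return tuple(min(max(v, 0.0), 1.0) for v in x[:2])

def cell_index(desc, bins):
    # Map a descriptor in [0, 1]^2 to a cell of a regular grid.
    return tuple(min(int(d * bins), bins - 1) for d in desc)

def map_elites(dim=10, bins=5, iterations=2000, sigma=0.1, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)
    for _ in range(iterations):
        if archive:
            # Pick a random elite and perturb it with Gaussian noise.
            _, parent = rng.choice(list(archive.values()))
            child = [v + rng.gauss(0.0, sigma) for v in parent]
        else:
            # Bootstrap the archive with one random solution.
            child = [rng.random() for _ in range(dim)]
        f = fitness(child)
        cell = cell_index(descriptor(child), bins)
        # Keep the child if its cell is empty or it beats the current elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, child)
    return archive

archive = map_elites()
print(f"{len(archive)} of 25 cells filled")
```

Every new candidate comes from an undirected perturbation of an existing elite, so the search has no information about which direction improves fitness or diversity. This is acceptable for a 10-dimensional vector, but it is the part that becomes costly when the solution is a neural network with many parameters.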
Building large and efficient controllers that work with continuous actions has been a long-standing goal in Artificial Intelligence and in particular in robotics. Deep reinforcement learning (RL) – and especially Policy Gradient (PG) methods – have proven efficient at training such large controllers. One of the keys to this success lies in the fact that PG methods exploit the structure of the objective function when the problem can be formalized as a Markov Decision Process (MDP), leading to substantial gains in sample efficiency. Moreover, they also exploit the analytical structure of the controller when known, which allows the sample complexity of these methods to be independent of the parameter space dimensionality.
In real-world applications, these gains turn out to be critical when interacting with the environment is expensive. Although exploration is essential to reach optimal policies, PG methods usually rely on simple mechanisms to explore the action space, such as adding Gaussian noise or maximizing entropy. These mechanisms prove insufficient in hard exploration tasks where the reward signal is sparse or deceptive.
Successful attempts have been made to combine evolutionary methods and RL to improve exploration. However, these techniques focus only on building high-performing solutions and do not explicitly encourage diversity within the population, so they fail when confronted with hard exploration problems. More recently, Policy Gradient Assisted MAP-Elites (PGA-ME) was proposed to bring reinforcement learning into MAP-Elites and train neural networks that solve locomotion tasks with diverse behaviors. Although PGA-ME uses policy gradients to improve performance, its diversity-seeking mechanism still relies on a divergent genetic search, which makes it struggle on hard exploration tasks.
To address this challenge, we introduce the notion of a diversity policy gradient, which drives the policies in the population towards diversity. This mechanism allows for fast and efficient diversity search even in high-dimensional spaces. We call the resulting algorithm Quality-Diversity Policy Gradient (QDPG). QDPG alternates policy gradient updates to improve both the returns of each policy and the diversity within the population. As a result, QDPG outperforms its evolutionary competitors in terms of sample efficiency and behavior space coverage.
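The idea of alternating between a quality gradient and a diversity gradient can be illustrated on a toy problem. The sketch below is not QDPG: there is no environment, no critic and no archive, and the "policies" are simply 2-D points. The quality objective (distance to a hypothetical `TARGET`), the diversity objective (mean distance to the other members) and all hyperparameters are assumptions chosen for illustration only.

```python
import math
import random

TARGET = (3.0, 3.0)  # hypothetical optimum of the toy quality objective

def quality(theta):
    # Toy quality: negative squared distance to TARGET.
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))

def quality_grad(theta):
    return [-2.0 * (t - g) for t, g in zip(theta, TARGET)]

def diversity_grad(i, population):
    # Gradient of the mean Euclidean distance from member i to the others.
    grad = [0.0, 0.0]
    for j, other in enumerate(population):
        if j == i:
            continue
        diff = [a - b for a, b in zip(population[i], other)]
        norm = math.hypot(*diff) + 1e-12  # avoid division by zero
        grad = [g + d / norm for g, d in zip(grad, diff)]
    return [g / (len(population) - 1) for g in grad]

def mean_pairwise_distance(population):
    pairs = [(a, b) for k, a in enumerate(population) for b in population[k + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def evolve(steps=200, lr=0.1, size=4, alternate=True, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(size)]
    for step in range(steps):
        if alternate and step % 2 == 1:
            # Odd steps: every member ascends the diversity objective.
            grads = [diversity_grad(i, population) for i in range(size)]
        else:
            # Even steps (or all steps without alternation): ascend quality.
            grads = [quality_grad(theta) for theta in population]
        population = [[t + lr * g for t, g in zip(theta, grad)]
                      for theta, grad in zip(population, grads)]
    return population

quality_only = evolve(alternate=False)
alternating = evolve(alternate=True)
print("spread, quality only:", mean_pairwise_distance(quality_only))
print("spread, alternating: ", mean_pairwise_distance(alternating))
```

With quality updates alone, every member collapses onto the same optimum. With alternating updates, the members stay close to the optimum but remain spread out, because each one also follows a gradient that pushes it away from the rest of the population.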
Multi Criteria Quality Diversity
Multi-objective optimization is a popular and useful field of research and application. Most applications involve multiple conflicting objectives, and a view of the possible trade-offs is key to making a decision. Multi-criteria optimization is omnipresent in industrial design problems. Consider, for instance, designing a protein that binds to an identified receptor in order to create a new drug. Maximizing the binding is not enough: we also need to make sure that the protein is not toxic, remains stable, is not targeted by the immune system, and so on. While it is possible to combine the objectives into one, multi-objective approaches look for a set of Pareto-optimal solutions, i.e. solutions none of which can be considered better than another when every objective matters.
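Pareto optimality can be stated in a few lines of code. The sketch below assumes that every objective is to be maximized; the candidate scores (binding, stability, safety) are made-up numbers used only to illustrate the definition.

```python
def dominates(a, b):
    # a dominates b if it is at least as good on every objective
    # and strictly better on at least one (maximization).
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    # Keep the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (binding, stability, safety) scores for four candidates.
candidates = [
    (0.9, 0.2, 0.5),  # A: best binding, poor stability
    (0.6, 0.8, 0.7),  # B: balanced
    (0.5, 0.7, 0.6),  # C: worse than B on every objective
    (0.3, 0.9, 0.9),  # D: weak binding, very stable and safe
]

front = pareto_front(candidates)
print(front)  # A, B and D remain; C is dropped because B dominates it
```

None of A, B or D is better than the others on all three objectives at once, which is why a multi-objective method returns the whole set instead of a single winner.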
Traditional multi-objective optimization methods are designed to generate a set of solutions that approximates the set of all optimal trade-offs, called the Pareto front. Evolutionary algorithms are a natural approach, and numerous methods have been proposed that differ in their underlying selection scheme: the Non-dominated Sorting Genetic Algorithm II (NSGA-II), the Strength Pareto Evolutionary Algorithm 2 (SPEA2) and the multi-objective evolutionary algorithm based on decomposition (MOEA/D) are examples of such approaches.
However, to the best of our knowledge, no existing method explicitly tackles Multi-Objective Quality Diversity (MOQD) optimization. We propose a novel method for this problem: Multi-Objective MAP-Elites (MOME), which divides the descriptor space into niches with a tessellation method and illuminates each cell of the tessellation by filling it with its own set of Pareto-optimal solutions. Our experimental evaluation shows that MOME can evolve collections of diverse solutions while providing global performance similar to that of standard multi-objective algorithms. We believe that MOME’s ability to optimize several objectives at once while striving for diversity can have an impact in many real-world applications.
InstaDeep and AIRL’s shared research focuses on improving Quality-Diversity (QD) algorithms. A fascinating aspect of nature lies in its ability to produce a large and diverse collection of organisms that are all high-performing in their niche. By contrast, most AI algorithms focus on finding a single efficient solution to a given problem. Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off that plays a central role in learning. It also allows for increased robustness when the returned collection contains several working solutions to the considered problem, making it well-suited for real industrial applications. Quality-Diversity methods are evolutionary algorithms designed for this purpose.
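The archive described above, one Pareto front per cell of the descriptor space, can be sketched as follows. This is a simplified illustration and not the MOME implementation: it assumes a regular grid over descriptors in [0, 1]^2 as the tessellation, maximized objectives, and an unbounded front per cell. The function `add_to_archive` and its parameters are hypothetical names.

```python
def dominates(a, b):
    # a dominates b: no worse on every objective, strictly better on one.
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def add_to_archive(archive, solution, objectives, desc, bins=4):
    # Find the cell (niche) of the grid that contains the descriptor.
    cell = tuple(min(int(d * bins), bins - 1) for d in desc)
    front = archive.setdefault(cell, [])
    # Reject the candidate if the cell already holds an equal or better one.
    if any(dominates(o, objectives) or o == objectives for _, o in front):
        return False
    # Otherwise drop the members it dominates and add it to the cell's front.
    front[:] = [(s, o) for s, o in front if not dominates(objectives, o)]
    front.append((solution, objectives))
    return True

archive = {}
add_to_archive(archive, "a", (1.0, 0.0), desc=(0.1, 0.1))
add_to_archive(archive, "b", (0.0, 1.0), desc=(0.1, 0.1))  # trade-off with "a"
add_to_archive(archive, "c", (0.2, 0.3), desc=(0.9, 0.9))  # different niche
for cell, front in archive.items():
    print(cell, [s for s, _ in front])
```

A candidate only competes with the solutions that share its cell, so a niche with modest scores keeps its own local trade-offs instead of being wiped out by stronger solutions elsewhere in the descriptor space.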
Collaboration with Imperial College London
These works were developed in close collaboration with Prof. Antoine Cully from Imperial College London and his group, the Adaptive and Intelligent Robotics Lab (AIRL).
AIRL is a team of 15 postdoctoral, Ph.D., and MSc researchers. The group has significant expertise in the use of machine learning to improve the resilience and robustness of robotic systems to unforeseen situations. In particular, the lab is recognized worldwide for its expertise in Quality-Diversity optimisation.
In addition to our recent publications, InstaDeep and AIRL have co-developed and open-sourced the QDax library, a tool to accelerate Quality-Diversity (QD) and neuro-evolution research through hardware accelerators and massive parallelization.
“Quality-Diversity methods have recently experienced a surge of interest due to their exceptional abilities to find diverse ways to solve complex problems. We believe they will become a critical component in modern decision-making systems. We are very excited to collaborate with the AIRL from Imperial College London, a world leader in this field. This partnership will help us in our mission of accelerating scientific progress and developing cutting-edge technologies for practical applications in logistics, supply chain, hardware design, bioinformatics and more”, said Alex Laterre, Research Lead at InstaDeep.
Antoine Cully, Director of AIRL, continued, “I am very excited to see more and more applications for Quality-Diversity optimisation algorithms that are spawning both in academia and industrial applications. I strongly believe that their ability to produce large collections of diverse and high-performing solutions is an outstanding asset in many situations, and I am looking forward to exploring these new opportunities with InstaDeep!”
InstaDeep is hiring! If you are interested in QDax, Quality Diversity or any of the related topics in this post, please check out our exciting research opportunities at www.instadeep.com/careers