An early detection system for desert locust outbreaks in Africa, in collaboration with Google AI



Our joint paper with Google AI, “On pseudo-absence generation and machine learning for locust breeding ground prediction in Africa”, describes an early detection system for desert locust outbreaks across the African continent. This research will be presented at two NeurIPS 2021 machine learning conference workshops: Artificial Intelligence for Humanitarian Assistance and Disaster Response (AI+HADR) and Machine Learning for the Developing World (ML4D).

Desert locusts, a grave threat to human and animal food security in Africa

Desert locust outbreaks threaten the food security of large swathes of Africa and endanger the livelihoods of millions of people. An upsurge in the locust population between 2019 and 2021 was triggered by unusual patterns of rainfall, cyclones and monsoons. Two cyclones, in May and October 2018, brought heavy rain to the Empty Quarter of the Arabian Peninsula and created ideal conditions for desert locust eggs to hatch and for the insects to develop and breed. Billions of locusts migrated from these regions to the Horn of Africa and the Indo-Pakistan region, causing a severe locust invasion in 2019-2020.

Locust invasions have long been known to cause serious human and animal food security problems, because the insects feed on green vegetation along their migration routes. Swarms containing up to 192 billion locusts, covering an area three times the size of New York City, were spotted in Kenya in 2020. A swarm of this size is estimated to eat as much food in a single day as 90 million people. Other research estimates that in 2019 alone, a combined 260,000 hectares of farmland and cropland in Kenya, Somalia and Ethiopia were devastated by locusts.

2020 BBC World Service report on East Africa’s biggest locust swarms in 70 years

Predicting locust breeding grounds across Africa using machine learning

We set out to use machine learning (ML) for locust distribution modelling, which could support early warning of a locust invasion by predicting likely breeding ground locations ahead of time. The Food and Agriculture Organization (FAO) of the United Nations has been observing desert locusts since 1985 and maintains a database of locust sightings at different stages of their lifecycle. A dataset like this, which contains only confirmed sightings, is called a presence-only dataset. Training a supervised ML classifier, however, also requires examples of places where locusts are absent, so the presence data must be complemented with absence data. This presence-only problem is well known in species distribution modelling, and several techniques have been proposed for generating such pseudo-absence data. However, most work on locust breeding ground prediction has simply relied on random sampling.

As a first step towards early warning of locust outbreaks, we tested several pseudo-absence generation methods to identify which one leads to the most accurate detection of locust breeding grounds. The methods tested were random sampling (RS), random sampling with environmental profiling (RSEP), random sampling with extent limitation (RS+) and random sampling with environmental profiling and extent limitation (RSEP+).
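The four strategies differ only in which randomly drawn locations are allowed to become pseudo-absences, so they can be expressed as one rejection sampler with two optional filters. The sketch below is our illustrative assumption, not the paper's implementation: environmental profiling is approximated by a simple BIOCLIM-style envelope (a location counts as environmentally suitable if every covariate lies within the range observed at presence points), extent limitation is modelled as a distance band around the nearest presence point (the hypothetical `min_km` and `max_km` parameters), and `env_at` is a hypothetical stand-in for a lookup of environmental covariates at a location.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]                 # (latitude, longitude) in degrees
EnvFn = Callable[[Point], Sequence[float]]  # hypothetical covariate lookup


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def env_envelope(presence_env: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """BIOCLIM-style profile (an assumption): per-variable (min, max) over presences."""
    return [(min(col), max(col)) for col in zip(*presence_env)]


def inside_envelope(env: Sequence[float], envelope: Sequence[Tuple[float, float]]) -> bool:
    return all(lo <= v <= hi for v, (lo, hi) in zip(env, envelope))


def sample_pseudo_absences(
    n: int,
    presences: List[Point],            # must be non-empty
    env_at: EnvFn,
    bbox: Tuple[float, float, float, float],  # (lat_min, lat_max, lon_min, lon_max)
    *,
    profile: bool = False,             # environmental profiling: RSEP / RSEP+
    min_km: Optional[float] = None,    # extent limitation: RS+ / RSEP+
    max_km: Optional[float] = None,
    seed: int = 0,
    max_tries: int = 100_000,
) -> List[Point]:
    """Rejection-sample up to n pseudo-absences (fewer if max_tries runs out)."""
    rng = random.Random(seed)
    envelope = env_envelope([env_at(p) for p in presences]) if profile else None
    lat_min, lat_max, lon_min, lon_max = bbox
    out: List[Point] = []
    for _ in range(max_tries):
        if len(out) == n:
            break
        # Uniform in degrees (not equal-area), which is adequate for a sketch.
        p = (rng.uniform(lat_min, lat_max), rng.uniform(lon_min, lon_max))
        if min_km is not None or max_km is not None:
            d = min(haversine_km(p, q) for q in presences)
            if (min_km is not None and d < min_km) or (max_km is not None and d > max_km):
                continue  # outside the allowed background extent
        if envelope is not None and inside_envelope(env_at(p), envelope):
            continue  # environmentally suitable, so not a credible absence
        out.append(p)
    return out
```

With the defaults this is plain RS; `profile=True` gives RSEP, setting `min_km` and/or `max_km` gives RS+, and combining both gives RSEP+.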

Pseudo-absence generation methods: the top row illustrates each method and the bottom row shows an example of the corresponding method on a subset of African countries (Niger, Mauritania, Mali, Algeria, Western Sahara and Morocco) for November 2003. (a) Random sampling (RS). (b) Random sampling with environmental profiling (RSEP). White map regions indicate environmentally suitable regions as identified through environmental profiling, i.e. regions where pseudo-absence points should not be sampled. (c) Random sampling with background extent limitation (RS+). (d) Random sampling with environmental profiling and background extent limitation (RSEP+).
Table 1: Performance comparison between pseudo-absence generation methods and ML algorithms. Bold values indicate the top performance across generation methods for a specific algorithm. In terms of both Accuracy and F1 score, the logistic regression model significantly outperformed the other approaches and achieved the best overall performance when trained with RSEP.
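Accuracy and F1 measure different things, which is why Table 1 reports both. A small sketch of the two metrics for a binary task follows, assuming presence is the positive class (label 1) and pseudo-absence is label 0; which class the paper treats as positive is an assumption here.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all points, presences and absences alike, labelled correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN).

    True negatives play no part, so easy absences cannot inflate the score.
    """
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: 3 true positives, 2 false positives, 1 false negative, 2 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 0.625
print(f1(y_true, y_pred))        # 0.666... (= 6 / 9)
```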

We combined the FAO locust observation data with the output of the pseudo-absence generation methods for the entire African continent. This gave us a dataset of geo-locations, each labelled as a locust presence or absence point. To make these points usable for modelling, we enriched each one with environmental and climatic data from NASA GLDAS (the Global Land Data Assimilation System) and ISRIC SoilGrids. We found that random sampling with environmental profiling (RSEP) is the best-performing approach when used to train a logistic regression model, as shown in Table 1. This is one of the first such results derived for the entire African continent, which allows the tested models to generalize across many countries. Sella Nevo, head of Google’s Karmel group, which uses AI for social impact, said of the results: “this is a great first step towards high quality, scalable locust modeling”.
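A minimal sketch of this modelling step, under stated assumptions: each point is turned into a fixed-length covariate vector by `covariates`, a hypothetical callable standing in for the GLDAS and SoilGrids lookups (the real feature set is not reproduced here); FAO presences are labelled 1 and pseudo-absences 0; and a plain logistic regression is fitted by batch gradient descent on the mean log-loss. This is a from-scratch illustration rather than the authors' code; in practice one would use an established library, standardise features and evaluate on held-out data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def build_dataset(presences: Sequence[Point], absences: Sequence[Point],
                  covariates: Callable[[Point], Sequence[float]]):
    """Stack presence (label 1) and pseudo-absence (label 0) points with their covariates."""
    X = [list(covariates(p)) for p in list(presences) + list(absences)]
    y = [1] * len(presences) + [0] * len(absences)
    return X, y


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (no overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X: List[List[float]], y: List[int], lr: float = 1.0,
                            epochs: int = 5000, l2: float = 0.0):
    """Minimise mean log-loss (plus an optional L2 penalty) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: List[float], b: float, X: List[List[float]],
            threshold: float = 0.5) -> List[int]:
    """1 = predicted breeding-ground presence, 0 = predicted absence."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)
            for x in X]
```

Logistic regression keeps the model interpretable: each learned weight shows how a covariate shifts the log-odds of a location being a breeding ground.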

Full details of the performance comparison, statistical tests and interpretation of the results can be found in the paper.

ML, from Africa, for Africa

This work is a collaboration between InstaDeep’s offices in Nigeria and South Africa, and Google AI in the Middle East and Africa.  “It is important that Africans take ownership of key technologies and harness them to solve the challenges we have on the continent. To not be left out of the conversation regarding what the best solutions are, but rather to be at the center of it, crafting and shaping those solutions. Two years ago I was offered a scholarship by Google and Meta for the African Masters in Machine Intelligence (AMMI) program. I am happy that the knowledge I acquired, and the opportunity I have at InstaDeep, is helping to solve challenges in Africa and beyond using AI”, says Ibrahim Salihu Yusuf, lead author on the paper. 

InstaDeep Co-Founder and CEO Karim Beguir added that “our mission as a company is to accelerate the transition to an AI-First world that benefits everyone. It is clearly the case with this joint research work with Google, focused on a real-life challenge impacting many countries in Africa, and we will continue to innovate in this direction.”

We view this work as only the beginning, an initial research contribution situated inside a much wider context. Dr Arnu Pretorius, InstaDeep’s research team lead in South Africa, noted that “a problem of this magnitude and scale includes many stakeholders and our hope is that in the next phase of this project we can form key partnerships with organisations and institutions involved in alleviating food insecurity across Africa. It is important that our research and the technology we develop are informed by the right people, so that it may be operationalised in the right way and end up providing real value”.


This work was done by a team from InstaDeep: Ibrahim Salihu Yusuf, Kale-ab Tessera, Tom Tumiel and Arnu Pretorius, in close collaboration with Google AI, in particular working with Sella Nevo. We also want to greatly acknowledge input from Krishna Sapkota and Abubakr Babiker from Google AI. Finally, we want to thank the FAO for the incredible work they have been doing over many years on collecting data, working on the ground and advancing the state-of-the-art in early warning systems for locusts.