Early Computational Detection of Potential High Risk SARS-CoV-2 Variants

Karim Beguir | Marcin J Skwark | Yunguan Fu | Thomas Pierrot | Santiago Nicolas Lopez Carranza | Alexandre Laterre | Ibtissem Kadri | Bonny Gaby Lui 1 | Bianca Sanger 1 | Yunpeng Liu 1 | Asaf Poran 1 | Alexander Muik 1 | Ugur Sahin

1 BioNTech



The ongoing COVID-19 pandemic is leading to the discovery of hundreds of novel SARS-CoV-2 variants daily. While most variants do not impact the course of the pandemic, some pose an increased risk when the acquired mutations allow better evasion of antibody neutralisation or confer increased transmissibility. Early detection of such high-risk variants (HRVs) is paramount for the proper management of the pandemic. However, experimental assays to determine the immune evasion and transmissibility characteristics of new variants are resource-intensive and time-consuming, potentially delaying appropriate responses by decision makers. Presented herein is a novel in silico approach combining spike (S) protein structure modelling with large transformer protein language models on S protein sequences to accurately rank SARS-CoV-2 variants for immune escape and fitness potential. Both metrics were experimentally validated using in vitro pseudovirus-based neutralisation tests and binding assays. They were subsequently combined to explore the changing landscape of the pandemic and to create an automated Early Warning System (EWS) capable of evaluating new variants in minutes and risk-monitoring variant lineages in near real-time. The system accurately pinpoints putatively dangerous variants by selecting, on average, fewer than 0.3% of the novel variants each week. The EWS flagged all 16 variants designated by the World Health Organization (WHO) as variants of interest (VOIs), where applicable, or otherwise as variants of concern (VOCs), with an average lead time of more than one and a half months before that designation.