Research Papers

Leveraging State Space Models in Long Range Genomics

Matvei Popov | Aymen Kallala | Anirudha Ramesh | Narimane Hennouni | Shivesh Khaitan | Rick Gentry | Alain-Sam Cohen

ICLR LMRL (2025) May 2025
Comparison of the extrapolation methods of state-space models and attention-based models on VEP eQTLs (AUROC). For NTv2, we also report an inference-time extrapolation method, position interpolation. A dotted vertical line marks the fine-tuning sequence length (12 kbp) shared by all models. Attention-based models collapse on sequences longer than those seen during training, whereas state-space models generalize to sequences up to 10x longer. Dotted line segments mark values that could not be computed because of computational cost; these values are assumed from the observed trends.
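Position interpolation, the inference-time method reported for NTv2, avoids showing the model position indices it never saw during training. Instead, it linearly rescales the positions of a longer sequence back into the training range. The sketch below assumes rotary position embeddings (RoPE) with the common frequency schedule (`base=10000`). The function names and the 2,048/8,192 token lengths are hypothetical, and this is not NTv2's actual implementation.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary angles theta[p, i] = p / base**(2i/dim) (assumed standard RoPE form)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape (len(positions), dim // 2)

def interpolated_positions(seq_len, train_len):
    """Position interpolation: squeeze positions 0..seq_len-1 into [0, train_len)."""
    positions = np.arange(seq_len, dtype=np.float64)
    if seq_len <= train_len:
        return positions  # inside the training range, positions are left unchanged
    return positions * (train_len / seq_len)

def apply_rope(x, positions, base=10000.0):
    """Rotate consecutive feature pairs of x (shape (seq_len, dim), dim even)."""
    _, dim = x.shape
    angles = rope_angles(positions, dim, base)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Hypothetical example: a model trained on 2,048 tokens evaluated on 8,192 tokens.
pos = interpolated_positions(8192, 2048)
print(pos.max())  # -> 2047.75, still inside the training range
```

Interpolation trades positional resolution for range: neighbouring tokens end up a quarter of a position apart in this example, which the model never saw in training but which lies between positions it did see.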

Open-Source and FAIR Research Software for Proteomics

Lukas Käll | Yasset Perez-Riverol | Wout Bittremieux | William S. Noble | Lennart Martens | Aivett Bilbao | Michael R. Lazear | Bjorn Grüning | Daniel S. Katz | Michael J. MacCoss | Chengxin Dai | Jimmy K. Eng | Robbin Bouwmeester | Michael R. Shortreed | Enrique Audain | Timo Sachsenberg | Jeroen Van Goey | Georg Wallmann | Bo Wen | William E. Fondrie

May 2025
Open-source software (OSS), aligned with the FAIR Principles (Findable, Accessible, Interoperable, Reusable), offers a solution by promoting transparency, reproducibility, and community-driven development, which fosters collaboration and continuous improvement. In this manuscript, we explore the role of OSS in computational proteomics, its alignment with FAIR principles, and its potential to address challenges related to licensing, distribution, and standardization.

AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks

Bora Guloglu | Miguel Bragança | Alex Graves | Scott Cameron | Timothy Atkinson | Liviu Copoiu | Alexandre Laterre | Thomas D. Barrett

May 2025

Metalic: Meta-Learning In-Context with Protein Language Models

Jacob Beck | Shikha Surana | Manus McAuliffe | Oliver Bent | Thomas D. Barrett | Juan Jose Garau Luis | Paul Duckworth

ICLR 2025 Apr 2025
Our method, called Metalic (Meta-Learning In-Context), uses in-context learning and fine-tuning, when data is available, to adapt to new tasks.

Simple Guidance Mechanisms for Discrete Diffusion Models

Hugo Dalla-Torre | Sam Boshar | Bernardo P. de Almeida | Thomas Pierrot | Yair Schiff | Subham Sekhar Sahoo | Hao Phung | Guanghan Wang | Alexander Rush | Volodymyr Kuleshov

ICLR 2025 Apr 2025
Guidance mechanisms for discrete diffusion

De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments

Kevin Eloff | Konstantinos Kalogeropoulos | Oliver Morell | Amandla Mabona | Jakob Berg Jespersen | Wesley Williams | Sam P. B. van Beljouw | Marcin Skwark | Andreas Hougaard Laustsen | Stan J. J. Brouns | Erwin M. Schoof | Jeroen Van Goey | Ulrich auf dem Keller | Karim Beguir | Nicolas Lopez Carranza | Timothy P. Jenkins

Nature Machine Intelligence Mar 2025

Bayesian Optimisation for Protein Sequence Design: Gaussian Processes with Zero-Shot Protein Language Model Prior Mean

Carolin Benjamins | Shikha Surana | Oliver Bent | Marius Lindauer | Paul Duckworth

NeurIPS 2024 workshop Dec 2024
Bayes Opt for Protein Design

BulkRNABert: Cancer prognosis from bulk RNA-seq based language models

Maxence Gélard | Guillaume Richard | Thomas Pierrot | Paul-Henry Cournède

ML4H 2024 Dec 2024
BulkRNABert pipeline. The first phase pre-trains the language model with masked language modeling on binned gene expressions. The second phase fine-tunes a task-specific head, using either a cross-entropy loss for the classification task or a Cox-based loss for the survival task. IA3 rescaling is additionally applied for the classification task.
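The caption mentions a Cox-based loss for the survival head. Below is a minimal sketch of the standard negative Cox partial log-likelihood, assuming no tied event times. The function name and the averaging over observed events are our assumptions, and the exact variant used in BulkRNABert may differ.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-hazard scores, shape (n,)
    time  : survival or censoring times, shape (n,)
    event : 1 if the event was observed, 0 if censored, shape (n,)
    Tied times are not handled specially (simplifying assumption).
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Sort by descending time so the risk set of sample i is samples 0..i.
    order = np.argsort(-time, kind="stable")
    risk, event = risk[order], event[order]
    # Running log(sum_j exp(risk_j)) over each risk set, computed stably.
    log_risk_set = np.logaddexp.accumulate(risk)
    # Only uncensored samples contribute terms to the partial likelihood.
    log_lik = (risk - log_risk_set) * event
    n_events = max(event.sum(), 1.0)
    return -log_lik.sum() / n_events

# Hypothetical toy batch: two patients, both events observed, equal predicted risk.
print(cox_neg_log_partial_likelihood([0.0, 0.0], [1.0, 2.0], [1, 1]))  # -> log(2)/2
```

Using `np.logaddexp.accumulate` keeps the risk-set normalizer numerically stable for large scores, and censored samples still enter the loss through the risk sets of earlier events.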

BoostMD – Accelerating MD with MLIP

Lars L. Schaaf | Ilyes Batatia | Christoph Brunken | Thomas D. Barrett | Jules Tilly

NeurIPS 2024 workshop Dec 2024
Free energy surface of unseen alanine dipeptide. Comparison of the samples obtained by running ground-truth MD and BoostMD. The free energy shown in the Ramachandran plot is directly related to the marginalized Boltzmann distribution exp[−F(ϕ, ψ)/(k_B T)]. The reference model is evaluated every 10 steps. Both simulations are run for 5 ns (5 × 10⁶ steps).
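Inverting the Boltzmann relation in the caption gives F(ϕ, ψ) = −k_B T ln p(ϕ, ψ) up to an additive constant, so a free energy surface can be estimated from a histogram of sampled dihedral angles. The sketch assumes angles in radians, a temperature of 300 K, energies in kJ/mol, and 60 bins per axis. The function name is hypothetical, and this is not BoostMD's analysis code.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant per mole (gas constant R), kJ/(mol*K)

def free_energy_surface(phi, psi, temperature=300.0, bins=60):
    """Estimate F(phi, psi) = -k_B T ln p(phi, psi) from dihedral samples in radians.

    Returns (F, phi_edges, psi_edges). F is shifted so its minimum is 0;
    bins with no samples get F = +inf.
    """
    hist, phi_edges, psi_edges = np.histogram2d(
        phi, psi, bins=bins, range=[[-np.pi, np.pi], [-np.pi, np.pi]], density=True
    )
    with np.errstate(divide="ignore"):
        F = -K_B * temperature * np.log(hist)  # log(0) -> -inf, so empty bins -> +inf
    F -= F[np.isfinite(F)].min()  # fix the arbitrary additive constant
    return F, phi_edges, psi_edges
```

Comparing two such surfaces (for example from reference MD and from an accelerated sampler) on the same bin edges is one way to quantify how closely the sampled distributions agree.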

Learning the Language of Protein Structures

Benoit Gaujac | Jérémie Donà | Liviu Copoiu | Timothy Atkinson | Thomas Pierrot | Thomas D. Barrett

NeurIPS 2024 workshop Dec 2024
Schematic overview of our approach. The protein structure is first encoded as a graph, from which a GNN extracts features. The resulting embedding is then quantized before being passed to the decoder, which estimates the positions of all backbone atoms.
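The caption does not say which quantizer sits between the GNN encoder and the decoder. As an illustration, here is one common choice, nearest-neighbour vector quantization against a learned codebook (as in VQ-VAE). The scheme used in the paper may differ, and the function name and shapes are hypothetical.

```python
import numpy as np

def quantize(embeddings, codebook):
    """Map each embedding to its nearest codebook vector (squared Euclidean distance).

    embeddings : (n, d) continuous per-residue features, e.g. from a GNN encoder
    codebook   : (k, d) learned code vectors
    Returns (codes, quantized): integer token ids (n,) and their vectors (n, d).
    """
    # ||e - c||^2 = ||e||^2 - 2 e.c + ||c||^2, evaluated for all pairs at once.
    d2 = (
        np.sum(embeddings**2, axis=1, keepdims=True)
        - 2.0 * embeddings @ codebook.T
        + np.sum(codebook**2, axis=1)
    )
    codes = np.argmin(d2, axis=1)
    return codes, codebook[codes]
```

The integer `codes` are what turn a structure into a sequence of discrete tokens, which is what makes a "language" of protein structures possible. During training, a straight-through gradient estimator is typically needed because `argmin` is not differentiable.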

Bayesian Optimisation for Protein Sequence Design: Back to Basics with Gaussian Process Surrogates

Carolin Benjamins | Shikha Surana | Oliver Bent | Marius Lindauer | Paul Duckworth

NeurIPS 2024 workshop Dec 2024
Multi-round design averaged over eight single-mutant protein landscapes. Left: top-30% recall (mean and 95% CI); our methods are marked with ∗. Right: wall-clock runtime, interpreted as compute cost across hardware. Our GPs with string (SSK) or fingerprint (Forbes) kernels are competitive with PLM baselines whilst requiring only a fraction of the runtime and no pre-training.

Multi-modal Transfer Learning between Biological Foundation Models

Juan Jose Garau-Luis | Patrick Bordes | Liam Gonzalez | Masa Roller | Bernardo P. de Almeida | Lorenz Hexemer | Christopher Blum | Stefan Laurent | Jan Grzegorzewski | Maren Lang | Thomas Pierrot | Guillaume Richard

NeurIPS 2024 Dec 2024
We demonstrate IsoFormer’s capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same gene (i.e. same DNA sequence) and map to different transcription expression levels across various human tissues.