Research Papers

Metalic: Meta-Learning In-Context with Protein Language Models

Jacob Beck | Shikha Surana | Manus McAuliffe | Oliver Bent | Thomas D. Barrett | Juan Jose Garau Luis | Paul Duckworth

ICLR 2025 Apr 2025
Our method, called Metalic (Meta-Learning In-Context), uses in-context learning and, when data is available, fine-tuning to adapt to new tasks.

Simple Guidance Mechanisms for Discrete Diffusion Models

Hugo Dalla-Torre | Sam Boshar | Bernardo P. de Almeida | Thomas Pierrot | Yair Schiff | Subham Sekhar Sahoo | Hao Phung | Guanghan Wang | Alexander Rush | Volodymyr Kuleshov

ICLR 2025 Apr 2025
Guidance mechanisms for discrete diffusion

De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments

Kevin Eloff | Konstantinos Kalogeropoulos | Oliver Morell | Amandla Mabona | Jakob Berg Jespersen | Wesley Williams | Sam P. B. van Beljouw | Marcin Skwark | Andreas Hougaard Laustsen | Stan J. J. Brouns | Erwin M. Schoof | Jeroen Van Goey | Ulrich auf dem Keller | Karim Beguir | Nicolas Lopez Carranza | Timothy P. Jenkins

Nature Machine Intelligence Mar 2025

Bayesian Optimisation for Protein Sequence Design: Gaussian Processes with Zero-Shot Protein Language Model Prior Mean

Carolin Benjamins | Shikha Surana | Oliver Bent | Marius Lindauer | Paul Duckworth

NeurIPS 2024 workshop Dec 2024
Bayes Opt for Protein Design

BulkRNABert: Cancer prognosis from bulk RNA-seq based language models

Maxence Gélard | Guillaume Richard | Thomas Pierrot | Paul-Henry Cournède

ML4H 2024 Dec 2024
BulkRNABert pipeline. The first phase pre-trains the language model through masked language modeling on binned gene expressions. The second phase fine-tunes a task-specific head, using a cross-entropy loss for the classification task or a Cox-based loss for the survival task. IA3 rescaling is additionally applied for the classification task.
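As a rough illustration of the binning step described above, continuous gene-expression values can be discretised into token ids before masked-language-model pre-training. The function name and bin count below are assumptions for the sketch, not the paper's exact setup.

```python
import numpy as np

def bin_expressions(x: np.ndarray, n_bins: int = 64) -> np.ndarray:
    """Discretise continuous expression values into quantile-based bins."""
    # Internal bin edges from the empirical quantiles of the data.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Map each value to its bin index (0 .. n_bins - 1), usable as a token id.
    return np.digitize(x, edges)

expr = np.random.default_rng(1).lognormal(size=1000)  # synthetic bulk RNA-seq values
tokens = bin_expressions(expr)
print(tokens.min(), tokens.max())  # token ids fall in [0, 63]
```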

BoostMD – Accelerating MD with MLIP

Lars L. Schaaf | Ilyes Batatia | Christoph Brunken | Thomas D. Barrett | Jules Tilly

NeurIPS 2024 workshop Dec 2024
Free energy surface of unseen alanine dipeptide: comparison of the samples obtained by running ground-truth MD and BoostMD. The free energy of the Ramachandran plot is directly related to the marginalized Boltzmann distribution exp[−F(ϕ, ψ)/k_B T]. The reference model is evaluated every 10 steps. Both simulations are run for 5 ns (5 × 10⁶ steps).
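The relation in the caption can be inverted to recover a free-energy surface from samples: F(ϕ, ψ) = −k_B T ln p(ϕ, ψ), with p estimated from a 2-D histogram of the dihedral angles. The sketch below uses synthetic samples and an assumed temperature; it is not BoostMD code.

```python
import numpy as np

kB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)
T = 300.0          # assumed simulation temperature in K

# Synthetic dihedral-angle samples standing in for an MD trajectory (radians).
rng = np.random.default_rng(0)
phi = rng.normal(-1.2, 0.4, size=100_000)
psi = rng.normal(2.4, 0.5, size=100_000)

# Estimate the marginal distribution p(phi, psi) with a 2-D histogram.
p, _, _ = np.histogram2d(phi, psi, bins=60, density=True)

# Invert the Boltzmann relation: F = -kB*T*ln p (infinite where p = 0).
with np.errstate(divide="ignore"):
    F = -kB * T * np.log(p)
F -= np.min(F[np.isfinite(F)])  # shift so the free-energy minimum is zero

print(F.shape)  # 60x60 free-energy grid over the Ramachandran plane
```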

Learning the Language of Protein Structures

Benoit Gaujac | Jérémie Donà | Liviu Copoiu | Timothy Atkinson | Thomas Pierrot | Thomas D. Barrett

NeurIPS 2024 workshop Dec 2024
Schematic overview of our approach. The protein structure is first encoded as a graph, from which features are extracted using a GNN. The resulting embedding is quantized before being fed to the decoder, which estimates the positions of all backbone atoms.
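The quantization step in the caption can be sketched as nearest-neighbour lookup into a codebook: each continuous embedding is snapped to its closest code, yielding a discrete token per residue. Codebook size, embedding dimension, and the random values below are placeholders, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 32))   # 256 learned codes, 32-dim embeddings
embeddings = rng.normal(size=(10, 32))  # e.g. per-residue GNN features

# Squared Euclidean distance from every embedding to every codebook entry.
d = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
codes = d.argmin(axis=1)       # discrete token per residue
quantised = codebook[codes]    # quantized embeddings passed to the decoder

print(codes.shape, quantised.shape)
```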

Bayesian Optimisation for Protein Sequence Design: Back to Basics with Gaussian Process Surrogates

Carolin Benjamins | Shikha Surana | Oliver Bent | Marius Lindauer | Paul Duckworth

NeurIPS 2024 workshop Dec 2024
Multi-round design averaged over eight single-mutant protein landscapes. Left: top-30% recall (mean and 95% CI). Our methods are highlighted with ∗. Right: wall-clock runtime, interpreted across hardware as compute cost. Our GPs with string (SSK) or fingerprint (Forbes) kernels are competitive with PLM baselines whilst requiring only a fraction of the runtime and no pre-training.

Multi-modal Transfer Learning between Biological Foundation Models

Juan Jose Garau-Luis | Patrick Bordes | Liam Gonzalez | Masa Roller | Bernardo P. de Almeida | Lorenz Hexemer | Christopher Blum | Stefan Laurent | Jan Grzegorzewski | Maren Lang | Thomas Pierrot | Guillaume Richard

NeurIPS 2024 Dec 2024
We demonstrate IsoFormer’s capabilities by applying it to the largely unsolved problem of predicting how multiple RNA transcript isoforms originate from the same gene (i.e. same DNA sequence) and map to different transcription expression levels across various human tissues.

Dispelling the Mirage of Progress in Offline MARL

Claude Formanek | Callum Rhys Tilbury | Louise Beyers | Jonathan Shock | Arnu Pretorius

NeurIPS 2024 Dec 2024
We compare our baseline implementations to the reported performance of various algorithms from the literature across a wide range of datasets. We normalise the results from each dataset (i.e. scenario-quality-source combination) by the SOTA performance reported in the literature for that dataset. Standard-deviation bars are shown, and when our baseline is significantly better than or equal to the best method (two-sided t-test), we indicate this with a gold star. We find that on 35 of the 47 datasets tested (almost 75% of cases), we match or surpass the performance of the current SOTA.
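The per-dataset normalisation described above can be sketched as dividing each baseline score by the SOTA score reported for that dataset, so SOTA performance maps to 1.0. Dataset names and scores below are invented for illustration.

```python
def normalise_by_sota(scores: dict, sota: dict) -> dict:
    """Return scores normalised so that reported SOTA performance equals 1.0."""
    return {name: scores[name] / sota[name] for name in scores}

# Hypothetical per-dataset returns (scenario-quality-source combinations).
baseline = {"smac-3m-good": 19.2, "mamujoco-2halfcheetah-medium": 4.1}
sota = {"smac-3m-good": 19.0, "mamujoco-2halfcheetah-medium": 5.0}

normalised = normalise_by_sota(baseline, sota)
print(normalised)  # values above 1.0 mean the baseline beats reported SOTA
```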

SPO: Sequential Policy Optimisation

Matthew V Macfarlane | Edan Toledo | Donal Byrne | Paul Duckworth | Alexandre Laterre

NeurIPS 2024 Dec 2024
A model-based planning algorithm for both continuous and discrete sequential decision-making problems.

Nucleotide Transformer: building and evaluating robust foundation models for human genomics

Hugo Dalla-Torre | Liam Gonzalez | Javier Mendoza-Revilla | Nicolas Lopez Carranza | Adam Henryk Grzywaczewski | Francesco Oteri | Christian Dallago | Evan Trop | Bernardo P. de Almeida | Hassan Sirelkhatim | Guillaume Richard | Marcin Skwark | Karim Beguir | Marie Lopez  | Thomas Pierrot

Nature Methods Nov 2024
Left: graphical representation of the genomic features considered in the downstream tasks used to evaluate NT performance. Right: comparison of NT models to baselines. We report the normalized mean MCC across downstream tasks (grouped by category) for all methods after fine-tuning.