Preferential Bayesian Optimisation for Protein Design with Fine-Tuned Protein Language Model Ensembles

Alex Hawkins-Hooker | Paul Duckworth | Oliver Bent


ABSTRACT

It has recently been observed that ranking-based loss functions improve the quality of fitness landscape predictions for both standard supervised deep learning models and fine-tuned protein language models. We consider the implications of this finding for protein design with Bayesian optimisation. We investigate a range of uncertainty quantification techniques applicable to protein language models fine-tuned with ranking losses, showing that they achieve calibration competitive with CNN ensembles while offering superior predictive performance. Finally, we demonstrate how uncertainty-aware ranking-based models can be exploited for protein design within the framework of preferential Bayesian optimisation.