Flexible antibody design with AbBFN2

Disclaimer: All claims made are supported by our research paper, A Flexible Antibody Foundation Model Based on Bayesian Flow Networks, unless explicitly cited otherwise.

Antibodies play an important role in the adaptive immune response. By selectively recognising and binding to specific antigens—such as viruses or bacteria—they neutralise threats and are essential to both acute and long-term immunity.

This ability to target a wide range of molecules has made antibodies indispensable to therapeutic development, fueling a market valued at $252.6 billion in 2024, with projections reaching $500 billion by 2029.

Therapeutic antibodies consistently make up a major share of new clinical trials. Since 2020, at least 12 antibody therapies have entered the US or EU market every year. Yet, despite their success, antibody development remains an intricate and resource-intensive process—one that AI-driven approaches aim to transform.

Why is antibody development so challenging?

Even when considering only germline antibodies, the estimated number of possible sequences is enormous, ranging between 10 billion and 100 billion—a formidable landscape to navigate.

Adding to this complexity is the fact that not all antibodies are created equal. To be effective for therapeutic use, an antibody must bind precisely to its target while avoiding unintended interactions, and be free of liabilities that could hinder its clinical viability, such as aggregation propensity, poor stability, or low expression levels.

Engineering a therapeutic antibody is therefore a multi-objective optimisation process, typically beginning with the identification of an initial binder—a sequence that shows potential in attaching to a specific target. 

However, this sequence is rarely perfect from the start and traditionally undergoes refinement through a case-specific, iterative pipeline [1] of computational and laboratory-based approaches, aiming to improve it through a series of independent steps. For example, in early candidate development a scientist may have an antibody that looks promising as a therapeutic but does not appear to be compatible with the human body. The antibody is then improved through a process called humanisation, which minimises the risk of an unwanted immune response while ensuring its desirable properties remain intact.

Advancements in AI are playing an increasingly significant role in therapeutic antibody optimisation; however, typical models focus on either sequence generation or property prediction—rarely both. Because these methods traditionally operate in isolation and rely on different tools for each step, optimising one property (such as humanisation) often comes at the expense of another (such as stability), resulting in inefficiencies and trade-offs.

We believe that by unifying these tasks into a single generative framework, AbBFN2 has the potential to enhance efficiency for scientists who want to optimise multiple antibody properties simultaneously.

What is AbBFN2?

Built on the Bayesian Flow Network (BFN) paradigm, AbBFN2 extends our ProtBFN model into a multimodal framework, delivering a flexible tool for antibody design and optimisation.

Through extensive training on diverse antibody sequences, AbBFN2 models 45 different modalities, enabling it to address multiple tasks simultaneously while maintaining high accuracy. Unlike conventional approaches that require retraining to accommodate new tasks, AbBFN2 jointly models sequence, genetic, and biophysical attributes within a unified generative framework.

The model can adapt to user-defined tasks by conditionally generating any subset of attributes when given values for the others—enabling a steerable and flexible approach to antibody design. This provides users with a framework for defining and achieving a wide range of optimisation goals. When combined with domain expertise, AbBFN2’s interactive capabilities have the potential to enhance development campaigns and accelerate antibody discovery.
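As a rough illustration of this conditioning pattern, the sketch below shows how such a workflow might look in Python. The AntibodyRecord fields, the conditionally_generate helper, and the model.sample call are hypothetical stand-ins for the behaviour described above, not the released AbBFN2 API.

```python
# A minimal sketch of the conditional-generation workflow described above.
# The AntibodyRecord fields and the model.sample(...) call are hypothetical
# stand-ins, not the released AbBFN2 API.
from dataclasses import dataclass

@dataclass
class AntibodyRecord:
    """Partial antibody record: any attribute left as None is to be generated."""
    heavy_sequence: str | None = None
    light_sequence: str | None = None
    species: str | None = None        # e.g. "human"
    v_gene_heavy: str | None = None   # e.g. "IGHV3-23"
    tap_flags: dict | None = None     # developability flags

def conditionally_generate(model, record: AntibodyRecord, n_samples: int = 8):
    """Generate every attribute still set to None, conditioned on the
    attributes the user has already fixed."""
    conditioned = {k: v for k, v in vars(record).items() if v is not None}
    return model.sample(condition=conditioned, num_samples=n_samples)  # hypothetical call

# Example: fix the species and heavy-chain V gene, and let the model propose
# full sequences plus the remaining metadata.
# samples = conditionally_generate(model, AntibodyRecord(species="human", v_gene_heavy="IGHV3-23"))
```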

Figure 1: Illustration of AbBFN2’s holistic understanding of an antibody. Demonstration of how AbBFN2 learns the genetic and biophysical attributes of an antibody, as well as sequence. Source: Internal

What can it do?

To evaluate AbBFN2’s real-world performance, we tested the model across four key use-cases in therapeutic antibody development—demonstrating its accuracy and accessibility across critical design tasks.

Unconditional generation
To determine whether AbBFN2 has internalised the fundamental rules governing antibody sequence space, we assessed its ability to generate antibodies unconditionally—without task-specific prompting or conditioning.

We sampled sequences from the model, alongside their associated metadata, and compared them to natural antibodies across a range of biologically meaningful metrics. These included properties of the complementarity-determining regions (CDRs)—the key segments of an antibody responsible for antigen binding.
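As a toy illustration of the kind of distributional comparison this involves, the sketch below contrasts CDR-L3 loop lengths between a generated set and a natural set. The loop sequences are placeholders, and the metric shown (an earth-mover's distance over length histograms) is one reasonable choice rather than the exact protocol used in the paper.

```python
# A toy sketch of comparing CDR-L3 length distributions between generated and
# natural antibodies. The sequences below are placeholders; the metric shown
# (earth-mover's distance over length histograms) is illustrative only.
import numpy as np
from scipy.stats import wasserstein_distance

def length_histogram(loops: list[str], max_len: int = 30) -> np.ndarray:
    """Normalised histogram of loop lengths, padded to a fixed support."""
    counts = np.bincount([len(s) for s in loops], minlength=max_len + 1)
    return counts / counts.sum()

generated_cdrl3 = ["QQYNSYPLT", "QQSYSTPRT", "MQALQTPFT"]   # placeholder loops
natural_cdrl3 = ["QQYGSSPWT", "QQANSFPLT", "QQYYSTPYT"]     # placeholder loops

p = length_histogram(generated_cdrl3)
q = length_histogram(natural_cdrl3)

# Small values indicate the generated loops reproduce the natural length profile.
print(wasserstein_distance(np.arange(len(p)), np.arange(len(q)), p, q))
```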

AbBFN2-generated sequences closely matched the natural distribution of CDR loop lengths (Figure 2a), amino acid propensities, and sequence diversity. Notably, when these sequences were folded using ImmuneBuilder, they aligned with structural motifs observed in natural antibodies—even though the model was never explicitly trained on structural information and was evaluated only against held-out antibody sequences (Figure 2b, Figure 2c).

Beyond the CDRs, AbBFN2 also recapitulated the genetic landscape of real antibodies. It faithfully captured conditional distributions of gene pairing propensities within and between the variable heavy (VH) and variable light (VL) chain domains, implying a nuanced understanding of antibody generation. To validate this further, we tested for training-data memorisation and confirmed that the model generates sequences distinct from those seen during training—affirming its ability to reflect the real-world statistical makeup of antibody sequences.
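One simple way such a memorisation check can be framed is to report, for each generated sequence, its highest identity to any training sequence. The sketch below uses difflib's similarity ratio as a stand-in for a proper sequence-identity alignment; the exact method and thresholds used in the paper are not reproduced here, and the sequences are placeholders.

```python
# A minimal sketch of a memorisation check: for each generated sequence,
# report its highest similarity to any training sequence. difflib's ratio is
# a simple stand-in for a proper sequence-identity alignment.
from difflib import SequenceMatcher

def max_identity_to_training(generated: str, training_set: list[str]) -> float:
    """Highest similarity ratio between one generated sequence and the training set."""
    return max(SequenceMatcher(None, generated, t).ratio() for t in training_set)

# Values close to 1.0 would indicate near-copies of training data; values well
# below 1.0 indicate novel sequences.
training_sequences = ["EVQLVESGGGLVQPGGSLRLSCAAS", "QVQLQESGPGLVKPSETLSLTCTVS"]  # placeholders
generated_sequences = ["EVQLLESGGGLVQPGGSLRLSCAVS"]                             # placeholder

for seq in generated_sequences:
    print(seq, round(max_identity_to_training(seq, training_sequences), 3))
```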

Figure 2: Unconditional generation with AbBFN2. (a) Length distributions of CDR-L3 for generated (navy) and natural (green) antibodies. (b) Structural distributions of CDR-L3 for generated and natural antibodies. (c) The samples shown here (purple) correspond to the regions marked by red dots in (b). Source: Internal

Sequence annotation
Next, we considered AbBFN2’s ability to infer key attributes from sequence alone—sequence annotation. In this setting, the model receives an antibody sequence and is asked to predict all associated metadata, including species, gene usage, and developability metrics.

AbBFN2 performs well across all evaluated tasks, consistently matching existing tools. Notably, the model accurately predicted Therapeutic Antibody Profiler (TAP) flags—developability liabilities—directly from sequence, a task that is particularly challenging when only sequence information is available. This ability suggests that AbBFN2 has implicitly learned to reason about structural effects, despite not being explicitly trained on structural data.
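As a rough sketch of how predicted TAP flags can be scored against reference labels, the snippet below treats each flag as a binary classification task and reports a per-flag F1 score. The flag names and toy labels are illustrative placeholders, not the criteria or results from the paper.

```python
# A rough sketch of scoring predicted developability (TAP) flags against
# reference labels, treating each flag as a binary classification task.
# Flag names and the toy labels below are illustrative placeholders.
from sklearn.metrics import f1_score

flag_names = ["total_cdr_length", "surface_hydrophobicity", "charge_asymmetry"]

# One row per antibody; 1 = liability flagged, 0 = no flag.
y_true = [[0, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 1]]
y_pred = [[0, 1, 0], [0, 1, 0], [1, 0, 1], [0, 1, 0]]

for i, name in enumerate(flag_names):
    truth = [row[i] for row in y_true]
    pred = [row[i] for row in y_pred]
    print(f"{name}: F1 = {f1_score(truth, pred):.2f}")
```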

The model also proved successful at predicting genetic attributes that are typically difficult to infer from amino acid sequences alone. This includes accurate identification of heavy- and light-chain V, (D), and J genes, highlighting a strong understanding of antibody recombination patterns and lineage information.

Figure 3 (left): AbBFN2-generated samples match those seen in the OAS. AbBFN2 can generate VH/VL interfaces (blue) that show the same stability as natural antibodies (green). Source: Internal

Figure 4 (right): AbBFN2’s ability to predict immunogenicity. The model’s confidence that a sequence is of human origin correlates with the immunogenicity risk of known clinical-stage therapeutics; more than 99% of antibodies that AbBFN2 confidently classifies as human present low to medium immunogenicity risk. Source: Internal

Figure 5: Illustration of AbBFN2’s multi-objective design capabilities. (Left) Comparison of how frequently the requested features occur in the held-out OAS data and in the conditionally generated AbBFN2 samples. (Right) A set of conditionally generated samples (navy and cyan) folded and superimposed on a known antibody with the requested attributes. The VH is shown in navy and the VL in cyan, with specific residues of interest shown in purple. Source: Internal

Figure 6: Illustration of AbBFN2’s multi-objective design capabilities (cont.). Showcasing how we can use AbBFN2 to tailor highly specific outputs, comprising rare attributes, to generate a desired antibody. Source: Internal

Explore AbBFN2 today – download the paper, try the demo on DeepChain, or dive into the open-source code.
