BGPT: Paper Review: ABodyBuilder3: improved and scalable antibody structure predictions


Quick Explanation

ABodyBuilder3 concise verdict

ABodyBuilder3 is an optimized, scalable reimplementation of ABodyBuilder2 that (1) introduces ProtT5 language-model embeddings for residue representation, (2) adds a pLDDT uncertainty head to replace compute-heavy ensembles, and (3) modestly improves CDR loop RMSD (including a marginal gain on CDRH3) while accelerating training and inference. Code and weights are publicly available on GitHub and Zenodo.

Primary source: Kenlay et al. Bioinformatics 2024

Long Explanation

Detailed critical review of ABodyBuilder3

Summary of what the paper does

ABodyBuilder3 is presented as an improved, more scalable antibody variable region structure predictor derived from ABodyBuilder2. The key changes are: a vectorized, OpenFold-style implementation for speed, with bf16/mixed-precision training; optional ProtT5 protein language model residue embeddings (concatenated heavy and light chain embeddings); a per-residue pLDDT uncertainty head that replaces the previous ensemble-derived uncertainty; and careful structure relaxation using OpenMM or YASARA to improve stereochemistry and accuracy. The authors release code and model weights publicly via GitHub and Zenodo (data and weights).

What works well (strengths)

Scalability and engineering: the authors reimplemented ABodyBuilder2 with vectorization and mixed/bf16 precision, yielding a reported >3x speedup and multi-GPU scaling, which matters for screening large candidate sets.
Reproducibility and openness: code and model weights are publicly available on GitHub and Zenodo, which supports reproducibility and reuse.
Practical uncertainty estimate: adding a pLDDT head gives a single-model intrinsic confidence estimate that correlates with RMSD and can substitute for an ensemble, lowering compute cost during inference.
Targeted CDR improvements: modest but measurable RMSD gains for CDR loops (CDRH3 reduced from ~2.54 to ~2.42 Å in their test), with LM embeddings giving a further marginal gain (CDRH3 ~2.40 Å), which is meaningful because CDRH3 is the hardest region to model.

Concerns, limitations and blindspots

Magnitude of accuracy gains: The improvements in RMSD, while consistent, are modest (CDRH3 improvement ~0.12 Å vs ABodyBuilder2; the language-model advantage is marginal and often statistically non-significant according to the authors). That makes the claim of "state-of-the-art" improvement true but incremental and conditional on evaluation choices.
Evaluation dataset choices and potential selection bias: The authors filter SAbDab heavily (remove nanobodies, apply a resolution cutoff, remove CDRH3 loops longer than 30 residues, remove species occurring >15 times) and limit validation/test to human antibodies. These sensible curation choices reduce noise but also limit generalization to non-human repertoires, ultra-long loops (e.g., bovine antibodies) and low-resolution structures. This must be kept in mind when applying ABodyBuilder3 outside the curated regime. The paper's dataset-filtering text reads in part: "… 3.5 standard deviations from the mean for any of the six summary statistics given by ABangle… We also filter out ultra-long CDRH3 loops by removing any sequence with a CDRH3 of over 30 residues." (https://dx.doi.org/10.1093/bioinformatics/btae576)
Possible dataset contamination via LM pretraining: The authors note that antibody-specific LMs can introduce pretraining contamination; ProtT5 is trained broadly, which reduces that risk but does not eliminate it. An explicit leakage analysis (excluding structures/sequences used during LM pretraining) would strengthen the claim that the gains do not come from data leakage.
Metrics are RMSD-centric: RMSD is useful but incomplete. Developability, epitope contact accuracy, paratope geometry, and energy landscape properties are not reported. For therapeutic design, downstream metrics (binding interface accuracy, predicted developability features) are critical and not assessed here.
Uncertainty calibration and use: pLDDT correlates with RMSD, and the authors propose thresholds (pLDDT>85 retains ~32% of models, of which >80% have CDRH3 RMSD <2 Å). However, pLDDT calibration (reliability diagrams, Brier scores) and how to use pLDDT in workflows (the coverage vs accuracy trade-off) require more elaboration for practitioners.
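
The two curation rules quoted in the dataset item above (an outlier cut at 3.5 standard deviations on six ABangle statistics, and removal of CDRH3 loops over 30 residues) can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the record layout (`cdrh3`, `angles`), the function name, the use of the population standard deviation, and the order in which the two filters are applied are all assumptions.

```python
from statistics import mean, pstdev

MAX_CDRH3_LEN = 30  # from the review: CDRH3 loops over 30 residues are removed
Z_CUTOFF = 3.5      # from the quoted excerpt: 3.5 standard deviations
N_STATS = 6         # six ABangle summary statistics


def filter_entries(entries, z_cutoff=Z_CUTOFF, max_len=MAX_CDRH3_LEN):
    """Drop ultra-long CDRH3 loops and orientation outliers.

    `entries` is a hypothetical list of dicts with keys "cdrh3"
    (sequence string) and "angles" (six ABangle-style values).
    Means and deviations are computed over all entries (an assumption).
    """
    columns = [[e["angles"][k] for e in entries] for k in range(N_STATS)]
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]

    kept = []
    for e in entries:
        if len(e["cdrh3"]) > max_len:
            continue  # ultra-long loop
        is_outlier = any(
            s > 0 and abs(v - m) / s > z_cutoff
            for v, m, s in zip(e["angles"], mus, sigmas)
        )
        if not is_outlier:
            kept.append(e)
    return kept
```

Running the same filter on a candidate set before prediction is one quick way to check whether an input falls inside the regime the model was evaluated on.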

Technical appraisal of methods

Architecture: ABodyBuilder3 keeps the ABodyBuilder2/AlphaFold-Multimer-inspired pipeline: per-residue node features, relative positional edge encodings, eight sequential structure modules with invariant point attention and backbone updates, and torsion-angle-based full-atom reconstruction. Training losses: FAPE (with clamping), torsion-angle loss, and structural violation penalties. Optimization: RAdam with cosine annealing restarts and early stopping. These are sensible, modern design choices for antibody-focused structure prediction.
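
For readers unfamiliar with the clamped FAPE term, here is a minimal pure-Python sketch of the general AlphaFold-style idea: express every atom in every residue frame, compare predicted and true local coordinates, and cap each error at a clamp distance. The function names are hypothetical, and the 10 Å clamp and scale defaults are placeholders, not the values used in ABodyBuilder3 (the review does not quote them).

```python
import math


def to_local(rotation, translation, point):
    """Express `point` in the frame (R, t), i.e. compute R^T (x - t)."""
    d = [p - t for p, t in zip(point, translation)]
    # Row k of R^T is column k of R.
    return [sum(rotation[r][k] * d[r] for r in range(3)) for k in range(3)]


def clamped_fape(pred_frames, true_frames, pred_points, true_points,
                 clamp=10.0, scale=10.0):
    """Mean clamped frame-aligned point error, divided by `scale`.

    Frames are (3x3 rotation, length-3 translation) pairs; points are
    length-3 coordinates. Clamp and scale defaults are placeholders.
    """
    total, count = 0.0, 0
    for (r_pred, t_pred), (r_true, t_true) in zip(pred_frames, true_frames):
        for x_pred, x_true in zip(pred_points, true_points):
            local_pred = to_local(r_pred, t_pred, x_pred)
            local_true = to_local(r_true, t_true, x_true)
            total += min(clamp, math.dist(local_pred, local_true))
            count += 1
    return total / (count * scale)
```

Because errors are measured in local frames, applying the same rigid transform to the predicted frames and points leaves the loss unchanged, and the clamp stops a few badly placed atoms from dominating the gradient.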

Performance summary (numerical highlights)

The authors report mean RMSD per region on their test set. Representative numbers (mean RMSD in Angstroms) are given in Table 1 of the paper; key facts:

ABodyBuilder2 CDRH3 RMSD ~2.54 Å; ABodyBuilder3 CDRH3 ~2.42 Å; ABodyBuilder3-LM ~2.40 Å.
Framework and several CDRs show modest improvements; some per-region gains depend on the relaxation method (YASARA vs OpenMM).
pLDDT correlates with RMSD; per-region Pearson correlations improved for ABodyBuilder3-LM relative to the ensemble-derived uncertainty in ABodyBuilder2 (e.g., CDRH3 Pearson up to ~0.73 for ABodyBuilder3-LM).
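
To put the CDRH3 numbers above in perspective, the short calculation below converts them into absolute and relative gains over ABodyBuilder2. It uses only the approximate means quoted in this review; the dictionary and function names are illustrative.

```python
# Approximate mean CDRH3 RMSD values (in Angstroms) as quoted in this review.
CDRH3_RMSD = {
    "ABodyBuilder2": 2.54,
    "ABodyBuilder3": 2.42,
    "ABodyBuilder3-LM": 2.40,
}


def gain_vs_baseline(rmsd, baseline="ABodyBuilder2"):
    """Return {model: (absolute gain in Angstroms, relative gain in %)}."""
    base = rmsd[baseline]
    gains = {}
    for name, value in rmsd.items():
        if name == baseline:
            continue
        absolute = base - value
        gains[name] = (round(absolute, 2), round(100 * absolute / base, 1))
    return gains


print(gain_vs_baseline(CDRH3_RMSD))
# {'ABodyBuilder3': (0.12, 4.7), 'ABodyBuilder3-LM': (0.14, 5.5)}
```

A relative gain of roughly 5% is consistent with describing the advance as incremental.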

Reproducibility and resource availability

Code and weights are public (GitHub, Zenodo); training and evaluation details (batch size, optimizer, training schedule, clamp values, validation/test sizes) are reported; and common toolchains are used (ANARCI for IMGT numbering, the SAbDab dataset). Together these should let other groups reproduce the results given adequate compute. The paper includes explicit dataset curation steps and training hyperparameters, supporting reproducibility.

How practitioners should use ABodyBuilder3

Use for high-throughput screening of human-variable-region antibody candidates where moderate CDR accuracy and per-residue confidence estimates suffice.
Apply pLDDT thresholding to select a subset of high-confidence models for downstream intensive computations (docking, MD, developability predictions), as per the authors' guidance (pLDDT>85 retains ~32% of models with high accuracy on CDRH3).
Do not expect large improvements on edge-case antibodies (non-human, ultra-long CDRH3, heavily glycosylated, or subject to antigen-binding-induced conformational change) without further validation.
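
The thresholding step in the list above comes down to a coverage versus accuracy trade-off, which a practitioner can measure on any benchmark with known structures. The sketch below assumes each model is summarized by one pLDDT score and one CDRH3 RMSD. The input layout, the function name, and the toy numbers are assumptions; only the pLDDT>85 and RMSD <2 Å cutoffs come from the review.

```python
def coverage_accuracy(models, plddt_cutoff, rmsd_cutoff=2.0):
    """Return (fraction of models retained, fraction of retained models accurate).

    `models` is a hypothetical list of (plddt, cdrh3_rmsd) pairs.
    Accuracy is None when no model passes the pLDDT cutoff.
    """
    selected = [rmsd for plddt, rmsd in models if plddt > plddt_cutoff]
    coverage = len(selected) / len(models) if models else 0.0
    if not selected:
        return coverage, None
    accuracy = sum(rmsd < rmsd_cutoff for rmsd in selected) / len(selected)
    return coverage, accuracy


# Toy data for illustration only; these are not results from the paper.
toy_models = [(90, 1.0), (88, 1.5), (86, 2.5), (80, 1.8), (70, 3.0), (60, 4.0)]
for cutoff in (60, 75, 85):
    print(cutoff, coverage_accuracy(toy_models, cutoff))
```

Sweeping the cutoff this way on an in-house validation set shows how much coverage must be given up for a target accuracy, rather than relying on a single published threshold.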

What would falsify the main claims

Concrete disproof scenarios the authors themselves acknowledge: (1) show no RMSD improvement over ABodyBuilder2 on independent, withheld datasets; (2) show pLDDT correlates poorly with RMSD when evaluated on broader antibody repertoires; (3) demonstrate that ProtT5 gains come from LM pretraining contamination rather than improved representations. The paper lists similar possible falsifications and suggests self-distillation as a next step.

Suggested improvements and follow-up experiments (practical)

Evaluate on external non-human antibody sets and ultra-long CDRH3 cases (bovine) to quantify generalization limits.
Provide calibration curves and reliability diagrams for pLDDT and compare to ensemble-derived uncertainties (Brier score, ECE) to quantify calibration.
Report interface/contact metrics for antibody-antigen complexes (even if only predicted antigens) to assess paratope accuracy for therapeutic applications.
Run ablation studies isolating ProtT5 effect vs one-hot across diverse lengths and sequence identities, and include leakage analysis to rule out pretraining contamination.
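
The calibration check suggested above could be set up as follows. Treating pLDDT/100 as the probability that CDRH3 RMSD falls below 2 Å is an assumption made for illustration, not something the paper claims, and the function names and the ten-bin choice are likewise illustrative.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean |observed accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(accuracy - confidence)
    return ece


def to_calibration_inputs(plddts, rmsds, rmsd_cutoff=2.0):
    """Map pLDDT to a pseudo-probability and RMSD to a 0/1 outcome (assumption)."""
    probs = [p / 100.0 for p in plddts]
    outcomes = [1 if r < rmsd_cutoff else 0 for r in rmsds]
    return probs, outcomes
```

Computing both numbers for the pLDDT head and for the ensemble-derived uncertainty on the same test set would make the comparison between the two quantitative.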

Overall assessment

ABodyBuilder3 is a solid, well-engineered incremental advance in antibody structure prediction with good reproducibility and useful practical features (speed, uncertainty head, LM embeddings). It improves CDR modelling modestly and provides tools and data for adoption. The major limitations are the modest size of the RMSD gains, restricted evaluation regime, and the need for better calibration/validation of the uncertainty head across diverse repertoires.

Updated: October 05, 2025