BGPT: Paper Review: Delineating inter- and intra-antibody repertoire evolution with AntibodyForests


Quick Explanation

Quick take: AntibodyForests is a well-engineered, reproducible R toolbox that integrates lineage reconstruction, repertoire-level topology metrics, protein language model (PLM) pseudolikelihoods, and structure-based measures to enable inter- and intra-repertoire evolutionary analyses. Its key strengths are multimodal integration and CRAN/GitHub distribution; its main limitations are sensitivity to internal-node handling, PLM training biases, and structural-model uncertainty (see citations below).


Long Explanation

Visual paper analysis — AntibodyForests (van Ginneken et al., 2025)

Highlights come first, followed by key claims, a concise critique, and next steps. Claims are tied to the source papers.

Highlights (visual):

Multimodal integration: lineage trees, PLM pseudolikelihoods, and AlphaFold3-predicted structures, combined through AntibodyForests functions [citation below]
Repertoire-level topology metrics (Sackin index, Laplacian spectral properties) for clustering and selection inference
Flexible tree construction and internal-node handling: important, but a source of topology sensitivity
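
To make the topology metrics concrete, here is a minimal Python sketch of the Sackin index, taken here as the sum of leaf depths (edges between the root and each leaf). The dictionary input format and the node labels are assumptions made for illustration; the review does not describe how AntibodyForests stores trees or whether it normalizes the index, so this is not the package's implementation.

```python
from collections import deque


def sackin_index(children, root):
    """Sum of leaf depths in a rooted tree.

    children: dict mapping a node to its list of child nodes
              (hypothetical input format for this sketch).
    root:     label of the root node, e.g. the germline sequence.
    """
    total = 0
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        kids = children.get(node, [])
        if not kids:
            # A node without children is a leaf; add its depth.
            total += depth
        for kid in kids:
            queue.append((kid, depth + 1))
    return total


# Two toy lineages with four leaves each (labels are illustrative only).
balanced = {"germline": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
ladder = {"germline": ["a", "l1"], "a": ["b", "l2"], "b": ["l3", "l4"]}

print(sackin_index(balanced, "germline"))  # 8
print(sackin_index(ladder, "germline"))    # 9
```

With the same number of leaves, the ladder-like tree scores higher than the balanced one, which is why the index is read as a measure of tree imbalance. Because the value changes whenever nodes are added or removed, it also illustrates why internal-node handling matters for downstream comparisons.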

Key claims and evidence

AntibodyForests reconstructs clonal lineage networks and supports both distance-based and phylogenetic algorithms (germline-rooted minimum spanning tree (MST), neighbor-joining, maximum parsimony (MP) and maximum likelihood (ML)). It is also IgPhyML-compatible, so users can choose models appropriate to antibody evolution.
Repertoire-wide topology metrics (Sackin index, Laplacian spectral density) are implemented to compare repertoires and cluster topology space, which is useful for detecting selection signatures.
Integration with PLM-derived pseudolikelihoods and per-residue likelihoods allows users to correlate model-implied evolutionary likelihood with observed somatic hypermutation (SHM) along trees. This builds on literature showing that PLM pseudolikelihoods capture features of in vivo selection; AntibodyForests exposes functions to compute these scores and relate them to the lineage trees.
Structural evolution: AntibodyForests can integrate AlphaFold3-predicted structures (including antibody–antigen complexes when available) and quantify changes in RMSD, pLDDT, and biophysical properties as a function of mutation distance. The authors correctly caution that CDR-loop predictions are uncertain.
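
As a rough illustration of the pseudolikelihood idea, the sketch below averages the log-probabilities that a masked language model assigns to each observed residue, then compares a parent sequence with a mutated child. The probabilities are made up, and the length normalization is an assumption of this sketch; the review does not state how AntibodyForests or any particular PLM scales these scores.

```python
import math


def pseudo_log_likelihood(per_residue_probs):
    """Mean log-probability of the observed residues.

    per_residue_probs: one value per position, each being
    P(observed residue at i | rest of the sequence) as a masked
    language model would report it (hypothetical input).
    """
    if not per_residue_probs:
        raise ValueError("need at least one position")
    if any(p <= 0.0 or p > 1.0 for p in per_residue_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return sum(math.log(p) for p in per_residue_probs) / len(per_residue_probs)


# Made-up per-residue probabilities for a parent node and a child
# that differs at the third position.
parent = [0.9, 0.8, 0.7, 0.9]
child = [0.9, 0.8, 0.2, 0.9]

# Negative delta: the model finds the mutated child less plausible.
delta = pseudo_log_likelihood(child) - pseudo_log_likelihood(parent)
print(round(delta, 3))
```

Computing such a delta along each parent-to-child edge is one way to relate model-implied likelihood to observed mutations on a tree.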

Critical appraisal — strengths and limitations

Strengths

Multimodal: sequences, metadata, PLMs, and structures are combined into a single R object and workflow.
Flexible tree options allow method comparisons and robustness assessment (GBLD metric included) rather than forcing a single pipeline decision.
Designed for single-cell paired heavy–light chain data but compatible with bulk data, which broadens applicability.

Limitations & blindspots

Internal-node handling choices strongly affect topology and downstream metrics. The package exposes options but does not provide a formal statistical framework for choosing among them, so users must test sensitivity themselves.
The interpretation of PLM-derived pseudolikelihoods depends on PLM type and input framing (full VDJ vs CDR3). Prior work shows strong dependence on model and input region; AntibodyForests accepts PLM inputs but cannot remove training-data biases or guarantee causal inference about selection.
Structural integration depends on predicted models (AlphaFold3, ABlooper); CDR-loop accuracy and single-mutation effects remain uncertain, a limitation the authors acknowledge.
Exact accession numbers and some example analyses are available only in the supplementary data, which may slow reproduction until the supplement is retrieved.
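
The structural caveat can be sketched in code. The Python example below computes RMSD between two coordinate sets and offers a variant that ignores positions where either model has low pLDDT. It assumes the structures are already superposed and that atoms are paired by index; the function names and the cutoff of 70 are illustrative choices for this sketch, not values or functions taken from AntibodyForests.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that
    position i in coords_a pairs with position i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def confident_rmsd(coords_a, coords_b, plddt_a, plddt_b, min_plddt=70.0):
    """RMSD restricted to positions where both models meet the pLDDT cutoff."""
    n = len(coords_a)
    if not (len(coords_b) == len(plddt_a) == len(plddt_b) == n):
        raise ValueError("all inputs must have the same length")
    keep = [
        i for i in range(n)
        if plddt_a[i] >= min_plddt and plddt_b[i] >= min_plddt
    ]
    if not keep:
        raise ValueError("no positions pass the pLDDT cutoff")
    return rmsd([coords_a[i] for i in keep], [coords_b[i] for i in keep])
```

Comparing the two values for a pair of lineage members shows how much of an apparent structural shift comes from low-confidence regions such as CDR loops.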

Where this package fits into the field

AntibodyForests builds on methods for antibody phylogenetics (IgTree, SONAR, Change-O, IgPhyML) and the emerging literature using PLMs to annotate evolutionary propensity; it is positioned as a repertoire-level integrative toolbox rather than a single-lineage specialist.

Practical recommendations for users

Always run sensitivity analyses across tree-construction and internal-node removal options; quantify topology shifts (GBLD) before interpreting selection signals.
If using PLM likelihoods, compare multiple PLMs (general vs antibody-specific) and input contexts (full VDJ vs CDR3), following best practices from the PLM literature, because context matters.
When interpreting structural RMSD/pLDDT changes, treat single-mutation structural inferences cautiously and prefer experimental binding/functional follow-up for candidate antibodies highlighted by AntibodyForests.
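
One simple way to act on the PLM comparison advice is to check whether two models rank the same sequences similarly. The sketch below computes a Spearman rank correlation in plain Python, using average ranks for ties. The score lists are hypothetical placeholders for pseudolikelihoods from two PLMs on the same sequences; nothing here comes from the AntibodyForests API.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for the same five sequences from two PLMs.
general_plm = [-1.2, -0.8, -2.5, -0.3, -1.9]
antibody_plm = [-1.0, -0.9, -2.0, -0.2, -2.4]
print(round(spearman(general_plm, antibody_plm), 2))
```

A low correlation between models, or between full-VDJ and CDR3 inputs, is a signal to treat any single model's selection inference with caution.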

What would falsify the main claims?

If different internal-node removal strategies produce inconsistent repertoire-level conclusions (e.g., selection vs neutrality) across multiple, independent datasets such that no robust biological signal remains, this would challenge claims about reliable repertoire-level inference.
If PLM pseudolikelihoods systematically fail to correlate with observed selection/maturation patterns across independent datasets (contradicting PLM literature), then PLM-based repertoire analyses would be undermined.

Next actionable steps / experiments

Benchmark AntibodyForests analyses across at least three diverse public repertoires with known functional readouts (e.g., datasets used in PLM work and vaccine studies) to confirm topology-to-function associations and PLM correlations.
Perform experimental follow-up on candidates prioritized by combined PLM + topology + structural divergence (e.g., expression + binding + neutralization) to validate the predictive pipeline.

Reproducibility links (from paper):

AntibodyForests code is available on GitHub; the vignette is available on CRAN.



Updated: March 11, 2026