BGPT: Paper Review: Efficient protein structure generation with sparse denoising models

Quick Explanation

Concise critical summary

Claim: salad is a family of sparse all-atom denoising models that generates designable and diverse protein backbones of up to 1,000 residues with much lower runtime than prior diffusion models, and that can be adapted via structure editing to tasks unseen during training.

Long Explanation

Detailed paper analysis and critique

1 Background and central contributions

The manuscript presents salad, a family of sparse all-atom denoising diffusion models for protein backbone generation. Its key claims are: (a) substantially lower runtime than prior diffusion models; (b) comparable or higher designability and diversity across protein lengths from 50 to 1,000 residues, evaluated with a standard in silico design pipeline (ProteinMPNN sequence design followed by ESMfold/AF2 refolding); and (c) extensibility to out-of-distribution tasks via structure editing (motif scaffolding, shape conditioning, repeat and multi-state design).
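Once refolding is done, the designability metric behind claim (b) reduces to a simple aggregation. The sketch below assumes the common convention that a backbone counts as designable when the best of its ProteinMPNN sequences refolds to within 2 Å Cα RMSD (scRMSD) of the generated backbone. The threshold, the best-of-N rule and the function name are assumptions and are not taken from the paper. The scRMSD values in the example are made up.

```python
import numpy as np

# Assumed convention, not taken from the paper: a backbone is "designable" if the
# best of its designed sequences refolds to < 2.0 Å C-alpha RMSD (scRMSD).
SCRMSD_THRESHOLD = 2.0  # Å (assumed)


def designability(scrmsd: np.ndarray, threshold: float = SCRMSD_THRESHOLD) -> float:
    """Fraction of designable backbones.

    scrmsd: array of shape (n_backbones, n_sequences) holding one refolding
    scRMSD (Å) per ProteinMPNN sequence per generated backbone.
    """
    best_per_backbone = scrmsd.min(axis=1)  # keep the best sequence for each backbone
    return float((best_per_backbone < threshold).mean())


# Illustrative values only: 4 backbones x 2 sequences each.
example = np.array([[3.1, 1.4], [2.5, 2.2], [0.9, 4.0], [5.0, 6.0]])
print(designability(example))  # 0.5
```

The result depends on both the threshold and the number of sequences designed per backbone, so any designability percentage should be reported together with these two settings.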

2 Strengths and positive novelties

Computational efficiency: the sparse neighbour-based attention reduces per-structure runtime and enables generation of long proteins (up to 1,000 residues) at practical throughput on a single RTX 3090; authors provide runtime benchmarks and a public implementation
Scaling to large proteins: the VE and VP-scaled variants show improved scRMSD and designability at sizes where prior VP-only methods deteriorate, and the authors use domain-shaped noise initialization to further improve large-protein outcomes
Structure-editing at sampling time: demonstrates flexible editing strategies (output editing, symmetry editing, multi-state coupling via aligning/averaging fixed residues across trajectories) enabling motif scaffolding, repeat/symmetric designs, and multi-state design without retraining
Open code and parameters: code repository and parameter release improve reproducibility and adoption (GitHub repo maintained by authors)
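The runtime advantage comes from each residue attending only to a fixed-size set of spatial neighbours instead of all N residues. The attention step therefore costs O(N·k) rather than O(N²). Below is a minimal NumPy sketch with a single attention head and plain k-nearest-neighbour selection on Cα coordinates, both of which are assumptions. salad's actual neighbour selection, features and architecture may differ, and the function names are hypothetical.

```python
import numpy as np


def knn_neighbours(ca: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest residues (self included) for each residue.

    ca: (N, 3) C-alpha coordinates. The dense distance matrix keeps the sketch
    short; a real implementation would use a spatial index or blocked search.
    """
    d2 = ((ca[:, None, :] - ca[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    return np.argsort(d2, axis=1)[:, :k]  # (N, k)


def sparse_attention(x, ca, wq, wk, wv, k=16):
    """Single-head attention in which residue i only attends to its k neighbours.

    x: (N, d_in) residue features; wq, wk, wv: (d_in, d) projection matrices.
    """
    nbr = knn_neighbours(ca, k)                 # (N, k)
    q = x @ wq                                  # (N, d)
    keys = (x @ wk)[nbr]                        # (N, k, d) gathered neighbour keys
    vals = (x @ wv)[nbr]                        # (N, k, d) gathered neighbour values
    logits = np.einsum("nd,nkd->nk", q, keys) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkd->nd", weights, vals)  # (N, d)
```

Setting k equal to N turns the sketch into ordinary dense attention, which makes a convenient sanity check. With a fixed k, memory and compute for the attention step grow linearly with protein length.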

3 Methodological caveats and limitations (critical)

Evaluation relies solely on computational refolding metrics. The main performance metrics (designability, scRMSD, pLDDT) are computed from ESMfold and AlphaFold predictions of ProteinMPNN-designed sequences. The authors acknowledge the absence of wet-lab validation and argue that parity with experimentally validated baselines (RFdiffusion, ProteinGenerator) reduces this concern. Parity is still an incomplete substitute for experimental folding, stability and function assays, because structure predictors can be biased and generated designs may exploit their weaknesses adversarially
Training dataset bias. Models were trained on a PDB snapshot (Oct 2023) with non-protein components removed. This reduces coverage of membrane proteins and of complexes with ligands or nucleic acids, and it lacks the diversity present in AlphaFold DB. The authors note that Genie2 was trained on AlphaFold DB and attribute Genie2's greater diversity partly to that dataset advantage
Designability metric sensitivity and potential circularity. Designing sequences with ProteinMPNN and then refolding them with ESMfold/AF2 inherits the assumptions of each tool: the sequence designer and the structure predictors may favor certain backbone geometries, which creates pipeline biases. The practice is standard, but it can favor methods that generate backbones for which ProteinMPNN readily designs sequences and ESMfold readily refolds them, and such backbones are not necessarily biologically realizable structures
Limited reporting of statistical uncertainty. Many plotted comparisons show medians and ranges, but the paper would benefit from explicit confidence intervals and statistical tests for pairwise method differences (especially when claims assert outperforming prior methods)
Generality of editing heuristics. The editing strategies (averaging aligned coordinates, replacing outputs) are pragmatic but may produce clashes or unrealistic local geometry; success appears task-dependent and occasionally requires additional runtime optimization (e.g. clash_lr) described in scripts. The limits of editing for complex enzyme active sites or ligand-binding pockets are not demonstrated
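Multi-state coupling, which aligns fixed residues across trajectories and averages them, can be illustrated with a Kabsch superposition. The sketch below handles two states and assumes one coordinate per residue and an unweighted average applied once per denoising step. salad's actual editing operates on its own all-atom outputs and may weight, iterate or schedule the averaging differently. The function names are hypothetical. The sketch performs no clash check, which is the caveat raised above.

```python
import numpy as np


def kabsch(p: np.ndarray, q: np.ndarray):
    """Rotation r and translation t minimising ||p @ r.T + t - q|| (rows are points)."""
    p0, q0 = p.mean(axis=0), q.mean(axis=0)
    h = (p - p0).T @ (q - q0)                  # 3x3 covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = q0 - p0 @ r.T
    return r, t


def couple_states(xa, xb, ia, ib):
    """Replace tied residues of two states by their superposed average.

    xa: (Na, 3) and xb: (Nb, 3) are denoiser outputs for two states; ia and ib are
    equal-length index arrays pairing the residues that must stay identical.
    Returns edited copies; untied residues are left unchanged.
    """
    r, t = kabsch(xb[ib], xa[ia])              # map state B's tied residues onto A
    mean = 0.5 * (xa[ia] + xb[ib] @ r.T + t)   # consensus in A's frame
    xa_new, xb_new = xa.copy(), xb.copy()
    xa_new[ia] = mean
    xb_new[ib] = (mean - t) @ r                # inverse transform back to B's frame
    return xa_new, xb_new
```

After the edit, the tied residues of both states are exact rigid copies of one consensus. Every other residue keeps whatever the denoiser produced, which is why clashes or distorted geometry at the boundary between tied and free regions remain possible.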

4 Reproducibility and transparency

Positive: code, Colab notebooks and parameter files are published on GitHub (repository described in the README), and the authors document hyperparameters, noise schedules, sampling steps and evaluation pipelines (ProteinMPNN settings, ESMfold/AF2). This materially helps reproducibility. Remaining gaps: (a) full training logs and compute budgets are not exhaustively tabulated in the provided excerpts; (b) explicit random seeds and environment/container artifacts should be supplied for exact replication (the authors mention Apptainer/Docker containers on their roadmap).
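Recording seeds and software versions alongside each sampling run is cheap. The sketch below uses only the Python standard library. The manifest fields and the package list are illustrative assumptions, not salad's actual dependency list or output format.

```python
import json
import platform
from importlib import metadata


def run_manifest(seed: int, packages=("numpy",)) -> dict:
    """Collect the seed and environment details needed to replicate a sampling run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record missing packages explicitly
    return {
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical usage: write the manifest next to the generated structures.
    print(json.dumps(run_manifest(seed=42), indent=2))
```

A container image pins more than this manifest can capture, such as system libraries and CUDA versions, so the manifest complements the planned container artifacts rather than replacing them.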

5 Specific recommended experiments and analyses to strengthen the manuscript

Wet-lab validation: select 10 representative designs across length/topology classes, including at least one shape-conditioned and one multi-state success. Express and purify them, measure folding and stability (CD, DSC or limited proteolysis) and oligomeric state, and where possible determine structures by cryo-EM, X-ray crystallography or NMR. Validate one or two functional motif scaffolds with activity assays.
Robustness tests: ablate or vary ProteinMPNN temperature, ESMfold vs AF2 refolding pipelines, and report sensitivity of designability percentages to these factors.
Statistical tests: bootstrap confidence intervals for designability/diversity differences versus RFdiffusion, Genie2 and hallucination baselines; provide effect sizes and p-values or Bayesian posterior intervals.
Dataset expansion ablation: retrain or finetune salad on an AlphaFold DB subset and report diversity changes to quantify dataset effects on diversity vs designability trade-off.
Local geometry diagnostics: quantify clash scores, Ramachandran outliers, sidechain rotamer probabilities of designed sequences to assess physical realism beyond pLDDT/scRMSD.

Each recommendation is feasible and would materially reduce uncertainty about biological validity and generality.
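For the statistical-test recommendation, a percentile bootstrap over per-backbone designability outcomes is one simple option. The sketch below assumes that the two methods' backbones are sampled independently and that each backbone is scored as designable (1) or not (0). The function name, the resample count and the toy inputs are illustrative and are not results from the paper. In practice the comparison would be run separately at each protein length.

```python
import numpy as np


def bootstrap_diff_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b).

    a, b: 0/1 designability indicators (one per backbone) for two methods,
    resampled independently within each method.
    Returns (point_estimate, ci_low, ci_high).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ia = rng.integers(0, len(a), size=(n_boot, len(a)))  # resampled indices for a
    ib = rng.integers(0, len(b), size=(n_boot, len(b)))  # resampled indices for b
    diffs = a[ia].mean(axis=1) - b[ib].mean(axis=1)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(a.mean() - b.mean()), float(lo), float(hi)


# Toy inputs: 80/100 vs 50/100 designable backbones.
point, lo, hi = bootstrap_diff_ci([1] * 80 + [0] * 20, [1] * 50 + [0] * 50)
print(f"difference {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The percentile bootstrap is chosen here for simplicity. A BCa interval or a Bayesian beta-binomial posterior, as the recommendation suggests, would be reasonable alternatives.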

6 Balanced conclusion and confidence

Summary judgement: salad presents a technical advance (a sparse-attention denoiser plus editing hooks) that meaningfully improves runtime and demonstrates in silico performance competitive with or exceeding recent diffusion models across a broad range of tasks and sizes. However, its biological utility remains provisional until experimental folding, stability and function are demonstrated, and dataset limitations (no ligands or other non-protein components in training) restrict its immediate applicability to enzyme and small-molecule binder design.

What would overturn the core claims? If experimental tests showed low folding/stability despite favorable ESMfold/AF2 metrics, or retraining on AlphaFold DB removes designability advantages, the comparative claims would need re-evaluation.

Updated: November 20, 2025