BGPT: Paper Review: scAmp analyzes focal gene amplifications at single-cell resolution


Quick Explanation

scAmp at a glance

scAmp is a probabilistic framework that calls focal ecDNA gene amplifications at single-cell resolution from single-cell copy-number distributions. It aims to separate ecDNA from high-copy chromosomal amplifications and then to link ecDNA+ subclones to chromatin-accessibility programs in tumors.

Long Explanation

Paper Review (Science-First, Skeptical): scAmp analyzes focal gene amplifications at single-cell resolution

Paper DOI/slug:
Paper date:

1) Model performance claims (as reported)

Values shown are taken directly from the provided manuscript text: average precision (AP) on simulated data, AUROC on patient tumors, and percentage agreement with WGS.

2) Reported ecDNA call concordance structure

These “disagreeing fractions” are computed as 1 − (reported agreement), using only the agreement values stated in the paper text.
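As a worked check of that arithmetic, the sketch below applies 1 − agreement to the approximate agreement figures quoted in section B (about 80% per gene and about 79% per tumor). The dictionary keys are illustrative names, not fields from the paper.

```python
# Disagreeing fraction = 1 - reported agreement.
# Inputs are the approximate agreement values quoted in this review (~80%, ~79%).
reported_agreement = {"per_gene": 0.80, "tumor_level": 0.79}

# Round to two decimals to strip floating-point noise (e.g. 0.19999999999999996).
disagreement = {level: round(1.0 - a, 2) for level, a in reported_agreement.items()}
print(disagreement)  # {'per_gene': 0.2, 'tumor_level': 0.21}
```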

3) scAmp modeling choices (what is explicitly stated)

This panel lists categorical facts (not numeric quantities) taken directly from the manuscript's Methods: 32,500 training examples; an MLP architecture (the final model described); 14 distribution summary features; inclusion of cells with CN > 2; and a calling threshold of likelihood > 0.6.

4) Training set composition (as explicitly enumerated)

scAmp’s training data composition is stated in Methods as a set of categories summing to 32,500 simulated examples.

A) What the paper claims to do (scope & pipeline)

Core problem: Distinguish ecDNA (circular extrachromosomal DNA) from chromosomal focal amplifications at single-cell resolution, enabling analysis of subclonal ecDNA distributions and their links to phenotypes.
Inputs: Single-cell copy-number distributions derived from assays such as single-cell WGS or scATAC-seq (via copy-number calling in 3 Mb windows).
Model: scAmp trains an MLP that predicts a per-gene probability of ecDNA amplification from 14 summary statistics of the single-cell copy-number distribution, computed over cells with CN > 2.
Outputs: Gene-level ecDNA calls, tumor-level ecDNA presence, and single-cell state stratifications (in their TCGA scATAC-seq analysis).
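The paper's exact 14 features are not enumerated in this review, so the following is only a hypothetical sketch of what distribution summary statistics over CN > 2 cells might look like. The function name `summarize_gene_copy_numbers` and every feature it returns are assumptions chosen for illustration, not scAmp's feature set.

```python
import statistics


def summarize_gene_copy_numbers(cell_cns, min_cn=2.0):
    """Hypothetical per-gene summary features from single-cell copy numbers.

    Only cells with CN > min_cn are summarized (mirroring the CN > 2 inclusion
    rule); these features are illustrative and are NOT scAmp's 14 features.
    """
    amplified = sorted(cn for cn in cell_cns if cn > min_cn)
    if len(amplified) < 2:
        raise ValueError("need at least two cells with CN > min_cn")
    mean = statistics.fmean(amplified)
    sd = statistics.stdev(amplified)  # sample standard deviation
    q1, median, q3 = statistics.quantiles(amplified, n=4)
    return {
        "frac_amplified": len(amplified) / len(cell_cns),  # share of cells kept
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,  # dispersion relative to the mean
        "median": median,
        "iqr": q3 - q1,
        "max": amplified[-1],
    }


# Hypothetical copy numbers for one gene across nine cells.
features = summarize_gene_copy_numbers([2, 2, 3, 5, 8, 14, 20, 2, 40])
print(features)
```

Under the intuition that random ecDNA segregation spreads copy numbers across cells, a dispersion feature such as `cv` is the kind of signal a model could use beyond the mean; this sketch does not show that scAmp's features behave this way.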

B) Evidence used for validation (and where it is strongest)

1) Benchmarks against WGS-based ecDNA labeling

The paper reports per-gene agreement (~80%) and tumor-level agreement (~79%) on a cohort of 73 patient tumors profiled with scATAC-seq and compared to WGS-derived ecDNA calls.
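Agreement here is the fraction of matching binary calls. The sketch below assumes that a tumor counts as ecDNA+ when any of its genes is called ecDNA+ (an assumption; the paper's tumor-level rule is not restated in this review), and uses hypothetical tumor and gene IDs to show how gene-level and tumor-level agreement can diverge.

```python
def agreement(calls_a, calls_b):
    """Fraction of matching binary calls over IDs present in both label sets."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no shared keys to compare")
    return sum(calls_a[k] == calls_b[k] for k in shared) / len(shared)


def tumor_level(gene_calls):
    """Assumed rule: a tumor is ecDNA+ if any of its genes is called ecDNA+."""
    tumors = {}
    for (tumor, _gene), is_ecdna in gene_calls.items():
        tumors[tumor] = tumors.get(tumor, False) or is_ecdna
    return tumors


# Hypothetical (tumor, gene) -> ecDNA+ calls from two methods.
scamp = {("T1", "MYC"): True, ("T1", "EGFR"): False,
         ("T2", "ERBB2"): False, ("T2", "CCND1"): True}
wgs = {("T1", "MYC"): True, ("T1", "EGFR"): True,
       ("T2", "ERBB2"): True, ("T2", "CCND1"): True}

print(agreement(scamp, wgs))                            # 0.5
print(agreement(tumor_level(scamp), tumor_level(wgs)))  # 1.0
```

Here half of the gene-level calls disagree, yet both tumors are ecDNA+ under either method, so tumor-level agreement is perfect.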

2) A “discordant case” with orthogonal validation (FISH)

For BT474 (ERBB2), the paper describes a WGS-vs-scAmp disagreement and states that metaphase DNA FISH supported scAmp’s classification as chromosomal amplification.

3) Functional linkage to chromatin accessibility

The paper reports differences in Hallmark-like pathway/module scores for ecDNA+ cancer-cell states, including upregulation of glycolysis and hypoxia-sensing pathways and downregulation of mitotic spindle assembly and reactive oxygen species signatures, with Wilcoxon rank-sum p-values stated for these comparisons.
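For readers unfamiliar with the test, a minimal two-sided Wilcoxon rank-sum sketch (normal approximation, average ranks for ties, no tie correction of the variance) is shown below; `scipy.stats.ranksums` computes the same statistic. The module scores are hypothetical values, not data from the paper.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via normal approximation (no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based ranks; tied values share the average
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)  # rank sum of x
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


# Hypothetical module scores for ecDNA+ and ecDNA- cells.
ecdna_pos = [0.42, 0.51, 0.47, 0.55, 0.60, 0.49]
ecdna_neg = [0.31, 0.28, 0.40, 0.35, 0.33, 0.38]
z, p = rank_sum_test(ecdna_pos, ecdna_neg)
print(z, p)
```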

C) Mechanistic interpretation (what is inferred vs what is directly measured)

Directly measured: scATAC-seq copy-number features (via windowed copy-number calling) and chromatin accessibility module score differences after stratification.
Inferred: ecDNA vs chromosomal amplification mode at the single-cell level is inferred by the trained model from copy-number distribution statistics; this is a probabilistic inference problem, not a structural sequencing measurement.

D) Skeptical critique: key limitations & likely failure modes

1) Heavy dependence on simulated training distributions

scAmp is supervised by simulated ecDNA/HSR copy-number trajectories, injected into a noise model. That means performance can be sensitive to (i) mismatch between simulated and real single-cell CN noise, (ii) mismatch between simulated evolutionary dynamics and actual tumor dynamics, and (iii) feature sufficiency (14 summary stats may discard structure).
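To make the simulation dependence concrete, here is a deliberately simple toy model, not the paper's simulator: ecDNA copies double each cycle and segregate binomially to daughters, while a chromosomal amplicon is inherited equally and varies only by Gaussian measurement noise. There is no selection, fitness cost, or sequencing dropout; each of these missing ingredients is exactly the kind of simulated-versus-real mismatch described above.

```python
import random
import statistics


def simulate_ecdna(start_copies, generations, rng):
    """Follow one lineage: copies double, then each copy goes to the
    tracked daughter with probability 0.5 (random segregation)."""
    cn = start_copies
    for _ in range(generations):
        cn = sum(rng.random() < 0.5 for _ in range(2 * cn))
    return cn


def simulate_chromosomal(start_copies, rng, noise_sd=1.0):
    """Chromosomal amplicons are inherited equally; only measurement noise varies."""
    return start_copies + rng.gauss(0.0, noise_sd)


rng = random.Random(0)
ecdna = [simulate_ecdna(10, 20, rng) for _ in range(2000)]
chrom = [simulate_chromosomal(10, rng) for _ in range(2000)]
print("ecDNA mean/var:", statistics.fmean(ecdna), statistics.pvariance(ecdna))
print("chromosomal mean/var:", statistics.fmean(chrom), statistics.pvariance(chrom))
```

Both populations center on about 10 copies, but in this toy setting random segregation inflates the ecDNA variance by roughly two orders of magnitude over 20 generations.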

2) Ground truth labels come from WGS-based ecDNA classification pipelines

Tumor/gene labels used for training/evaluation depend on ecDNA classification from WGS-based workflows (AmpliconSuite/AmpliconArchitect/AmpliconClassifier). If those upstream labels systematically misclassify certain regimes, scAmp can inherit those biases—even if scAmp sometimes corrects them in specific cases (as the ERBB2 BT474 example suggests).

3) Potential “copy-number regime” sensitivity

The paper states that a null model based on mean copy number fails to distinguish ecDNA from highly amplified chromosomal amplifications (e.g., average copy number > 10), whereas scAmp remains accurate across copy-number regimes. Accuracy “across regimes” is nonetheless an empirical generalization claim that could break for copy-number distributions not represented in training or evaluation.
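Why a mean-only null model struggles can be illustrated with two hypothetical copy-number profiles that share the same mean (12 copies) but differ sharply in spread. The cutoff of 10 in the hypothetical `mean_only_call` is an arbitrary illustrative value, not the paper's null model.

```python
import statistics

# Hypothetical per-cell copy numbers for one gene in two tumors (illustrative only).
chromosomal_hsr = [11, 12, 12, 13, 12, 11, 13, 12]  # tight, high copy
ecdna_like = [2, 4, 30, 6, 25, 3, 18, 8]            # same mean, broad spread


def mean_only_call(cns, cutoff=10.0):
    """Null model sketch: decide from mean copy number alone."""
    return statistics.fmean(cns) > cutoff


for name, cns in (("chromosomal", chromosomal_hsr), ("ecDNA-like", ecdna_like)):
    print(name, statistics.fmean(cns), statistics.pstdev(cns), mean_only_call(cns))
```

Both profiles receive the same mean-based call, while their standard deviations differ by more than a factor of ten.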

4) Thresholding & probability calibration

The decision rule “likelihood > 0.6” is explicit, but the paper text provided does not include calibration curves, uncertainty quantification, or a sensitivity analysis for that threshold. It therefore remains open whether small threshold shifts would change downstream biological conclusions (e.g., ecDNA prevalence or pathway enrichment).
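A threshold sensitivity check is cheap once per-gene probabilities are available. The sketch below uses hypothetical probabilities clustered near 0.6 to show how the called ecDNA+ fraction can move with the cutoff; real scAmp outputs may be far less concentrated near the threshold.

```python
def call_rate(probabilities, threshold):
    """Fraction of genes called ecDNA+ under a hard cutoff (probability > threshold)."""
    return sum(p > threshold for p in probabilities) / len(probabilities)


# Hypothetical per-gene ecDNA probabilities, several near the 0.6 cutoff.
probs = [0.05, 0.12, 0.55, 0.58, 0.61, 0.63, 0.66, 0.91, 0.95, 0.97]
for t in (0.5, 0.6, 0.7):
    print(t, call_rate(probs, t))
# 0.5 0.8
# 0.6 0.6
# 0.7 0.3
```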

5) Corporate affiliations and personnel ties (conflict-of-interest risk)

The paper lists corporate employment/stockholding for K.K. (Amgen) and S.A. (Amgen), and a prior consulting relationship (J.L. previously provided consulting services to Boundless Bio). This does not prove bias, but it warrants extra caution: readers should scrutinize the transparency of code/data availability and robustness analyses.

E) What would most credibly disprove the main claims?

Independent structural validation at scale: If ecDNA calls made by scAmp (in single-cell CN space) systematically disagree with orthogonal DNA-structure validation (e.g., interphase or metaphase FISH across many tumors and genes) beyond the showcased discordant case.
Out-of-distribution assay generalization failures: If scAmp performs well on TCGA scATAC-seq but breaks on other single-cell CN estimators due to systematic differences in CN feature formation (dropout/sparsity, windowing choices).
Calibration sensitivity of biological inferences: If the reported pathway/module score differences disappear when ecDNA+ calls are defined by alternative thresholds or by probabilistic weighting rather than hard likelihood cutoffs.
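The third test can be sketched by comparing a hard-cutoff effect size with a probability-weighted one, in which each cell contributes to the ecDNA+ group with weight p and to the ecDNA− group with weight 1 − p. The per-cell probabilities and module scores below are hypothetical, and this weighting scheme is one simple choice among several.

```python
import statistics


def hard_cutoff_difference(probs, scores, threshold=0.6):
    """Mean module score of cells with p > threshold minus that of the rest."""
    pos = [s for p, s in zip(probs, scores) if p > threshold]
    neg = [s for p, s in zip(probs, scores) if p <= threshold]
    return statistics.fmean(pos) - statistics.fmean(neg)


def weighted_difference(probs, scores):
    """Soft assignment: each cell counts toward ecDNA+ with weight p, ecDNA- with 1 - p."""
    w_pos = sum(probs)
    w_neg = sum(1 - p for p in probs)
    pos = sum(p * s for p, s in zip(probs, scores)) / w_pos
    neg = sum((1 - p) * s for p, s in zip(probs, scores)) / w_neg
    return pos - neg


# Hypothetical per-cell ecDNA+ probabilities and module scores.
probs = [0.9, 0.8, 0.65, 0.55, 0.3, 0.1]
scores = [1.0, 0.9, 0.7, 0.6, 0.2, 0.1]
print(hard_cutoff_difference(probs, scores), weighted_difference(probs, scores))
```

In this toy example the weighted difference is smaller than the hard-cutoff difference but keeps the same sign; a real reanalysis would also compare p-values, not just effect sizes.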



Updated: April 30, 2026