BGPT: Paper Review: Patterns of somatic mutation in human cancer genomes


Quick Explanation

Concise critique: Greenman et al. (Nature 2007) sequenced 518 protein kinase genes in 210 cancers (274 Mb in total) and found 1,007 somatic mutations, 921 of them base substitutions. From the excess of nonsynonymous over synonymous changes they estimated ~158 driver mutations across ~119 kinase genes. The study shows that most mutations are passengers, and that mutation burden is shaped largely by mutational processes and hypermutators (MMR-deficient tumours, temozolomide-treated gliomas, UV- and tobacco-exposed tumours).

Long Explanation

Visual review — Patterns of somatic mutation in human cancer genomes (Greenman et al., Nature 2007)

A summary of the data comes first, followed by a concise, evidence-driven interpretation. Quantitative claims below are drawn from the original paper and later context papers.

Source data: the Nature 2007 protein-kinase resequencing screen (1,007 somatic mutations, 921 of them base substitutions).

Key pattern: extreme heterogeneity in mutation burden. MMR-deficient tumours and therapy-exposed gliomas are outliers, and tissue-specific differences persist even after known hypermutators are excluded. Data sourced and summarized from Greenman et al. 2007.
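
One simple way to make "outlier" concrete is to compare each tumour's burden (mutations per Mb) with the cohort median. The sketch below is only an illustration: the function name, the 10-fold threshold, and the sample values are assumptions, not the criterion or data used by Greenman et al.

```python
from statistics import median
from typing import Dict, Set


def flag_hypermutators(burden_per_mb: Dict[str, float], fold: float = 10.0) -> Set[str]:
    """Return samples whose burden exceeds `fold` times the cohort median.

    The fold threshold is an arbitrary illustrative choice.
    """
    baseline = median(burden_per_mb.values())
    if baseline <= 0:
        return set()
    return {sample for sample, burden in burden_per_mb.items()
            if burden > fold * baseline}


# Hypothetical burdens (mutations per Mb); not values from the paper.
cohort = {"tumour_a": 1.0, "tumour_b": 2.0, "tumour_c": 1.5, "tumour_d": 80.0}
print(flag_hypermutators(cohort))
```

A median-based baseline is used here because a handful of hypermutated samples would drag a mean-based baseline upwards and mask themselves.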

Interpretation & critical evaluation (evidence-first)

Study design strength: the study established a feasible, careful pipeline for large-scale targeted resequencing (PCR + bidirectional Sanger) across 518 kinases, with matched-normal validation and QC by SNP arrays. This was rigorous by 2007 standards and reproducible within its targeted scope.
Major empirical findings (data-supported):
- Most somatic mutations in the screen are passengers: an estimated 763 of the 921 base substitutions are passengers and ~158 are drivers (95% CI 63–246), distributed across ~119 genes. The inference rests on the observed excess of nonsynonymous over synonymous changes (selection pressure 1.29).
- Mutational signatures correlate with exposures and repair defects: UV, tobacco, and alkylating chemotherapy leave recognisable imprints; MMR-deficient tumours and temozolomide-treated gliomas display hypermutation; and sequence-context enrichments (e.g., TpC/GpA for certain C>G transversions) point to context-specific processes.
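
The link between a selection pressure above 1 and a driver count can be shown with back-of-envelope arithmetic. If nonsynonymous substitutions occur at ω times their neutral expectation, a fraction 1 − 1/ω of them is in excess. The sketch below is a simplification, not the authors' procedure (which models mutation context and derives its confidence interval by simulation); the function name and the count of 700 nonsynonymous substitutions are illustrative assumptions.

```python
def excess_nonsynonymous(n_nonsyn: int, omega: float) -> float:
    """Expected number of nonsynonymous mutations above neutral expectation.

    omega is the ratio of observed to neutrally expected nonsynonymous
    mutations; values at or below 1 imply no excess.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    if omega <= 1.0:
        return 0.0
    return n_nonsyn * (1.0 - 1.0 / omega)


# Selection pressure 1.29 is quoted in the review; 700 is a hypothetical
# nonsynonymous count chosen only to illustrate the calculation.
print(round(excess_nonsynonymous(700, 1.29), 1))

# Passenger count implied by the review's own figures: 921 - 158 = 763.
assert 921 - 158 == 763
```

Under these assumptions the excess comes out in the same range as the reported ~158 drivers, which shows how a modest selection pressure still implies a sizeable number of selected mutations.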
Statistical inference (strengths and caveats):
- Strength: using synonymous mutations as an internal neutral expectation is a robust, widely used approach to detecting selection in coding regions; the authors corrected for context and codon usage and employed Monte Carlo tests.
- Caveat: the nonsynonymous:synonymous excess alone cannot prove driver status for any individual mutation. It provides cohort-level evidence for positive selection and must be complemented by recurrence, clustering, pathway context, and functional validation. Large genes (e.g., TTN) accumulate many mutations and can rank highly by mutation count even when most events are passengers (the authors note TTN as the top-ranked statistical hit and caution against over-interpreting it).
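
A Monte Carlo test for nonsynonymous excess can be sketched as follows. This is a generic illustration, not a reproduction of the paper's scheme: it assumes the neutral probability that a random coding substitution is nonsynonymous (`p_nonsyn_neutral`) has already been obtained from a context-aware model, and all names and example counts are hypothetical.

```python
import random


def monte_carlo_ns_pvalue(n_nonsyn: int, n_syn: int, p_nonsyn_neutral: float,
                          n_sim: int = 10000, seed: int = 0) -> float:
    """One-sided p-value for an excess of nonsynonymous mutations.

    Each simulation redraws every observed mutation as nonsynonymous with
    probability p_nonsyn_neutral, then counts how often the simulated
    nonsynonymous total reaches the observed one.
    """
    rng = random.Random(seed)
    total = n_nonsyn + n_syn
    hits = 0
    for _ in range(n_sim):
        simulated = sum(1 for _ in range(total) if rng.random() < p_nonsyn_neutral)
        if simulated >= n_nonsyn:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical gene: 18 nonsynonymous and 2 synonymous mutations,
# with 70% of random substitutions expected to be nonsynonymous.
print(monte_carlo_ns_pvalue(18, 2, 0.70, n_sim=5000))
```

Because the test conditions on the total number of mutations observed in a gene, a large gene with many mutations gains no advantage from its size alone, which speaks to the gene-size caveat above. A small p-value still says nothing about which individual mutations are drivers.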
Biological insights: the study expanded the candidate cancer-gene space by statistically implicating ~120 kinase genes, identified plausible functional mutations in conserved kinase motifs (P-loop, DFG motif and activation segment), and highlighted pathway-level enrichment (FGF signalling, MAPK/JNK components), giving experimentally testable targets for follow-up.
Limitations & blindspots (explicit + additional modern context):
- Targeted panel biases: only 518 kinases were interrogated, so non-kinase drivers and regulatory or non-coding events were missed; the authors acknowledge the study as a targeted first pass.
- Sample size per tumour type: modest per-class sample sizes (e.g., 9 gliomas, 20 lung carcinomas) reduce power to detect low-frequency drivers; modern WGS consortia (PCAWG, Pan-Cancer) have shown the value of thousands of genomes for robust driver discovery.
- Functional validation gap: statistical inference suggests candidates (e.g., TTN, NTRK3, ATM, MAP2K4), but true driver assignment requires recurrence, clustering/hotspot detection, and experimental assays (kinase activation/inactivation studies). The paper supplies strong candidates, not exhaustive biological proof.
- Hypermutators and therapy-induced signatures: inclusion of temozolomide-exposed gliomas and MMR-deficient tumours inflates variance in mutation burden and can overwhelm selection signals; the authors partially correct for this by excluding known hypermutators in some analyses, but residual confounding remains possible.
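
The sample-size limitation can be quantified with a simple binomial calculation: the probability that a driver mutated in a fraction f of tumours is seen at least k times among n tumours. The sketch below assumes independent tumours and perfect mutation detection; the function name and the 5% frequency are illustrative, while the cohort size of 9 is the glioma count quoted above.

```python
from math import comb


def detection_probability(n_tumours: int, driver_frequency: float,
                          min_hits: int = 1) -> float:
    """P(at least min_hits of n_tumours carry the mutation), binomial model."""
    if not 0.0 <= driver_frequency <= 1.0:
        raise ValueError("driver_frequency must lie in [0, 1]")
    # Probability of seeing fewer than min_hits carriers.
    p_fewer = sum(
        comb(n_tumours, i)
        * driver_frequency ** i
        * (1.0 - driver_frequency) ** (n_tumours - i)
        for i in range(min_hits)
    )
    return 1.0 - p_fewer


# A hypothetical driver present in 5% of tumours, screened in 9 samples.
print(round(detection_probability(9, 0.05), 3))               # seen at least once
print(round(detection_probability(9, 0.05, min_hits=2), 3))   # seen recurrently
```

Requiring recurrence (min_hits=2), which is what most driver calls depend on, lowers the probability much further than a single observation does, so small per-type cohorts mainly recover common drivers.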

How this paper changed the field (short evidence trail)

Demonstrated the feasibility and value of systematic resequencing of gene families in tumours. Together with the earlier CCDS-based screen of Sjöblom et al. (2006), it helped motivate the larger exome and genome consortia (later PCAWG) that scaled up sequencing and driver-detection methods.
Popularized the use of nonsynonymous:synonymous (NS:S) excess and Monte Carlo frameworks for driver inference; later methods added hotspot/domain analyses and integrated background models (e.g., MSEA, OncodriveCLUST, SigProfiler topography) to refine driver and hotspot detection and to account for genomic-context effects.

Conclusions, confidence & what would change the inference

Bottom line: the 2007 Greenman et al. kinase resequencing study was rigorous and influential, and it produced reproducible cohort-level evidence that most somatic point mutations are passengers while a substantial set of kinases carry driver mutations. However, the targeted design, the modest per-type sample sizes, and the statistical nature of driver inference mean that many candidate genes still require functional and recurrence-based validation. This assessment is supported by the paper itself and by subsequent pan-cancer whole-genome work showing the need for larger cohorts and whole-genome context.

Actionable follow-ups (short list)

- Functional follow-up of high-ranking but large genes (e.g., TTN): test for recurrence, expression in relevant tissues, and functional impact on growth and kinase signalling.
- Apply hotspot/domain-aware methods (MSEA, OncodriveCLUST) to the published mutations to refine candidate drivers and reduce gene-size bias.
- Integrate with whole-genome data and modern mutational-signature tools (SigProfilerTopography) to map signatures onto chromatin, replication, and strand features for mechanistic attribution (e.g., of the TpC/GpA context).
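
As a small illustration of what "TpC/GpA context" means in practice, the sketch below checks whether a mutated base sits in that context given its trinucleotide (mutated base in the middle). A cytosine preceded by T on one strand reads as a guanine followed by A on the other, so G-centred trinucleotides are reverse-complemented first. The helper names are hypothetical and are not part of SigProfiler or any other tool's API.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_tpc_context(trinucleotide: str) -> bool:
    """True if the central base is a C preceded by T (TpC),
    or equivalently a G followed by A on the opposite strand (GpA)."""
    tri = trinucleotide.upper()
    if len(tri) != 3 or any(base not in COMPLEMENT for base in tri):
        raise ValueError("expected a trinucleotide of A/C/G/T")
    if tri[1] == "G":
        # Express the site on the strand where the mutated base is C.
        tri = reverse_complement(tri)
    return tri[0] == "T" and tri[1] == "C"


for context in ("TCA", "AGA", "ACG", "CGT"):
    print(context, is_tpc_context(context))
```

Counting how many C>G substitutions fall in this context, relative to how often the context occurs in the sequenced region, is the kind of enrichment the signature analysis relies on.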

Technical reproducibility & data availability

The methods (primer lists, supplementary tables) and mutation lists were provided in the Nature supplement and Sanger CGP resources; the pipeline (Sanger capillary sequencing + matched-normal confirmation) is reproducible. Limitation: Sanger throughput limited sample scale compared with modern WGS/WES pipelines; raw trace files are not always available for studies of that period, but the processed calls and tables were released.

Limitations, hidden assumptions, and falsifiability

Hidden assumption: synonymous changes are fully neutral. If context-dependent selection on synonymous sites exists (e.g., at splicing regulatory signals), it could bias selection-pressure estimates.
Falsifiability test: large-scale unbiased WGS of hundreds to thousands of matched tumours across tissue types showing either (a) far fewer driver kinases than estimated, or (b) no excess of nonsynonymous substitutions after rigorous context modelling, would falsify the core inference; PCAWG-style datasets are the right test bed.



Updated: March 10, 2026