Data sources: T2T-CHM13 adds ≈200 Mbp missing from GRCh38 and HPRC adds ≈119 Mbp of euchromatic sequence; a Tibetan study reported ≈122.05 Mbp of non-reference sequence in a population-specific pangenome. All indicate substantial new sequence beyond a single reference.
Representative benchmarks: short-read approaches historically detect ~5k–10k SVs/sample, while long-read + pangenome studies report tens of thousands (≈28.5k per cattle sample at 20× HiFi) and per-haplotype SV counts of ~14.6k in Tibetan haplotypes, illustrating the scale of previously missed variation.
Benchmark excerpt: graph/pangenome-guided alignment modestly increased SNV precision and recall relative to GRCh38 in the COLO829 somatic benchmark, with larger gains in difficult/extreme regions.
Confidence: moderate to high for the claim that pangenomes and long reads produce more complete variant catalogs (multiple independent studies show large increases in added sequence and SV counts), but lower for near-term clinical impact until graph-aware clinical callers, standards, and regulatory validation are available. What would change the conclusion: if large-scale, independent clinical benchmarks showed no improvement in diagnostic yield or produced unacceptable false-positive rates when pangenome approaches are applied; or if efficient, standardized graph-aware pipelines and annotation transfer prove impracticable in routine clinical labs.