Data source: super-pangenome graph reported by Cochetel et al. Haplotype-level averages: core ≈48%, dispensable ≈36%, private ≈16%.
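The three classes can be illustrated with a minimal sketch. It assumes the conventional definitions (core = present in every haplotype, private = present in exactly one, dispensable = everything in between); the summary does not say whether the paper counts genes or graph nodes, so the code speaks of generic "features". The function names and the toy data are hypothetical and are not taken from the paper's code.

```python
from collections import Counter

def classify_features(presence, n_haplotypes):
    """Label each feature by how many haplotypes carry it (assumed definitions)."""
    labels = {}
    for feature, haplotypes in presence.items():
        count = len(set(haplotypes))
        if count == n_haplotypes:
            labels[feature] = "core"          # present in every haplotype
        elif count == 1:
            labels[feature] = "private"       # present in exactly one haplotype
        else:
            labels[feature] = "dispensable"   # shared by some but not all
    return labels

def haplotype_fractions(presence, labels, haplotype):
    """Fraction of one haplotype's features that fall in each class."""
    carried = [f for f, haps in presence.items() if haplotype in haps]
    if not carried:
        return {}
    counts = Counter(labels[f] for f in carried)
    return {k: counts.get(k, 0) / len(carried)
            for k in ("core", "dispensable", "private")}

# Toy presence/absence data for three hypothetical haplotypes A, B, C.
presence = {
    "f1": {"A", "B", "C"},
    "f2": {"A", "B", "C"},
    "f3": {"A", "B"},
    "f4": {"A"},
    "f5": {"C"},
}
labels = classify_features(presence, n_haplotypes=3)
fractions_a = haplotype_fractions(presence, labels, "A")
print(fractions_a)  # {'core': 0.5, 'dispensable': 0.25, 'private': 0.25}
```

Averaging such per-haplotype fractions over all haplotypes would give figures of the same kind as the reported ≈48/36/16% split.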
Reported graph scale: ~200 million nodes, ~1.7 Gb of total sequence, 342 chromosome paths (18 haplotypes × 19 chromosomes).
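A quick arithmetic check of these figures, as a sketch: the path count follows directly from haplotypes × chromosomes, and dividing total sequence by node count gives a rough mean node length. Both inputs to the second calculation are rounded values from the summary, so the result is only indicative.

```python
# Values as quoted in the summary; node and sequence totals are approximate.
N_HAPLOTYPES = 18
N_CHROMOSOMES = 19
REPORTED_PATHS = 342
APPROX_NODES = 200_000_000
APPROX_SEQUENCE_BP = 1_700_000_000

def expected_paths(n_haplotypes, n_chromosomes):
    """One path per chromosome per haplotype."""
    return n_haplotypes * n_chromosomes

def mean_node_length(total_bp, n_nodes):
    """Average number of bases stored per graph node."""
    return total_bp / n_nodes

paths = expected_paths(N_HAPLOTYPES, N_CHROMOSOMES)
mean_len = mean_node_length(APPROX_SEQUENCE_BP, APPROX_NODES)

print(paths, paths == REPORTED_PATHS)            # 342 True
print(f"mean node length ~ {mean_len:.1f} bp")   # mean node length ~ 8.5 bp
```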
Strengths: high-quality, phased diploid assemblies (PacBio + Bionano, BUSCO >95%); reference-unbiased graph construction with PGGB; extensive validation (NUCmer concordance; SDR and PD loci recapitulated); public raw data and code; demonstration that pan-GWAS is feasible across interfertile species.
Limitations / blind spots: sampling is limited to one accession per species, so intraspecific diversity is underrepresented and private gene counts are expected to increase with more samples; V. rupestris lacked Iso-Seq evidence, which may bias its annotation; the pan-GWAS sample size (n = 153) is modest for a structured multi-species cohort, where population structure, allele-frequency differences, and cross-species LD may produce false positives or false negatives; the candidate AtCHX20 homolog requires functional validation (e.g., transgenic assays, expression under salt stress) before any causal claim.
Methodological caveats: PGGB parameter choices, seqwish k-mer (minimum match length) settings, and smoothxg polishing all influence graph topology; deconstructing complex graphs into VCFs (vg deconstruct) can fragment multiallelic and haplotype-context variation; comparing vg-based genotyping with linear callers showed ~90% SNP concordance but only ~76% INDEL concordance, implying that some structural differences remain sensitive to method.
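To make the concordance figures concrete, here is a minimal sketch of a per-type concordance calculation. It assumes concordance means the fraction of graph-derived calls that also appear, with identical position and alleles, in the linear-caller set; the paper may define it differently. The function names and the toy call sets are hypothetical.

```python
def variant_type(ref, alt):
    """Single-base substitutions are SNPs; any length change counts as an INDEL."""
    return "SNP" if len(ref) == 1 and len(alt) == 1 else "INDEL"

def concordance_by_type(graph_calls, linear_calls):
    """Fraction of graph calls with an exact match in the linear set, per type.

    Calls are (chrom, pos, ref, alt) tuples. Exact matching is an assumption:
    the same indel written at a different position is counted as discordant.
    """
    totals, shared = {}, {}
    for call in graph_calls:
        vtype = variant_type(call[2], call[3])
        totals[vtype] = totals.get(vtype, 0) + 1
        if call in linear_calls:
            shared[vtype] = shared.get(vtype, 0) + 1
    return {vtype: shared.get(vtype, 0) / n for vtype, n in totals.items()}

# Toy call sets: the last indel is placed one base apart by the two methods.
graph_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "AT", "A"),
    ("chr1", 400, "G", "GCC"),
}
linear_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "AT", "A"),
    ("chr1", 401, "G", "GCC"),
}
result = concordance_by_type(graph_calls, linear_calls)
print(sorted(result.items()))  # [('INDEL', 0.5), ('SNP', 1.0)]
```

The toy indel that differs only in placement shows one way representation differences, rather than true disagreement, can lower INDEL concordance; normalizing both call sets before comparison would reduce that effect.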
Primary paper & code/data: