BGPT: Paper Review: Three open questions in polygenic score portability

Quick Explanation

Three open questions: (i) individual-level portability is only weakly explained by genetic distance, (ii) portability trends are trait-specific (notably immune traits), and (iii) “portability” depends strongly on which performance metric you use (e.g., precision vs recall).

Long Explanation

Paper review (critical, skeptical, evidence-based)

“Three open questions in polygenic score portability” — 10.1038/s41467-026-68565-3

Updated context: UK Biobank GWAS-trained PGS portability analyzed with an individual-level genetic-distance axis and multiple trait-specific performance metrics.

Core dataset

UK Biobank: 336,923 White British individuals used for GWAS; 69,500 non-White British individuals used for prediction (after QC and exclusions).

Traits & outcomes

15 continuous traits + 2 binary diseases (asthma, type 2 diabetes).

PGS construction + distances

GWAS via PLINK --glm with covariates; SNP clumping; PGS computed using PLINK --score; genetic distance is a weighted Euclidean distance in UKB PC space, which correlates with individual-specific Fst (Pearson r > 0.9835).

Visual 1 — Design overview (GWAS→PGS→portability evaluation)

Visual 2 — Genetic distance proxy vs Fst

The paper uses PC-based genetic distance as a fast proxy for Fst and reports a strong correlation, but notes that the proxy tracks Fst less closely at intermediate distances.
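As a rough illustration of a weighted Euclidean distance in PC space, here is a minimal sketch. The review text does not state which weights or reference point the paper uses, so the per-PC weights and the choice of the GWAS-sample centroid as the reference are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def centroid(rows):
    """Column-wise mean of a list of PC-coordinate rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def weighted_pc_distance(pcs, reference, weights):
    """Weighted Euclidean distance between one individual's PC
    coordinates and a reference point (assumed: GWAS-sample centroid)."""
    if not (len(pcs) == len(reference) == len(weights)):
        raise ValueError("pcs, reference and weights must have equal length")
    return math.sqrt(
        sum(w * (p - c) ** 2 for p, c, w in zip(pcs, reference, weights))
    )

# Toy data: four GWAS individuals on two PCs (values are made up).
gwas_pcs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
# Assumed weights, e.g. proportional to each PC's eigenvalue.
weights = [4.0, 1.0]

center = centroid(gwas_pcs)
d = weighted_pc_distance([4.0, 5.0], center, weights)
print(center, round(d, 4))
```

With weights that shrink for higher PCs, displacement along the leading PCs dominates the distance, which is one plausible reason such a proxy can track Fst well overall while being noisier at intermediate distances.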

Key results (organized by the paper’s “three open questions”)

Below, I separate what is directly reported from what is inferred/hypothesized by the authors.

Open question 1 — Individual-level accuracy vs genetic distance

Reported: Individual-level squared prediction error shows only a weak relationship with genetic distance; a flexible cubic spline explains very little variance for example traits (height example given as ~0.51% R²).
Reported: Socioeconomic measures, specifically the Townsend Deprivation Index, explain variation in squared prediction error comparably well or better; for most traits, the paper reports monotonic trends across deprivation quantiles that resemble the trends seen with genetic distance.
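To make "variance in squared prediction error explained" concrete, here is a minimal sketch on simulated data. It swaps in a simpler flexible fit (quantile-bin means) for the paper's cubic spline, and the simulated noise model is an assumption chosen only to show how a real but weak dependence yields a small R².

```python
import random

def r2_from_bin_means(x, y, n_bins):
    """Share of variance in y explained by the mean of y within
    equal-count bins of x (a stand-in for a spline fit)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    fitted = [0.0] * n
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue
        bin_mean = sum(y[i] for i in members) / len(members)
        for i in members:
            fitted[i] = bin_mean
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    if ss_tot == 0:
        raise ValueError("y has no variance")
    ss_res = sum((y[i] - fitted[i]) ** 2 for i in range(n))
    return 1.0 - ss_res / ss_tot

# Simulated individuals: the prediction-error spread grows mildly
# with genetic distance (assumed model, not the paper's data).
rng = random.Random(0)
distance = [rng.random() for _ in range(2000)]
sq_error = [rng.gauss(0.0, 1.0 + 0.2 * d) ** 2 for d in distance]

r2_sim = r2_from_bin_means(distance, sq_error, n_bins=10)
r2_toy = r2_from_bin_means([1, 2, 3, 4], [0, 0, 1, 1], n_bins=2)
print(r2_sim, r2_toy)
```

The same function can be applied with a deprivation index in place of `distance` to compare the two explanatory variables on equal footing.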

Inferred / proposed mechanism: Because the quality of the genetic-distance proxy degrades at intermediate distances, the authors suggest that refined distance measures (including local ancestry in PGS-relevant regions) could better explain portability.

Skeptical critique: The core claim is plausible, but it depends on (i) phenotype/covariate handling (residualization steps), and (ii) the degree to which socioeconomic variables co-move with genotype-PCs or other unmeasured confounders. The paper includes explicit covariate adjustment in both GWAS and prediction models (age, sex, interactions; array type where needed; and PC covariates), but residual confounding can remain if socioeconomic patterns correlate with unmeasured ancestry-specific structure or measurement differences not captured by the covariates.

Open question 2 — Trait-specific portability trends

Reported: Group-level accuracy trends vary by trait: for some traits (e.g., height), prediction accuracy decays roughly monotonically with genetic distance; for others (e.g., weight/body fat), accuracy peaks at intermediate distances; and immune-related traits show near-zero group-level accuracy even at short genetic distances.
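Group-level accuracy by genetic distance can be sketched as an R² computed separately within distance bins. The version below uses the squared Pearson correlation between PGS and phenotype as a simple stand-in; whether the paper uses this or a covariate-adjusted partial R² is not stated in this review, so treat the metric, the bin edges, and the toy values as assumptions.

```python
import math

def pearson_r(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def r2_by_distance_bin(distance, pgs, phenotype, edges, min_n=3):
    """Squared correlation of PGS and phenotype within each bin
    [edges[k], edges[k+1]); None for bins with fewer than min_n people."""
    results = []
    for lo, hi in zip(edges, edges[1:]):
        idx = [i for i, d in enumerate(distance) if lo <= d < hi]
        if len(idx) < min_n:
            results.append(None)
            continue
        r = pearson_r([pgs[i] for i in idx], [phenotype[i] for i in idx])
        results.append(r * r)
    return results

# Toy data: the near bin is perfectly predicted, the far bin is not.
distance = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3, 1.4]
pgs = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 4.0]
phenotype = [2.0, 4.0, 6.0, 1.0, -1.0, -1.0, 1.0]

per_bin = r2_by_distance_bin(distance, pgs, phenotype, edges=[0.0, 1.0, 2.0, 3.0])
print(per_bin)
```

Plotting `per_bin` against bin midpoints for each trait would show the shapes described above: monotone decay, an intermediate peak, or values near zero throughout.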

Proposed mechanism (authors’ hypothesis): Immune traits may have fast evolutionary turnover; the paper tests aspects of this by re-estimating index-SNP effects closer vs farther from the GWAS sample and shows less consistency for lymphocyte count than for triglycerides, plus differences in allele heterozygosity for large-effect SNPs.

Skeptical critique: This is still a correlational story. Opposite-sign index effects could arise from multiple sources beyond turnover: sampling variability, differences in LD tagging, GWAS winner’s curse artifacts, or differences in how the phenotype is measured across subsets. The paper does connect the rapid turnover idea to winner’s curse logic and shows heterozygosity/PGS variance changes for immune traits, which strengthens internal coherence, but it does not directly demonstrate selection dynamics.

Open question 3 — Predictive performance metric changes the story

Reported: Portability interpretations differ by performance metric: for some immune-related traits, group-level near-zero accuracy can coexist with increasing individual-level accuracy with genetic distance.
Reported: For disease risk stratification, precision and recall trends can differ: asthma shows qualitatively similar dependence of precision and recall on genetic distance, whereas type 2 diabetes shows roughly constant precision for medium/large distances but recall increases far from the GWAS sample.

Skeptical critique: Metric dependence is expected, since different metrics emphasize different parts of the joint calibration/decision boundary. The more subtle issue is that the paper's disease classification thresholds are chosen to maximize F1 on the GWAS set (so the percentiles differ by disease). This choice can mechanically create metric-specific behavior when the score distribution shifts across genetic distance. It does not invalidate the finding, but it means the observed trends are partly a product of how the thresholds were tuned.
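The thresholding concern can be illustrated with a small sketch: pick the score cutoff that maximizes F1 in a training set, then apply it unchanged to a target group whose scores are shifted. The raw-score cutoff (rather than a percentile), the function names, and the toy values are assumptions for illustration only.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when score >= threshold is called a case."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1_score(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def best_f1_threshold(scores, labels):
    """Observed score value that maximizes F1 on this data set."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: f1_score(*precision_recall(scores, labels, t)),
    )

# Training set (stand-in for the GWAS sample): scores separate cases cleanly.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
train_labels = [0, 0, 0, 1, 1, 1]
threshold = best_f1_threshold(train_scores, train_labels)

# Target group: same ranking of individuals, but every score is shifted up.
target_scores = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
target_labels = [0, 0, 0, 1, 1, 1]

train_pr = precision_recall(train_scores, train_labels, threshold)
target_pr = precision_recall(target_scores, target_labels, threshold)
print(threshold, train_pr, target_pr)
```

In this toy case the shift leaves recall untouched and lowers precision even though the score ranks cases exactly as well as before, which is the kind of mechanical divergence the critique describes.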

Visual 3 — Relative explanatory power: genetic distance vs Townsend SES (reported ranges)

The paper reports that Townsend deprivation index explains between 0.02% and 0.53% of variance in squared prediction error across traits (and generally more than genetic distance). Genetic distance explained little variance via spline fits, with an example of ~0.51% for height.

Limitations & potential blind spots (critical, not “political”)

Distance metric misspecification: the paper’s PC-based genetic distance correlates strongly with Fst, yet is less reflective at intermediate distances; misalignment could dilute any causal “distance→accuracy” relationship.
Confounding between genotype structure and environment: SES measures may correlate with ancestry PCs or unmeasured structure; the paper adjusts for PCs and covariates, but residual confounding can still produce SES-dominant explanatory patterns.
Trait/metric tuning choices: disease classification thresholds are selected to maximize F1 in the GWAS set; metric trends across genetic distance may partly reflect this thresholding strategy and score distribution shifts.
Generalizability beyond UK Biobank partition: WB GWAS vs NWB prediction is one specific sampling frame; portability behaviors could differ with other training cohorts, different LD patterns, or different phenotype measurement regimes. The paper acknowledges the need for refined distance metrics and broader factors, but does not test cross-cohort external replication within the provided text.

What would disprove or revise the paper’s “three gaps” framing?

If refined local-ancestry-aware distance measures (especially in PGS index regions) eliminated the weak individual-level relationship between genetic distance and squared error, then the first “gap” would shrink materially.
If trait-specific immune portability drops disappeared after controlling for alternative phenotype measurement structures and after applying alternative PGS construction strategies, then the immune-turnover mechanism would be less compelling.
If precision/recall portability trends matched each other (or became invariant) under threshold-free or calibration-based decision criteria, then part of the “metric dependence” might be threshold-choice artifact.
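One threshold-free criterion that could be used for the third check is the area under the ROC curve, which depends only on how scores rank cases against controls. The sketch below computes it from pairwise comparisons; the choice of AUC is an assumption here, as the paper's own alternatives are not described in this review.

```python
def auc(scores, labels):
    """Probability that a random case scores above a random control,
    counting ties as one half (equivalent to the ROC AUC)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))

labels = [0, 0, 0, 1, 1, 1]
near_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
# Same individuals' ranking, with every score shifted upward.
far_scores = [s + 0.2 for s in near_scores]

auc_near = auc(near_scores, labels)
auc_far = auc(far_scores, labels)
auc_mixed = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc_near, auc_far, auc_mixed)
```

Because a uniform shift in the score distribution leaves AUC unchanged, any precision/recall divergence that vanishes under this metric would point toward the threshold choice rather than a change in discrimination.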

Most actionable takeaways for users

Don’t summarize portability with one number: the paper shows metric-dependent and level-dependent portability (group R² vs individual squared error; precision vs recall).
Expect trait-specific behavior: immune-related phenotypes may show sharp portability failures even when genetic distance is not extreme.
Model your “distance” carefully: the paper suggests that local-ancestry-aware measures in PGS-relevant regions could be more informative than a global PC-distance proxy.

Updated: April 23, 2026