BGPT: Paper Review: On Averaging Power for Genetic Association and Linkage Studies

Quick Explanation

Concise verdict: Zheng et al. (2005) present a practical, well-argued extension of averaged-power calculations to Cochran–Armitage (CA) trend tests in case–control studies and to affected-sib-pair (ASP) linkage tests. They show that averaged power is modestly but consistently lower than naive (expected-count) power, offer a computational shortcut (O(N^2) instead of O(N^4)), and provide sensitivity analyses for an unknown allele frequency or genetic model. See the methods, numerical tables, and CardioGene example below for evidence and limitations.

Long Explanation

Visual paper review — "On Averaging Power for Genetic Association and Linkage Studies" (Zheng et al., 2005)

Key claim: averaging power over the randomness in genotype counts or IBD counts gives more realistic (usually lower) power than plugging in expected counts. Using the CA trend test reduces computation and yields averaged-power formulas that depend on a single population allele frequency. Sensitivity to the assumed allele frequency and to the genetic model should be examined.
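To make the idea of averaging over genotype counts concrete, the following Python sketch computes an exact expectation over trinomial genotype counts (n0, n1, n2) under Hardy–Weinberg proportions. It is an illustration under stated assumptions, not the authors' implementation: `cond_power` is a hypothetical placeholder for whatever conditional power formula is being averaged, and all function names are invented here.

```python
import math


def hwe_probs(p):
    """Genotype probabilities for 0, 1, 2 allele copies under HWE (needs 0 < p < 1)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def trinomial_log_pmf(counts, probs):
    """Log-probability of a genotype count vector under a trinomial model."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(q) for c, q in zip(counts, probs))


def average_over_counts(cond_power, n, p):
    """Exact E_n[cond_power(counts)], enumerating all (n+1)(n+2)/2 count vectors.

    cond_power is a hypothetical user-supplied function of (n0, n1, n2).
    """
    probs = hwe_probs(p)
    total = 0.0
    for n0 in range(n + 1):
        for n1 in range(n - n0 + 1):
            counts = (n0, n1, n - n0 - n1)
            total += math.exp(trinomial_log_pmf(counts, probs)) * cond_power(counts)
    return total


# Sanity checks with simple stand-ins for the conditional power:
print(average_over_counts(lambda c: 1.0, 100, 0.3))         # probabilities sum to about 1
print(average_over_counts(lambda c: c[2] / 100, 100, 0.3))  # about p**2 = 0.09
```

The double loop visits (n+1)(n+2)/2 count vectors, so a single trinomial average costs on the order of n^2 evaluations of `cond_power`.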

Notes: the plotted points are the values reported in the CardioGene example: planned averaged power at the assumed p = 0.3 (82.2%), realized averaged power if the true p_t = 0.33 (82.4%), and realized averaged power if p_t = 0.1 (49.3%). They illustrate how strongly power can drop when the true allele frequency differs substantially from the planning assumption.

Interpretation: Table 2 shows that averaged powers m and p are slightly (<1%) below naive power computed at expected counts, while p averaged over beta priors can reduce power further (several percentage points) depending on prior variance — supporting the authors' assertion that naive planning may overestimate realized power

Critical appraisal (concise)

Strength — Practical, computationally useful: Replacing the 2-df chi-square averaging by CA trends yields a big asymptotic/speed advantage (authors reduce combinatorics to O(N^2)), enabling routine sensitivity studies for sample planning
Strength — Clear sensitivity diagnostics: The CardioGene illustration gives a concrete example showing large power loss if real p differs substantially from planning p (e.g., 82%→49%), motivating inclusion of prior uncertainty in design
Strength — Linkage insights: For ASP linkage tests, the paper systematically applies averaging to IBD counts and finds the MERT (T0.355) retains robust averaged power across unknown models — useful guidance for designing ASP linkage analyses
Limitation — Asymptotics & finite-sample fidelity: The conclusions rely on normal/chi-square asymptotics for the CA and score tests; finite-sample deviations (low counts, rare alleles, strong HWE departures) could invalidate normal approximations and change averaged-power differences; authors note this but do not fully map boundaries of validity
Limitation — Simplifying assumptions: HWE assumption, single-marker setting, no LD, no genotyping error, and priors (beta) are simple — in modern GWAS/WES contexts (many correlated markers, population stratification, genotype uncertainty) extension is nontrivial and not demonstrated here.
Opportunity — Integration with modern designs: Methods here are directly useful when planning single-marker candidate studies or ASP screens; to scale to GWAS, one needs multi-marker corrections (multiple testing), LD-structure-aware averaging, or resampling-based averaged-power pipelines (authors' O(N^2) tricks still helpful for per-marker approximations).

What would change my evaluation?

Simulation studies showing where asymptotic approximations fail: low MAF, extreme imbalance, or HWE violation and how large the bias becomes.
Extension of averaging formulas to genome-wide multiple-testing contexts (Bonferroni/perm-based) or to multi-marker tests with LD.
Empirical validation on real GWAS arrays or sequencing-derived genotype data (with genotype calling uncertainty) to quantify practical differences in realized study power.
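One way to probe where the asymptotic approximations break down is a small Monte Carlo check of the CA trend test's rejection rate. The sketch below is not from the paper: it assumes genotype counts in cases and controls are independent multinomial draws, uses additive scores (0, 0.5, 1), and the genotype probabilities in the usage lines are made-up illustrative values.

```python
import math
import random
from statistics import NormalDist

SCORES = (0.0, 0.5, 1.0)  # additive scores; an assumption of this sketch


def ca_trend_z(case_counts, control_counts, scores=SCORES):
    """CA trend statistic written as a standardized difference in mean score."""
    r, s = sum(case_counts), sum(control_counts)
    n = r + s
    totals = [a + b for a, b in zip(case_counts, control_counts)]
    mean_all = sum(x * t for x, t in zip(scores, totals)) / n
    var_all = sum(x * x * t for x, t in zip(scores, totals)) / n - mean_all ** 2
    if var_all <= 0:  # every subject has the same genotype: no information
        return 0.0
    diff = (sum(x * a for x, a in zip(scores, case_counts)) / r
            - sum(x * b for x, b in zip(scores, control_counts)) / s)
    return diff / math.sqrt(var_all * (1 / r + 1 / s))


def draw_counts(rng, probs, n):
    """One multinomial draw of genotype counts for n subjects."""
    counts = [0, 0, 0]
    for g in rng.choices(range(3), weights=probs, k=n):
        counts[g] += 1
    return counts


def simulated_rejection_rate(case_probs, control_probs, n_cases, n_controls,
                             alpha=0.05, reps=1000, seed=0):
    """Fraction of simulated studies in which the two-sided CA test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        cases = draw_counts(rng, case_probs, n_cases)
        controls = draw_counts(rng, control_probs, n_controls)
        if abs(ca_trend_z(cases, controls)) > z_crit:
            hits += 1
    return hits / reps


# Illustrative values only: type I error at a low allele frequency, then power.
rare = [0.9025, 0.095, 0.0025]  # HWE genotype probabilities at p = 0.05
print(simulated_rejection_rate(rare, rare, 100, 100))
print(simulated_rejection_rate([0.37, 0.48, 0.15], [0.50, 0.41, 0.09], 200, 200))
```

Comparing the first number with the nominal alpha, and the second with an asymptotic power formula, gives a rough view of how far the approximation drifts as counts get small.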

Actionable recommendations for practitioners

When planning single-marker case–control studies, compute the averaged power E_n[power(n) | p] (and, when p itself is uncertain, E_p{E_n[power(n) | p]}) using CA test formulas to avoid the modest overoptimism of expected-count plug-in power; if p is uncertain, use beta priors and report sensitivity curves (example figure above).
For affected-sib-pair linkage, prefer robust tests (e.g., T0.355) if model is unknown; still compute averaged-power across plausible x priors to evaluate worst-case power loss.
For modern GWAS, adapt averaged-power ideas to per-marker estimates (accounting for local MAF and HWE departures), and combine with multiple-testing corrections or permutation to assess study-level discovery probability.
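The first recommendation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's formulas: it uses a standard large-sample approximation to CA trend-test power at expected genotype frequencies, keeps only the dominant rejection tail, and averages over a beta prior on p with a simple midpoint grid. The design values (prevalence 0.1, genotype relative risks 1.3 and 1.69, 400 cases and 400 controls, Beta(3, 7)) are invented for illustration and are not the CardioGene design.

```python
import math
from statistics import NormalDist

STD = NormalDist()
SCORES = (0.0, 0.5, 1.0)  # additive scores; an assumption of this sketch


def genotype_probs(p, prevalence, grr_het, grr_hom):
    """Case and control genotype probabilities under population HWE."""
    g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    rr = [1.0, grr_het, grr_hom]
    f0 = prevalence / sum(gi * ri for gi, ri in zip(g, rr))  # baseline penetrance
    pen = [f0 * ri for ri in rr]
    cases = [gi * fi / prevalence for gi, fi in zip(g, pen)]
    controls = [gi * (1 - fi) / (1 - prevalence) for gi, fi in zip(g, pen)]
    return cases, controls


def score_mean(probs):
    return sum(x * q for x, q in zip(SCORES, probs))


def score_var(probs):
    return sum(x * x * q for x, q in zip(SCORES, probs)) - score_mean(probs) ** 2


def trend_power(p, n_cases, n_controls, prevalence, grr_het, grr_hom, alpha=0.05):
    """Approximate two-sided power, keeping only the dominant tail."""
    cases, controls = genotype_probs(p, prevalence, grr_het, grr_hom)
    n = n_cases + n_controls
    pooled = [(n_cases * a + n_controls * b) / n for a, b in zip(cases, controls)]
    delta = abs(score_mean(cases) - score_mean(controls))
    sd0 = math.sqrt(score_var(pooled) * (1 / n_cases + 1 / n_controls))
    sd1 = math.sqrt(score_var(cases) / n_cases + score_var(controls) / n_controls)
    z = STD.inv_cdf(1 - alpha / 2)
    return STD.cdf((delta - z * sd0) / sd1)


def prior_averaged_power(a, b, n_cases, n_controls, prevalence, grr_het, grr_hom,
                         alpha=0.05, grid=2000):
    """Average trend_power over a Beta(a, b) prior on p via a midpoint grid."""
    pts = [(i + 0.5) / grid for i in range(grid)]
    logw = [(a - 1) * math.log(p) + (b - 1) * math.log(1 - p) for p in pts]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]  # unnormalized weights, rescaled for stability
    num = sum(wi * trend_power(p, n_cases, n_controls, prevalence,
                               grr_het, grr_hom, alpha) for wi, p in zip(w, pts))
    return num / sum(w)


design = dict(n_cases=400, n_controls=400, prevalence=0.1, grr_het=1.3, grr_hom=1.69)
for p in (0.1, 0.2, 0.3, 0.4):  # a coarse sensitivity curve over allele frequency
    print(p, round(trend_power(p, **design), 3))
print("Beta(3, 7) prior:", round(prior_averaged_power(3, 7, **design), 3))
```

Normalizing by the sum of the grid weights avoids needing the beta normalizing constant; a proper quadrature routine or Monte Carlo draws from the prior would serve equally well.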

Reproducibility & limitations (brief)

Methods are analytic and numerical; tables give concrete parameter values (sample sizes, prevalences, genotype relative risks, beta priors) so replication of the reported numbers is straightforward with the paper formulas. However, authors did not publish code; implementation requires careful matching of formulas (e.g., CA asymptotic variances) and numerical summation; finite-sample exact enumeration or simulation would strengthen reproducibility for small counts or rare variants

Updated: March 06, 2026