Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning
American Journal of Human Genetics (IF 8.1). Published: 2024-06-21. DOI: 10.1016/j.ajhg.2024.06.003
Remo Monti 1 , Lisa Eick 2 , Georgi Hudjashov 3 , Kristi Läll 3 , Stavroula Kanoni 4 , Brooke N Wolford 5 , Benjamin Wingfield 6 , Oliver Pain 7 , Sophie Wharrie 8 , Bradley Jermy 2 , Aoife McMahon 6 , Tuomo Hartonen 2 , Henrike Heyne 9 , Nina Mars 10 , Samuel Lambert 11 , , Kristian Hveem 12 , Michael Inouye 13 , David A van Heel 14 , Reedik Mägi 3 , Pekka Marttinen 8 , Samuli Ripatti 15 , Andrea Ganna 16 , Christoph Lippert 17

Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
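The abstract describes the ensemble PGS only as an elastic net over the scores from several methods, tuned in one biobank and then applied elsewhere. The sketch below illustrates that general idea under stated assumptions and is not the prspipe implementation. It assumes a quantitative trait with no covariates, a hand-written coordinate-descent elastic net in NumPy, k-fold cross-validation over a hypothetical `alpha` grid, and simulated toy scores in place of real method outputs. All helper names (`elastic_net_cd`, `fit_ensemble`, `cv_select_alpha`, `effect_size`, `simulate`) are hypothetical. β is taken here as the least-squares slope of the phenotype per standard deviation of the score, which is a simplification of the paper's effect-size definition for diseases.

```python
import numpy as np


def elastic_net_cd(X, y, alpha, l1_ratio=0.5, n_iter=500, tol=1e-8):
    """Cyclic coordinate descent for the elastic net objective
        (1/2n)||y - Xw||^2 + alpha * (l1_ratio*||w||_1 + (1 - l1_ratio)/2 * ||w||^2).
    Assumes X has standardized columns and y is centered (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    r = y.astype(float)  # current residual y - Xw (astype returns a copy)
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            xj = X[:, j]
            sq = xj @ xj / n
            rho = xj @ r / n + w[j] * sq  # correlation with the partial residual
            # Soft-threshold for the L1 penalty, then shrink for the L2 penalty.
            new = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0) / (sq + alpha * (1 - l1_ratio))
            delta = new - w[j]
            if delta != 0.0:
                r -= delta * xj
                w[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return w


def fit_ensemble(pgs, y, alpha, l1_ratio=0.5):
    """Hypothetical helper: weights for combining per-method PGS columns."""
    mu, sd = pgs.mean(axis=0), pgs.std(axis=0)
    w = elastic_net_cd((pgs - mu) / sd, y - y.mean(), alpha, l1_ratio)
    return mu, sd, w


def ensemble_score(pgs, model):
    """Apply training-set standardization and weights to new PGS columns."""
    mu, sd, w = model
    return ((pgs - mu) / sd) @ w


def effect_size(score, y):
    """Least-squares slope of y on the standardized score (beta per SD), no covariates."""
    s = (score - score.mean()) / score.std()
    return float(s @ (y - y.mean()) / (s @ s))


def cv_select_alpha(pgs, y, alphas, k=5, seed=0):
    """Pick the penalty with the lowest k-fold cross-validated mean squared error."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    folds = np.array_split(rng.permutation(idx), k)
    errs = []
    for a in alphas:
        mse = 0.0
        for f in folds:
            train = np.setdiff1d(idx, f)
            model = fit_ensemble(pgs[train], y[train], a)
            pred = y[train].mean() + ensemble_score(pgs[f], model)
            mse += np.mean((y[f] - pred) ** 2)
        errs.append(mse / k)
    return alphas[int(np.argmin(errs))]


def simulate(n, rng, noise_sds=(1.0, 1.0, 1.5)):
    """Toy data: each 'method' is the same latent liability plus independent noise."""
    g = rng.normal(size=n)
    pgs = np.column_stack([g + rng.normal(scale=s, size=n) for s in noise_sds])
    y = g + rng.normal(size=n)
    return pgs, y


rng = np.random.default_rng(1)
train_pgs, train_y = simulate(4000, rng)  # stands in for the tuning biobank
test_pgs, test_y = simulate(4000, rng)    # stands in for a target biobank
alpha = cv_select_alpha(train_pgs, train_y, alphas=[0.001, 0.01, 0.1, 0.5])
model = fit_ensemble(train_pgs, train_y, alpha)
beta_ens = effect_size(ensemble_score(test_pgs, model), test_y)
beta_single = [effect_size(test_pgs[:, j], test_y) for j in range(test_pgs.shape[1])]
rel_gain = 100 * (beta_ens / max(beta_single) - 1)
print(f"alpha={alpha}, ensemble beta={beta_ens:.3f}, single betas={np.round(beta_single, 3)}")
print(f"relative gain over best single score: {rel_gain:.1f}%")
```

Because the toy "methods" carry independent noise, the weighted combination should show a larger β than any single column here. Real method scores are usually highly correlated with one another, so much smaller gains would be expected, consistent with the modest median improvement reported in the abstract.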

Updated: 2024-06-21