Inflation of polygenic risk scores caused by sample overlap and relatedness: Examples of a major risk of bias
American Journal of Human Genetics (IF 8.1), Pub Date: 2024-08-20, DOI: 10.1016/j.ajhg.2024.07.014
Colin A Ellis 1, Karen L Oliver 2, Rebekah V Harris 3, Ruth Ottman 4, Ingrid E Scheffer 5, Heather C Mefford 6, Michael P Epstein 7, Samuel F Berkovic 3, Melanie Bahlo 8

Polygenic risk scores (PRSs) are an important tool for understanding the role of common genetic variants in human disease. Standard best practices recommend that PRSs be analyzed in cohorts that are independent of the genome-wide association study (GWAS) used to derive the scores without sample overlap or relatedness between the two cohorts. However, identifying sample overlap and relatedness can be challenging in an era of GWASs performed by large biobanks and international research consortia. Although most genomics researchers are aware of best practices and theoretical concerns about sample overlap and relatedness between GWAS and PRS cohorts, the prevailing assumption is that the risk of bias is small for very large GWASs. Here, we present two real-world examples demonstrating that sample overlap and relatedness is not a minor or theoretical concern but an important potential source of bias in PRS studies. Using a recently developed statistical adjustment tool, we found that excluding overlapping and related samples was equal to or more powerful than adjusting for overlap bias. Our goal is to make genomics researchers aware of the magnitude of risk of bias from sample overlap and relatedness and to highlight the need for mitigation tools, including independent validation cohorts in PRS studies, continued development of statistical adjustment methods, and tools for researchers to test their cohorts for overlap and relatedness with GWAS cohorts without sharing individual-level data.

Updated: 2024-08-20