Modeling recent positive selection using identity-by-descent segments
American Journal of Human Genetics (IF 8.1), Pub Date: 2024-10-02, DOI: 10.1016/j.ajhg.2024.08.023
Seth D. Temple, Ryan K. Waples, Sharon R. Browning

Recent positive selection can result in an excess of long identity-by-descent (IBD) haplotype segments overlapping a locus. The statistical methods that we propose here address three major objectives in studying selective sweeps: scanning for regions of interest, identifying possible sweeping alleles, and estimating a selection coefficient s. First, we implement a selection scan to locate regions with excess IBD rates. Second, we estimate the allele frequency and location of an unknown sweeping allele by aggregating over variants that are more abundant in an inferred outgroup with excess IBD rate versus the rest of the sample. Third, we propose an estimator for the selection coefficient and quantify uncertainty using the parametric bootstrap. Comparing against state-of-the-art methods in extensive simulations, we show that our methods are more precise at estimating s when s ≥ 0.015. We also show that our 95% confidence intervals contain s in nearly 95% of our simulations. We apply these methods to study positive selection in European ancestry samples from the Trans-Omics for Precision Medicine project. We analyze eight loci where IBD rates are more than four standard deviations above the genome-wide median, including LCT where the maximum IBD rate is 35 standard deviations above the genome-wide median. Overall, we present robust and accurate approaches to study recent adaptive evolution without knowing the identity of the causal allele or using time series data.
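
The scan criterion mentioned in the abstract (IBD rates more than four standard deviations above the genome-wide median) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flag_excess_ibd`, the input format (one IBD rate per genomic position), and the use of a plain population standard deviation over all positions are assumptions made for illustration only.

```python
import statistics


def flag_excess_ibd(ibd_rates, num_sd=4.0):
    """Flag positions whose IBD rate exceeds median + num_sd * SD.

    ibd_rates: list of per-position IBD rates (hypothetical input format).
    num_sd:    number of standard deviations above the median (4 in the abstract).
    Returns (indices of flagged positions, threshold used).
    """
    median = statistics.median(ibd_rates)
    # Assumption: plain population SD over all positions; the paper may
    # compute the spread differently (for example, after trimming outliers).
    sd = statistics.pstdev(ibd_rates)
    threshold = median + num_sd * sd
    flagged = [i for i, rate in enumerate(ibd_rates) if rate > threshold]
    return flagged, threshold


if __name__ == "__main__":
    # Toy data: a flat background with one strongly elevated position.
    rates = [0.9, 1.1] * 50 + [5.0]
    flagged, threshold = flag_excess_ibd(rates)
    print("threshold:", round(threshold, 3), "flagged positions:", flagged)
```

Centering on the median rather than the mean keeps the baseline from being pulled upward by the very regions the scan is meant to detect.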

Updated: 2024-10-02