Bayesian Selection of Relaxed-clock Models: Distinguishing Between Independent and Autocorrelated Rates
Systematic Biology (IF 6.1), Pub Date: 2024-11-21, DOI: 10.1093/sysbio/syae066
Muthukumaran Panchaksaram, Lucas Freitas, Mario dos Reis
In Bayesian molecular-clock dating of species divergences, rate models are used to construct the prior on the molecular evolutionary rates for branches in the phylogeny, with independent and autocorrelated rate models being commonly used. The two classes of models, however, can result in markedly different divergence time estimates for the same dataset, and thus selecting the best rate model appears important for obtaining reliable inferences of divergence times. However, the properties of Bayesian rate model selection are not well understood, in particular when the number of sequence partitions analysed increases and when age calibrations (such as fossil calibrations) are misspecified. Furthermore, Bayesian rate model selection is computationally expensive as it requires calculation of marginal likelihoods by MCMC sampling, and therefore methods that can speed up the model selection procedure without compromising its accuracy are desirable. In this study, we use a combination of computer simulations and real data analysis to investigate the statistical behaviour of Bayesian rate model selection and we also explore approximations of the likelihood to improve computational efficiency in large phylogenomic datasets. Our simulations demonstrate that the posterior probability for the correct rate model converges to one as more molecular sequence partitions are analysed and when no calibrations are used, as expected due to asymptotic Bayesian model selection theory. Furthermore, we also show the model selection procedure is robust to slight misspecification of calibrations, and reliable inference of the correct rate model is possible in this case. However, we show that when calibrations are seriously misspecified, calculated model probabilities are completely wrong and may converge to one for the wrong rate model. Finally, we demonstrate that approximating the phylogenetic likelihood under an arcsine branch-length transform can dramatically reduce the computational cost of rate model selection without compromising accuracy. We test the approximate procedure on two large phylogenies of primates (372 species) and flowering plants (644 species), replicating results obtained on smaller datasets using exact likelihood. Our findings and methodology can assist users in selecting the optimal rate model for estimating times and rates along the Tree of Life.
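The abstract refers to posterior probabilities of rate models obtained from marginal likelihoods. As a minimal sketch of that last step only, the snippet below turns log marginal likelihoods into posterior model probabilities via Bayes' theorem, assuming equal prior model probabilities unless others are given. The function name `posterior_model_probs` and the numerical values are hypothetical and for illustration; they are not taken from the paper, and the sketch does not cover how the marginal likelihoods themselves are estimated by MCMC.

```python
import math


def posterior_model_probs(log_marginal_liks, priors=None):
    """Convert log marginal likelihoods into posterior model probabilities.

    log_marginal_liks: dict mapping model name -> log marginal likelihood.
    priors: optional dict mapping model name -> prior probability
            (assumed uniform when omitted).
    """
    names = list(log_marginal_liks)
    if priors is None:
        priors = {m: 1.0 / len(names) for m in names}

    # Work on the log scale: log(prior) + log(marginal likelihood).
    log_w = {m: log_marginal_liks[m] + math.log(priors[m]) for m in names}

    # Subtract the maximum before exponentiating to avoid underflow.
    mx = max(log_w.values())
    unnorm = {m: math.exp(v - mx) for m, v in log_w.items()}
    total = sum(unnorm.values())
    return {m: u / total for m, u in unnorm.items()}


# Hypothetical log marginal likelihoods for the two rate models.
log_ml = {"independent": -10234.7, "autocorrelated": -10230.2}
probs = posterior_model_probs(log_ml)
for model, p in probs.items():
    print(f"{model}: {p:.4f}")
```

Subtracting the largest log weight before exponentiating keeps the calculation stable, since log marginal likelihoods for phylogenomic datasets are typically large negative numbers whose exponentials would underflow to zero.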
Updated: 2024-11-21