Assessing the Adequacy of Morphological Models using Posterior Predictive Simulations
Systematic Biology (IF 6.1), Pub Date: 2024-10-07, DOI: 10.1093/sysbio/syae055
Laura P A Mulvey, Michael R May, Jeremy M Brown, Sebastian Höhna, April M Wright, Rachel C M Warnock
Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics, a major step in estimating a tree is choosing an appropriate model of character evolution. While the most common character data used is molecular sequence data, morphological data remains a vital source of information. The use of morphological characters allows for the incorporation of fossil taxa and, despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, it is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Mk Lewis model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any extension to it, provides a sufficiently good description of the process of morphological evolution. In this study, we investigate the impact of different morphological models on empirical tetrapod data sets. Specifically, we compare unpartitioned Mk models with those where characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model has an impact on both topology and branch lengths, highlighting the importance of model choice. Through simulations, we validate the use of the model adequacy approach, posterior predictive simulations, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection.
We demonstrate that model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model often perform adequately in capturing the evolutionary dynamics that generated our data. We do not find any preference for a particular model extension across multiple data sets, indicating that there is no ‘one size fits all’ when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.
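As a rough illustration of two ideas named in the abstract, the sketch below computes transition probabilities for a k-state Mk model and a generic posterior predictive p-value. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the rate matrix is assumed to be scaled so that branch lengths are in expected changes per character, and the abstract does not say which test statistics or software the study used.

```python
import math


def mk_transition_probs(k: int, t: float) -> list[list[float]]:
    """Transition probability matrix of a k-state Mk model over branch length t.

    Assumption: every off-diagonal rate equals 1/(k-1), so t is measured in
    expected changes per character. The matrix size depends on k, which is
    what changes when characters are partitioned by number of states.
    """
    if k < 2:
        raise ValueError("the Mk model needs at least two states")
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay  # probability of ending in the start state
    p_diff = 1.0 / k - decay / k            # probability of ending in any one other state
    return [[p_same if i == j else p_diff for j in range(k)] for i in range(k)]


def posterior_predictive_p_value(observed_stat: float, simulated_stats: list[float]) -> float:
    """Fraction of simulated test statistics at least as large as the observed one.

    Values near 0 or 1 suggest the model fails to reproduce that feature of
    the data; the choice of statistic is left to the caller (hypothetical here).
    """
    if not simulated_stats:
        raise ValueError("need at least one simulated statistic")
    hits = sum(1 for s in simulated_stats if s >= observed_stat)
    return hits / len(simulated_stats)


if __name__ == "__main__":
    for row in mk_transition_probs(k=3, t=0.5):
        print([round(p, 4) for p in row])
    print(posterior_predictive_p_value(3.0, [1.0, 2.0, 3.0, 4.0]))
```

In a full analysis the simulated statistics would come from data sets simulated under parameters drawn from the posterior, which this sketch does not attempt.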
Updated: 2024-10-07