The Fundamental Role of Character Coding in Bayesian Morphological Phylogenetics
Systematic Biology (IF 6.1), Pub Date: 2024-07-02, DOI: 10.1093/sysbio/syae033
Basanta Khakurel 1,2,3, Courtney Grigsby 1,4, Tyler D Tran 1, Juned Zariwala 5, Sebastian Höhna 2,3, April M Wright 1

Phylogenetic trees establish a historical context for the study of organismal form and function. Most phylogenetic trees are estimated using a model of evolution. For molecular data, modeling evolution is often based on biochemical observations about changes between character states. For example, there are 4 nucleotides, and we can make assumptions about the probability of transitions between them. By contrast, for morphological characters, we may not know a priori how many character states there are per character, as both extant sampling and the fossil record may be highly incomplete, which leads to an observer bias. For a given character, the state space may be larger than what has been observed in the sample of taxa collected by the researcher. In this case, it may not be clear how many evolutionary rates are needed to describe transitions between morphological character states, potentially leading to model misspecification. To explore the impact of this model misspecification, we simulated character data with varying numbers of character states per character. We then used the data to estimate phylogenetic trees using models of evolution with the correct number of character states and with an incorrect number of character states. The results of this study indicate that this observer bias may lead to phylogenetic error, particularly in the branch lengths of trees. If the state space is wrongly assumed to be too large, then we underestimate the branch lengths, and the opposite occurs when the state space is wrongly assumed to be too small.
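
The direction of the branch-length bias described above can be illustrated with a small sketch. This is not the authors' Bayesian simulation pipeline. It assumes a symmetric k-state Mk model scaled to one expected change per unit of branch length, a single branch, and an exactly known proportion of differing characters. The function names `mk_prob_differ` and `mk_branch_length` are hypothetical and chosen for illustration only.

```python
import math


def mk_prob_differ(t: float, k: int) -> float:
    """Probability that the two ends of a branch of length t are in
    different states under a symmetric Mk model with k states
    (assumption: rates scaled to one expected change per unit length)."""
    return (k - 1) / k * (1.0 - math.exp(-k * t / (k - 1)))


def mk_branch_length(p: float, k: int) -> float:
    """Branch length implied by a proportion p of differing characters
    when the model assumes k states (inverse of mk_prob_differ)."""
    limit = (k - 1) / k  # saturation level of p under k states
    if not 0.0 <= p < limit:
        raise ValueError("p must lie in [0, (k-1)/k) for a finite branch length")
    return -limit * math.log(1.0 - p / limit)


if __name__ == "__main__":
    true_k = 3      # illustrative true number of states
    true_t = 0.5    # illustrative true branch length
    p = mk_prob_differ(true_t, true_k)
    for assumed_k in (2, 3, 4, 6):
        estimate = mk_branch_length(p, assumed_k)
        print(f"assumed k={assumed_k}: implied branch length {estimate:.3f}")
```

In this toy setting, assuming more states than the true number gives an implied branch length below the true value, and assuming fewer gives one above it, which matches the direction reported in the abstract. The values of `true_k` and `true_t` are arbitrary illustrations, not figures from the study.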

Updated: 2024-07-02