Phylogenetic tree instability after taxon addition: empirical frequency, predictability, and consequences for online inference
Systematic Biology (IF 6.1), Pub Date: 2024-10-25, DOI: 10.1093/sysbio/syae059
Lena Collienne, Mary Barker, Marc A Suchard, Frederick A Matsen IV
Online phylogenetic inference methods add sequentially arriving sequences to an inferred phylogeny without recomputing the entire tree from scratch. Some online method implementations already exist, but there remains concern that additional sequences may change the topological relationships among the original set of taxa. We call such a change in tree topology a lack of stability for the inferred tree. In this paper, we analyze the stability of single taxon addition in a maximum likelihood framework across 1,000 empirical datasets. We find that instability occurs in almost 90% of our examples, although observed topological differences do not always reach significance under the AU test. Changes in tree topology after the addition of a taxon rarely occur close to its attachment location, and are more frequently observed in more distant tree locations carrying low bootstrap support. To investigate whether instability is predictable, we hypothesize sources of instability and design summary statistics addressing these hypotheses. Using these summary statistics as input features for machine learning under random forests, we are able to predict instability and can identify the most influential features. In summary, it does not appear that a strict insertion-only online inference method will deliver globally optimal trees, although relaxing insertion strictness by allowing for a small number of final tree rearrangements or accepting slightly suboptimal solutions appears feasible.
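To make the notion of stability in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes trees are given as nested Python tuples with taxon names as strings (a representation chosen here purely for illustration), and the function names `prune`, `splits`, and `is_stable` are hypothetical. The idea is to remove the added taxon from the new tree and check whether the remaining unrooted topology has the same non-trivial splits as the original tree.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def prune(tree, taxon):
    """Remove one leaf and suppress any node left with a single child."""
    if isinstance(tree, str):
        return None if tree == taxon else tree
    kept = [p for p in (prune(child, taxon) for child in tree) if p is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def splits(tree):
    """Collect the non-trivial bipartitions of the unrooted topology."""
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - clade
        # Skip trivial splits (a single leaf against everything else).
        if len(clade) > 1 and len(other) > 1:
            found.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return found


def is_stable(old_tree, new_tree, added_taxon):
    """True if adding the taxon left the original topology unchanged."""
    return splits(old_tree) == splits(prune(new_tree, added_taxon))


# Assumed toy example: taxon "F" is added to a five-taxon tree.
old = (("A", "B"), ("C", ("D", "E")))
new_same = (("A", "B"), (("C", "F"), ("D", "E")))
new_changed = ((("A", "F"), "C"), ("B", ("D", "E")))
print(is_stable(old, new_same, "F"))
print(is_stable(old, new_changed, "F"))
```

In this toy example the first inferred tree only attaches "F" next to "C", so the relationships among the original taxa are preserved, whereas the second regroups "A" with "C" and would count as unstable. This checks topology only; it does not assess whether a difference is significant, which the paper evaluates with the AU test.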
Updated: 2024-10-25