The influence of the number of tree searches on maximum likelihood inference in phylogenomics
Systematic Biology (IF 6.1), Pub Date: 2024-06-28, DOI: 10.1093/sysbio/syae031
Chao Liu, Xiaofan Zhou, Yuanning Li, Chris Todd Hittinger, Ronghui Pan, Jinyan Huang, Xue-Xin Chen, Antonis Rokas, Yun Chen, Xing-Xing Shen

Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., ten) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as BioNJ and parsimony starting trees for inferring ML gene trees, and that RAxML-NG and PhyML were less sensitive to the choice of starting tree than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that, for a given ML program, the number of tree searches substantially affected recovery of the best-of-100 ML gene tree topology. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6 of the 15 phylogenomic datasets. Lastly, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.

Updated: 2024-06-28