MAST: Phylogenetic Inference with Mixtures Across Sites and Trees
Systematic Biology (IF 6.1) Pub Date: 2024-02-27, DOI: 10.1093/sysbio/syae008
Thomas K F Wong 1, Caitlin Cherryh 2, Allen G Rodrigo 3, Matthew W Hahn 4, Bui Quang Minh 1, Robert Lanfear 2

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting, introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call MAST. This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of incomplete lineage sorting in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of four Platyrrhine species for which standard concatenated maximum likelihood and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e. the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyse a concatenated alignment using maximum likelihood, while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
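As a rough illustration of the idea behind a multi-tree mixture, the likelihood of each site can be written as a weighted sum of that site's likelihood under each candidate tree, with the tree weights summing to one. The sketch below is not the IQ-TREE implementation: it assumes the per-site likelihoods under each tree have already been computed (in MAST these would come from each tree's own branch lengths and substitution model), and it estimates only the tree weights. The use of expectation-maximization here is an assumption made for illustration, the function names are hypothetical, and the numbers in `toy_site_likes` are invented.

```python
import math


def mixture_log_likelihood(site_likes, weights):
    """Log-likelihood of an alignment under a tree mixture.

    site_likes[i][j] is the likelihood of site i under tree j
    (assumed to be precomputed elsewhere); weights[j] is the weight of tree j.
    """
    return sum(
        math.log(sum(w * like for w, like in zip(weights, row)))
        for row in site_likes
    )


def em_tree_weights(site_likes, n_iter=200):
    """Estimate tree weights with a plain EM loop (illustrative assumption)."""
    n_sites = len(site_likes)
    n_trees = len(site_likes[0])
    weights = [1.0 / n_trees] * n_trees  # start from equal weights
    for _ in range(n_iter):
        totals = [0.0] * n_trees
        for row in site_likes:
            # E-step: posterior probability that this site belongs to each tree
            joint = [w * like for w, like in zip(weights, row)]
            norm = sum(joint)
            for j, value in enumerate(joint):
                totals[j] += value / norm
        # M-step: new weight is the mean posterior across sites
        weights = [t / n_sites for t in totals]
    return weights


# Invented toy values: three sites favour tree 0, one site favours tree 1.
toy_site_likes = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.1, 0.9],
]

weights = em_tree_weights(toy_site_likes)
uniform_ll = mixture_log_likelihood(toy_site_likes, [0.5, 0.5])
fitted_ll = mixture_log_likelihood(toy_site_likes, weights)
print("estimated weights:", weights)
print("log-likelihood (equal weights):", uniform_ll)
print("log-likelihood (fitted weights):", fitted_ll)
```

In this reading, a tree's weight corresponds to the proportion of sites attributed to it, which matches how the abstract describes the highest-weight tree. A full analysis would also optimize branch lengths and substitution-model parameters for each tree, which this sketch leaves out.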

Updated: 2024-02-27