Inference of Phylogenetic Networks from Sequence Data using Composite Likelihood
Systematic Biology (IF 6.1), Pub Date: 2024-10-10, DOI: 10.1093/sysbio/syae054
Sungsik Kong, David L Swofford, Laura S Kubatko

While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between two species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing two branches to merge into one, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than two existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). 
We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
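
As a rough illustration of two ingredients named in the abstract, the sketch below computes Jukes-Cantor transition probabilities and sums log-likelihood contributions over data blocks that are treated as independent, which is the general idea behind a composite likelihood. It is not PhyNEST's implementation: the function names, the `(counts, probs)` block layout, and the choice of Python are assumptions made for illustration only, and the network search itself (hill climbing or simulated annealing) is not shown.

```python
import math


def jc69_transition_probs(t):
    """Jukes-Cantor (JC69) transition probabilities.

    t is the branch length in expected substitutions per site.
    Returns (p_same, p_diff): the probability that a nucleotide is
    unchanged, and the probability of each specific different nucleotide.
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def composite_log_likelihood(blocks):
    """Sum multinomial log-likelihood kernels over blocks.

    Hypothetical layout: each block is a (counts, probs) pair, where
    counts[i] is the observed count of pattern i and probs[i] is its
    model probability. Blocks are treated as independent, so their
    log-likelihoods simply add. Patterns with zero count are skipped;
    a positive count with probability 0 raises ValueError from math.log.
    """
    total = 0.0
    for counts, probs in blocks:
        for n, p in zip(counts, probs):
            if n > 0:
                total += n * math.log(p)
    return total
```

A quick sanity check is that `p_same + 3 * p_diff` equals 1 for any branch length, and that both probabilities approach 0.25 as `t` grows.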

Updated: 2024-10-10