The Effect of Copy Number Hemiplasy on Gene Family Evolution
Systematic Biology (IF 6.1), Pub Date: 2024-02-08, DOI: 10.1093/sysbio/syae007
Qiuyi Li¹˒², Yao-Ban Chan¹, Nicolas Galtier³, Celine Scornavacca⁴
The evolution of gene families is complex, involving gene-level evolutionary events such as gene duplication, horizontal gene transfer, and gene loss (DTL), and other processes such as incomplete lineage sorting (ILS). Because of this, topological differences often exist between gene trees and species trees. A number of models have been recently developed to explain these discrepancies, the most realistic of which attempt to consider both gene-level events and ILS. When unified in a single model, the interaction between ILS and gene-level events can cause polymorphism in gene copy number, which we refer to as copy number hemiplasy (CNH). In this paper we extend the Wright-Fisher process to include duplications and losses over several species, and show that the probability of CNH for this process can be significant. We study how well two unified models — MLMSC (MultiLocus MultiSpecies Coalescent), which models CNH, and DLCoal (Duplication, Loss, and Coalescence), which does not — approximate the Wright-Fisher process with duplication and loss. We then study the effect of CNH on gene family evolution by comparing MLMSC and DLCoal. We generate comparable gene trees under both models, showing significant differences in various summary statistics; most importantly, CNH reduces the number of gene copies greatly. If this is not taken into account, the traditional method of estimating duplication rates (by counting the number of gene copies) becomes inaccurate. The simulated gene trees are also used for species tree inference with the summary methods ASTRAL and ASTRAL-Pro, demonstrating that their accuracy, based on CNH-unaware simulations calibrated on real data, may have been overestimated.
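The abstract's central mechanism is copy number polymorphism arising when duplications and losses occur inside a population subject to genetic drift. A toy simulation can illustrate it. The sketch below is not the authors' model (it tracks a single population, not the multispecies process studied in the paper). It is a minimal, hypothetical Python illustration that assumes a haploid Wright-Fisher population in which every gene copy is independently duplicated or lost each generation. The function names `wright_fisher_copy_number` and `copy_number_polymorphism_rate` and all parameter values are illustrative assumptions.

```python
import random
from collections import Counter


def wright_fisher_copy_number(pop_size, generations, dup_prob, loss_prob, seed=None):
    """Per-individual gene copy numbers in a haploid Wright-Fisher population.

    Hypothetical illustration: each offspring takes the copy number of a parent
    drawn uniformly at random (neutral Wright-Fisher resampling); then each of
    its copies is independently duplicated with probability dup_prob or lost
    with probability loss_prob.
    """
    if dup_prob < 0 or loss_prob < 0 or dup_prob + loss_prob > 1:
        raise ValueError("need dup_prob, loss_prob >= 0 and dup_prob + loss_prob <= 1")
    rng = random.Random(seed)
    copies = [1] * pop_size  # every individual starts with a single copy
    for _ in range(generations):
        next_gen = []
        for _ in range(pop_size):
            parent = rng.choice(copies)  # resample one parent uniformly
            child = 0
            for _ in range(parent):
                u = rng.random()
                if u < dup_prob:
                    child += 2  # copy duplicated: original plus new paralog
                elif u >= dup_prob + loss_prob:
                    child += 1  # copy inherited unchanged
                # otherwise the copy is lost and contributes nothing
            next_gen.append(child)
        copies = next_gen
    return copies


def copy_number_polymorphism_rate(runs, **kwargs):
    """Fraction of simulated populations ending with more than one copy number."""
    polymorphic = 0
    for i in range(runs):
        final = wright_fisher_copy_number(seed=i, **kwargs)
        if len(set(final)) > 1:  # individuals disagree on copy number
            polymorphic += 1
    return polymorphic / runs


if __name__ == "__main__":
    params = dict(pop_size=50, generations=200, dup_prob=1e-3, loss_prob=1e-3)
    print(Counter(wright_fisher_copy_number(seed=1, **params)))  # copy number -> count
    print(copy_number_polymorphism_rate(50, **params))
```

If such a population split into two species while copy numbers still differ among individuals, the descendant species could inherit different copy numbers. This is related to the situation the abstract calls copy number hemiplasy.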
Updated: 2024-02-08