Benefits and Limits of Phasing Alleles for Network Inference of Allopolyploid Complexes
Systematic Biology (IF 6.1), Pub Date: 2024-05-11, DOI: 10.1093/sysbio/syae024
George P Tiley, Andrew A Crowl, Paul S Manos, Emily B Sessa, Claudia Solís-Lemus, Anne D Yoder, J Gordon Burleigh
Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared to haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared to using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.
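The contrast the abstract draws between phased alleles and sequences with heterozygous bases written as ambiguity codes can be illustrated with a small sketch. This is not part of PATÉ; the function name `ambiguity_encode` and the toy sequences are hypothetical, and the example assumes only two aligned alleles per individual (polyploids can carry more). It shows that two different phasings of the same heterozygous sites collapse to the same ambiguity-coded sequence, so the linkage between variants is lost.

```python
# Standard IUPAC nucleotide codes for sites where two different bases are observed.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def ambiguity_encode(allele_a: str, allele_b: str) -> str:
    """Collapse two aligned alleles into one sequence using IUPAC codes.

    Hypothetical helper for illustration; assumes uppercase A/C/G/T only
    and exactly two alleles of equal length.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    encoded = []
    for a, b in zip(allele_a, allele_b):
        # Homozygous sites keep their base; heterozygous sites get a code.
        encoded.append(a if a == b else IUPAC[frozenset((a, b))])
    return "".join(encoded)


# Two different phasings of the same two heterozygous sites (toy data).
phasing_1 = ("ACGT", "GCTT")
phasing_2 = ("ACTT", "GCGT")

print(ambiguity_encode(*phasing_1))
print(ambiguity_encode(*phasing_2))
```

Both calls produce the same string, which is why an ambiguity-coded sequence cannot tell the two phasings apart, whereas the phased alleles keep that information.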
Updated: 2024-05-11