Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences.
BMC Bioinformatics ( IF 2.9 ) Pub Date : 2006-07-19 , DOI: 10.1186/1471-2105-7-350
Alexander F Auch 1 , Stefan R Henz , Barbara R Holland , Markus Göker
BACKGROUND
Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences from which pairwise similarities and distances are computed in different ways resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is directly estimated by computing a recently described measure of "treelikeness", the so-called delta value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC, respectively, with the NCBI taxonomy tree of the taxa under study.
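The "treelikeness" measure mentioned above can be illustrated with a minimal sketch. It assumes the usual quartet-based definition of the delta value: for each set of four taxa, the three pairwise distance sums are sorted, and the gap between the two largest is divided by the gap between the largest and the smallest. The function names and the plain nested-list matrix format are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def quartet_delta(d, x, y, u, v):
    """Delta value of one quartet of taxa in distance matrix d (assumed definition)."""
    # The three ways to split four taxa into two pairs.
    largest, middle, smallest = sorted(
        (d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]),
        reverse=True,
    )
    if largest == smallest:
        # All three sums are equal (star-like quartet): treated as perfectly treelike.
        return 0.0
    return (largest - middle) / (largest - smallest)


def mean_delta(d):
    """Average quartet delta over all quartets; 0 means perfectly treelike."""
    quartets = list(combinations(range(len(d)), 4))
    if not quartets:
        raise ValueError("need at least four taxa")
    return sum(quartet_delta(d, *q) for q in quartets) / len(quartets)


# Hypothetical distances along the tree ((a,b),(c,d)) with all branch lengths 1.
tree_like = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
print(mean_delta(tree_like))
```

For distances that fit a tree exactly, the two largest sums coincide in every quartet, so the mean delta is 0; values closer to 1 indicate increasingly conflicting signal.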
RESULTS
Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than are mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally, we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm results in phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing only insignificantly worse, and STC performing as well if applied to high-quality distance matrices. Delta values are found to be a reliable predictor of phylogenetic accuracy.
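A distance based on the proportion of matched HSP length to average genome length might be sketched as follows. The abstract does not give the exact formulae, so this is only one plausible reading: HSPs are assumed to be given as half-open coordinate intervals on each genome, overlapping HSPs are merged so that no position is counted twice, and the distance is one minus the covered fraction. The function names and the interval format are assumptions for illustration.

```python
def covered_length(intervals):
    """Total length covered by the union of half-open (start, end) intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap before this interval: close the previous merged run.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current run.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def coverage_distance(hsps_on_x, hsps_on_y, len_x, len_y):
    """1 minus (mean matched length / mean genome length); an assumed formula."""
    matched = covered_length(hsps_on_x) + covered_length(hsps_on_y)
    return 1.0 - matched / (len_x + len_y)


# Hypothetical HSP coordinates on two genomes of 100 positions each.
print(coverage_distance([(0, 50)], [(0, 25), (25, 50)], 100, 100))
```

Dividing the summed coverage by the summed genome lengths is the same as dividing the average matched length by the average genome length, which keeps the distance symmetric in the two genomes.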
CONCLUSION
Using the most treelike distance matrices, as judged by their delta values, distance methods are able to recover all major plant lineages, and are more in accordance with Apicomplexa organelles being derived from "green" plastids than from plastids of the "red" type. GBDP-like methods can be used to reliably infer phylogenies from different kinds of genomic data. A framework is established to further develop and improve such methods. Delta values are a topology-independent tool of general use for the development and assessment of distance methods for phylogenetic inference.
Updated: 2019-11-01