Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics, Genome Research


Genome Research (IF 6.2), Pub Date: 2019-01-01, DOI: 10.1101/gr.235499.118
David C Danko (1,2), Dmitry Meleshko (1,2), Daniela Bezdan (2), Christopher Mason (2,3), Iman Hajirasouliha (2,4)


Emerging Linked-Read technologies (aka read cloud or barcoded short-reads) have revived interest in short-read technology as a viable approach to understand large-scale structures in genomes and metagenomes. Linked-Read technologies, such as the 10x Chromium system, use a microfluidic system and a specialized set of 3′ barcodes (aka UIDs) to tag short DNA reads sourced from the same long fragment of DNA; subsequently, the tagged reads are sequenced on standard short-read platforms. This approach results in interesting compromises. Each long fragment of DNA is only sparsely covered by reads, no information about the ordering of reads from the same fragment is preserved, and 3′ barcodes match reads from roughly 2–20 long fragments of DNA. However, compared to long-read technologies, the cost per base to sequence is far lower, far less input DNA is required, and the per base error rate is that of Illumina short-reads. In this paper, we formally describe a particular algorithmic issue common to Linked-Read technology: the deconvolution of reads with a single 3′ barcode into clusters that represent single long fragments of DNA. We introduce Minerva, a graph-based algorithm that approximately solves the barcode deconvolution problem for metagenomic data (where reference genomes may be incomplete or unavailable). Additionally, we develop two demonstrations where the deconvolution of barcoded reads improves downstream results, improving the specificity of taxonomic assignments and of k-mer-based clustering. To the best of our knowledge, we are the first to address the problem of barcode deconvolution in metagenomics.
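To make the barcode deconvolution problem concrete, the following is a minimal, illustrative sketch and not Minerva's published procedure. It assumes a toy heuristic: two reads that carry the same barcode are grouped together when both share at least one k-mer with reads from a common *other* barcode, and the groups are the connected components of that relation. The function names (`kmers`, `deconvolve`) and the parameters `k` and `min_shared` are hypothetical, chosen only for this example. No alignment and no reference genome is used.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def deconvolve(barcode_reads, target, k=5, min_shared=1):
    """Split the reads of barcode `target` into putative fragment clusters.

    barcode_reads maps barcode -> list of read sequences.
    Returns a sorted list of clusters, each a list of read indices.
    Toy heuristic (an assumption of this sketch, not the paper's method).
    """
    # Index every k-mer seen under a barcode other than the target.
    index = defaultdict(set)
    for barcode, reads in barcode_reads.items():
        if barcode == target:
            continue
        for read in reads:
            for km in kmers(read, k):
                index[km].add(barcode)

    reads = barcode_reads[target]
    # For each target read, collect the other barcodes it shares a k-mer with.
    neighbours = []
    for read in reads:
        linked = set()
        for km in kmers(read, k):
            linked |= index.get(km, set())
        neighbours.append(linked)

    # Union-find over target reads; link reads with enough common barcodes.
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(reads)), 2):
        if len(neighbours[i] & neighbours[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())
```

In this sketch, reads from the same long fragment need not overlap each other, which matches the sparse coverage described above; they are linked only through evidence from other barcodes. A read with no k-mer matches elsewhere ends up in its own singleton cluster.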


Updated: 2019-01-02
