Gapless assembly of complete human and plant chromosomes using only nanopore sequencing
Genome Research (IF 6.2), Pub Date: 2024-11-01, DOI: 10.1101/gr.279334.124
Sergey Koren, Zhigui Bao, Andrea Guarracino, Shujun Ou, Sara Goodwin, Katharine M. Jenike, Julian Lucas, Brandy McNulty, Jimin Park, Mikko Rautiainen, Arang Rhie, Dick Roelofs, Harrie Schneiders, Ilse Vrijenhoek, Koen Nijbroek, Olle Nordesjo, Sergey Nurk, Mike Vella, Katherine R. Lawrence, Doreen Ware, Michael C. Schatz, Erik Garrison, Sanwen Huang, William Richard McCombie, Karen H. Miga, Alexander H.J. Wittenberg, Adam M. Phillippy
The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Biosciences (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, “telomere-to-telomere” genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT “Duplex” sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used “Pore-C” chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the UL reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and provides a multirun single-instrument solution for the reconstruction of complete genomes.
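The abstract equates a base accuracy of 99.999% with Q50. As a reader's aid, the sketch below shows the standard Phred-scale conversion, Q = -10 · log10(error rate), that links the two figures. The function names are illustrative assumptions and are not taken from the paper or from any assembly tool.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value.

    Hypothetical helper for illustration only; not part of the paper's pipeline.
    """
    error_rate = 1.0 - accuracy
    if error_rate <= 0.0:
        # A zero error rate has no finite Phred value.
        raise ValueError("accuracy must be strictly less than 1.0")
    return -10.0 * math.log10(error_rate)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled quality value back to per-base accuracy."""
    return 1.0 - 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.999% accuracy is one expected error per 100,000 bases.
    print(round(accuracy_to_phred(0.99999), 3))
    print(phred_to_accuracy(50))
```

Under this scale each additional 10 Q points corresponds to a tenfold lower error rate, so Q50 means roughly one expected error per 100,000 bases.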
Updated: 2024-11-01