DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools
Genome Research (IF 6.2), Pub Date: 2024-11-01, DOI: 10.1101/gr.279095.124
Authors: Anupama Jha 1, Stephanie C Bohaczuk 2, Yizi Mao 2, Jane Ranchalis 2, Benjamin J Mallory 1, Alan T Min 1, Morgan O Hamm 1, Elliott Swanson 1, Danilo Dubocanin 3, Connor Finkbeiner 1, Tony Li 1, Dale Whittington 1, William Stafford Noble 1, Andrew Ben Stergachis 4, Mitchell R Vollger 5
Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using Pacific Biosciences (PacBio) single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either the PacBio or Oxford Nanopore Technologies (ONT) sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kb long DNA molecules with an ∼1000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.
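The abstract mentions conversion between molecular (read) and reference coordinate systems. As a rough illustration of what such a conversion involves, the sketch below walks a SAM-style CIGAR string to map 0-based read positions (for example, m6A calls on a single molecule) to 0-based reference positions. This is not the fibertools implementation, which the abstract does not describe; the function name `query_to_ref` and its arguments are hypothetical, and positions are assumed to be relative to the read sequence as stored in the alignment record.

```python
import re

# SAM CIGAR operations: a run length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_to_ref(ref_start, cigar, query_positions):
    """Map 0-based read positions to 0-based reference positions.

    Hypothetical helper for illustration only. Positions that fall in an
    insertion or a soft clip have no reference coordinate and map to None.
    """
    wanted = set(query_positions)
    mapping = {q: None for q in wanted}
    q, r = 0, ref_start  # current read and reference offsets
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":  # aligned bases: consume read and reference
            for i in range(n):
                if q + i in wanted:
                    mapping[q + i] = r + i
            q += n
            r += n
        elif op in "IS":  # insertion or soft clip: consume read only
            q += n
        elif op in "DN":  # deletion or skipped region: consume reference only
            r += n
        # H (hard clip) and P (padding) consume neither
    return [mapping[p] for p in query_positions]


# Read aligned at reference position 100 with a soft clip, an insertion,
# and a deletion; positions 0 and 5 have no reference coordinate.
lifted = query_to_ref(100, "2S3M2I4M1D2M", [0, 2, 4, 5, 7, 11])
print(lifted)
```

Because each molecule carries its own alignment, the same walk works in structurally variable regions where reads differ from the reference by large insertions or deletions; only the aligned bases receive a reference coordinate.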
Updated: 2024-11-01