High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation
Genome Research (IF 6.2), Pub Date: 2024-11-01, DOI: 10.1101/gr.279273.124
Jonas A. Gustafson, Sophia B. Gibson, Nikhita Damaraju, Miranda P.G. Zalusky, Kendra Hoekzema, David Twesigomwe, Lei Yang, Anthony A. Snead, Phillip A. Richmond, Wouter De Coster, Nathan D. Olson, Andrea Guarracino, Qiuhui Li, Angela L. Miller, Joy Goffena, Zachary B. Anderson, Sophie H.R. Storz, Sydney A. Ward, Maisha Sinha, Claudia Gonzaga-Jauregui, Wayne E. Clarke, Anna O. Basile, André Corvelo, Catherine Reeves, Adrienne Helland, Rajeeva Lochan Musunuri, Mahler Revsine, Karynne E. Patterson, Cate R. Paschal, Christina Zakarian, Sara Goodwin, Tanner D. Jensen, Esther Robb, The 1000 Genomes ONT Sequencing Consortium, University of Washington Center for Rare Disease Research (UW-CRDR), Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium, William Richard McCombie, Fritz J. Sedlazeck, Justin M. Zook, Stephen B. Montgomery, Erik Garrison, Mikhail Kolmogorov, Michael C. Schatz, Richard N. McLaughlin, Jr., Harriet Dashnow, Michael C. Zody, Matt Loose, Miten Jain, Evan E. Eichler, Danny E. Miller

Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

Updated: 2024-11-01