A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy
Genetics Selection Evolution ( IF 3.6 ) Pub Date : 2024-01-12 , DOI: 10.1186/s12711-024-00875-w
David Wragg 1 , Wengang Zhang 1 , Sarah Peterson 2 , Murthy Yerramilli 2 , Richard Mellanby 1, 2 , Jeffrey J Schoenebeck 1 , Dylan N Clements 1

Low-pass whole-genome sequencing and imputation offer significant cost savings, enabling substantial increases in sample size and statistical power. This approach is particularly promising in livestock breeding, providing an affordable means of screening individuals for deleterious alleles or calculating genomic breeding values. Consequently, it may also be of value in companion animal genomics to support pedigree breeding. We sought to evaluate in dogs the impact of low coverage sequencing and reference-guided imputation on genotype concordance and association analyses.

DNA isolated from saliva of 30 Labrador retrievers was sequenced at low (0.9X and 3.8X) and high (43.5X) coverage, and down-sampled from 43.5X to 9.6X and 17.4X. Genotype imputation was performed using a diverse reference panel (1021 dogs), and two subsets of the former panel (256 dogs each) where one had an excess of Labrador retrievers relative to other breeds. We observed little difference in imputed genotype concordance between reference panels. Association analyses for a locus acting as a disease proxy were performed using single-marker (GEMMA) and haplotype-based (XP-EHH) tests. GEMMA results were highly correlated (r ≥ 0.97) between 43.5X and ≥ 3.8X depths of coverage, while for 0.9X the correlation was lower (r ≤ 0.8). XP-EHH results were less well correlated, with r ranging from 0.58 (0.9X) to 0.88 (17.4X). Across a random sample of 10,000 genomic regions averaging 17 kb in size, we observed a median of three haplotypes per dog across the sequencing depths, with 5% of the regions returning more than eight haplotypes. Inspection of one such region revealed genotype and phasing inconsistencies across sequencing depths.

We demonstrate that saliva-derived canine DNA is suitable for whole-genome sequencing, highlighting the feasibility of client-based sampling. Low-pass sequencing and imputation require caution as incorrect allele assignments result when the subject possesses alleles that are absent in the reference panel. Larger panels have the capacity for greater allelic diversity, which should reduce the potential for imputation error. Although low-pass sequencing can accurately impute allele dosage, we highlight issues with phasing accuracy that impact haplotype-based analyses. Consequently, if accurately phased genotypes are required for analyses, we advocate sequencing at high depth (> 20X).
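
The distinction between allele dosage and phase drawn in the abstract can be illustrated with a small sketch. The Python below is not the authors' pipeline: the function names (`dosage_concordance`, `switch_errors`) and the five-site toy genotypes are hypothetical, chosen only to show how an imputed call set could match the high-depth calls perfectly on dosage while still containing a phase switch that would alter the inferred haplotypes.

```python
from typing import List, Tuple

# One diploid genotype per site: (allele on haplotype A, allele on haplotype B),
# with 0 = reference allele and 1 = alternate allele. Hypothetical encoding.
Haplotypes = List[Tuple[int, int]]


def dosage_concordance(truth: Haplotypes, imputed: Haplotypes) -> float:
    """Fraction of sites whose alternate-allele dosage (0, 1 or 2) matches."""
    if len(truth) != len(imputed) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for t, i in zip(truth, imputed) if sum(t) == sum(i))
    return matches / len(truth)


def switch_errors(truth: Haplotypes, imputed: Haplotypes) -> int:
    """Count phase switches between consecutive sites that are heterozygous
    in both call sets. Homozygous or discordant sites carry no phase
    information for this comparison and are skipped."""
    orientations = []
    for t, i in zip(truth, imputed):
        if t[0] != t[1] and sorted(t) == sorted(i):
            # True if the imputed phase has the same orientation as the truth.
            orientations.append(t == i)
    return sum(1 for a, b in zip(orientations, orientations[1:]) if a != b)


# Toy data (assumed, for illustration only): "truth" stands in for high-depth
# calls, "imputed" for low-pass imputed calls at the same five sites.
truth = [(0, 1), (1, 0), (0, 1), (1, 1), (0, 1)]
imputed = [(0, 1), (1, 0), (1, 0), (1, 1), (1, 0)]

print(dosage_concordance(truth, imputed))  # every dosage agrees
print(switch_errors(truth, imputed))       # yet the phase flips once
```

In this toy case a single-marker test that uses only dosage would see identical input from both call sets, whereas a haplotype-based statistic would be computed on different haplotypes, which is consistent with the weaker XP-EHH correlations reported above.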

Updated: 2024-01-12