Nature Genetics (IF 31.7), Pub Date: 2024-10-10, DOI: 10.1038/s41588-024-01954-w, Wei Li
Pangenome graphs are commonly used as references for read mapping. To improve mapping accuracy, Sirén et al. proposed a k-mer-based approach for sampling haplotypes, which were subsequently used to build a personalized subgraph. The original pangenome graph was partitioned into nonoverlapping blocks, and the local haplotypes in each block were labeled with graph-unique k-mers. Based on the k-mer counts in the reads, the authors classified the k-mers in the resulting matrices as present (heterozygous or homozygous) or absent, and selected the relevant haplotypes in each block accordingly. The sampled haplotypes were then used to construct a personalized variation graph, which is a subgraph of the original graph. The haplotype sampling approach is available as part of the vg toolkit and was applied to pangenome graphs from the Human Pangenome Reference Consortium. Compared with a frequency-filtered graph, the personalized subgraph built by k-mer-based haplotype sampling is a better reference for read mapping: it reduces genotyping errors and improves the accuracy of small-variant calling and structural-variant genotyping. These results suggest future directions for optimizing methods that personalize pangenome references.
Original reference: Nat. Methods https://doi.org/10.1038/s41592-024-02407-2 (2024)
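The sampling step summarized above can be illustrated with a small Python sketch. It is a toy model, not the vg implementation: the function names, the 0.25 and 0.75 coverage cutoffs, and the plus/minus-one scoring rule are all assumptions made for illustration, and real graph-unique k-mers are replaced here by short strings.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def classify_kmer(count, coverage, low=0.25, high=0.75):
    """Label a k-mer from its read count relative to the expected coverage.

    The cutoffs `low` and `high` are illustrative assumptions, not
    values taken from the paper.
    """
    if count < low * coverage:
        return "absent"
    if count < high * coverage:
        return "heterozygous"
    return "homozygous"


def score_haplotype(hap_kmers, block_kmers, labels):
    """Score one haplotype against the k-mer labels of its block.

    Hypothetical rule: reward agreement with homozygous and absent
    k-mers, penalize disagreement, and ignore heterozygous k-mers.
    """
    score = 0
    for kmer in block_kmers:
        carries = kmer in hap_kmers
        label = labels[kmer]
        if label == "homozygous":
            score += 1 if carries else -1
        elif label == "absent":
            score += -1 if carries else 1
    return score


def sample_haplotypes(block, kmer_counts, coverage, n):
    """Return the names of the n best-scoring haplotypes in one block.

    `block` maps a haplotype name to the set of k-mers it carries.
    """
    block_kmers = set().union(*block.values())
    labels = {
        kmer: classify_kmer(kmer_counts.get(kmer, 0), coverage)
        for kmer in block_kmers
    }
    ranked = sorted(
        block,
        key=lambda name: score_haplotype(block[name], block_kmers, labels),
        reverse=True,
    )
    return ranked[:n]


# Toy block with three haplotypes and made-up k-mer counts at 30x coverage.
block = {
    "hap1": {"AAA", "CCC"},
    "hap2": {"AAA", "GGG"},
    "hap3": {"TTT"},
}
counts = Counter({"AAA": 30, "CCC": 14, "GGG": 16})
selected = sample_haplotypes(block, counts, coverage=30, n=2)
```

In this toy block, "AAA" is labeled homozygous, "CCC" and "GGG" heterozygous, and "TTT" absent, so the two haplotypes carrying "AAA" outrank the one carrying "TTT". A personalized subgraph would then be built from the haplotypes selected in every block.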
Title: Personalizing pangenome graphs with k-mers