Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis,Genome Research

Genome Research (IF 6.2), Pub Date: 2024-10-01, DOI: 10.1101/gr.279449.124
Romain Derelle, Johanna von Wachsmann, Tommi Mäklin, Joel Hellewell, Timothy Russell, Ajit Lalvani, Leonid Chindelevitch, Nicholas J. Croucher, Simon R. Harris, John A. Lees

Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split k-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.
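The abstract does not spell out how split k-mers work. As a rough illustration of the general idea as it is commonly described (two k-mer arms flanking a single middle base, with samples compared wherever they share the same flanks), here is a toy Python sketch. It is not SKA2's Rust implementation: the function names, the arm length of 3, and the example sequences are all invented for illustration, and it ignores reverse complements, sequencing-error filtering, and everything else a real tool needs.

```python
from collections import defaultdict


def split_kmers(seq: str, arm: int = 3) -> dict:
    """Map each (left arm, right arm) flank pair to the set of middle bases seen.

    Hypothetical helper for illustration only; forward strand only.
    """
    table = defaultdict(set)
    width = 2 * arm + 1  # two arms plus one middle base
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        left, middle, right = window[:arm], window[arm], window[arm + 1:]
        table[(left, right)].add(middle)
    return dict(table)


def middle_base_differences(a: str, b: str, arm: int = 3) -> list:
    """List (flanks, base_in_a, base_in_b) for flanks shared by both samples
    whose single, unambiguous middle bases differ."""
    table_a, table_b = split_kmers(a, arm), split_kmers(b, arm)
    diffs = []
    for flanks in sorted(table_a.keys() & table_b.keys()):
        mid_a, mid_b = table_a[flanks], table_b[flanks]
        # Skip flanks that occur with more than one middle base in a sample.
        if len(mid_a) == 1 and len(mid_b) == 1 and mid_a != mid_b:
            diffs.append((flanks, next(iter(mid_a)), next(iter(mid_b))))
    return diffs


# Two made-up sequences that differ by one substitution (C -> T).
sample_a = "ACGTTGCAAGGCT"
sample_b = "ACGTTGTAAGGCT"
diffs = middle_base_differences(sample_a, sample_b)
print(diffs)
```

In this toy comparison no reference genome is involved: the substitution is reported only at the one window where it sits in the middle position, because every other window that overlaps it has a changed arm and so no longer matches between the two samples.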


Updated: 2024-10-01
