LucaProt reveals a diverse global RNA virome
Nature Biotechnology, Pub Date: 2024-11-12, DOI: 10.1038/s41587-024-02476-w
Iris Marchal
RNA viruses are omnipresent and can infect a wide range of hosts. Recent efforts to sequence ecological samples have identified tens of thousands of new species, yet uncovering the complete spectrum of RNA virus diversity remains challenging. In a new paper published in Cell, Hou et al. describe a deep learning algorithm, called LucaProt, to advance RNA virus discovery at global scale. LucaProt is an AI framework that integrates sequence data with structural information to identify RNA-dependent RNA polymerase (RdRP) sequences, a standard component of RNA virus genomes. The authors analyzed 10,487 metatranscriptomes, identifying 161,979 species (including 70,458 that have not been described before), with some sequences so different from known RNA viruses that they could form 180 new supergroups. LucaProt outperformed available virus discovery tools that focus mostly on nucleotide sequences, showing greater accuracy and efficiency.
The newly discovered RNA viruses are present in diverse global ecosystems and environments, including in air, and in extreme environments such as hot springs and hydrothermal vents. Most of the new viral supergroups were found in aquatic and sediment samples. This vast expansion of the pool of known viruses will not only boost our understanding of the virome and virus diversity, but might also identify previously unknown enzymes and proteins that could be useful in research or medicine.