Discovery of CRISPR-Cas systems with self-processing pre-crRNA capability by foundation models
Nature Communications (IF 14.7), Pub Date: 2024-11-19, DOI: 10.1038/s41467-024-54365-0
Wenhui Li, Xianyue Jiang, Wuke Wang, Liya Hou, Runze Cai, Yongqian Li, Qiuxi Gu, Qinchang Chen, Peixiang Ma, Jin Tang, Menghao Guo, Guohui Chuai, Xingxu Huang, Jun Zhang, Qi Liu
The discovery of CRISPR-Cas systems has paved the way for advanced gene-editing tools. However, traditional Cas discovery methods rely on sequence similarity, so they may miss distant homologs and are poorly suited to recognizing function. As protein large language models (LLMs) evolve, it becomes possible to model Cas systems without extensive training data. Here, we introduce CHOOSER (Cas HOmolog Observing and SElf-processing scReening), an AI framework that uses protein foundation models for the alignment-free discovery of CRISPR-Cas systems capable of self-processing pre-crRNA. Using CHOOSER, we identify 11 Casλ homologs, nearly doubling the known catalog. Notably, one homolog, EphcCasλ, is experimentally validated for pre-crRNA self-processing, DNA cleavage, and trans-cleavage, showing promise for CRISPR-based pathogen detection. This study highlights an innovative approach to discovering CRISPR-Cas systems with specific functions and underscores their potential for gene editing.