Deep learning for NAD/NADP cofactor prediction and engineering using transformer attention analysis in enzymes
Metabolic Engineering (IF 6.8), Pub Date: 2024-11-20, DOI: 10.1016/j.ymben.2024.11.007
Jaehyung Kim, Jihoon Woo, Joon Young Park, Kyung-Jin Kim, Donghyuk Kim
Understanding and manipulating the cofactor preferences of NAD(P)-dependent oxidoreductases, the most widely distributed enzyme group in nature, is increasingly crucial in bioengineering. However, large-scale identification of cofactor preferences and the design of mutants that switch cofactor specificity remain complex tasks. Here, we introduce DISCODE (Deep learning-based Iterative pipeline to analyze Specificity of COfactors and to Design Enzyme), a novel transformer-based deep learning model that predicts NAD(P) cofactor preferences. For model training, a total of 7,132 NAD(P)-dependent enzyme sequences were collected. Leveraging whole-length sequence information, DISCODE classifies the cofactor preferences of NAD(P)-dependent oxidoreductase protein sequences without structural or taxonomic limitation. The model achieved an accuracy of 97.4% and an F1 score of 97.3%. A notable feature of DISCODE is the interpretability of its transformer layers. Analysis of the attention layers in the model identified several residues with significantly higher attention weights. These residues aligned well with structurally important residues that closely interact with NAD(P), facilitating the identification of key residues that determine cofactor specificity. These key residues were highly consistent with verified cofactor-switching mutants. Integrated into an enzyme design pipeline, DISCODE coupled with attention analysis enables a fully automated approach to redesigning cofactor specificity.
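The attention analysis described in the abstract can be illustrated with a minimal sketch. This is not the DISCODE implementation: the tensor layout `(layers, heads, query, key)`, the averaging over layers and heads, the z-score cutoff of 2.0, and the function name `high_attention_residues` are all assumptions made for illustration, and the toy sequence and attention values are synthetic.

```python
import numpy as np


def high_attention_residues(attn, sequence, z_threshold=2.0):
    """Flag residues that receive unusually high attention.

    attn: array of shape (layers, heads, L, L), rows = query, cols = key
          (an assumed layout, not taken from the paper).
    sequence: amino-acid string of length L.
    Returns a list of (1-based position, residue, z-score).
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 4 or attn.shape[-1] != len(sequence):
        raise ValueError("attn must have shape (layers, heads, L, L)")

    # Attention received by each residue, averaged over layers,
    # heads, and query positions.
    received = attn.mean(axis=(0, 1, 2))

    std = received.std()
    if std == 0:
        return []  # perfectly uniform attention: nothing stands out

    z = (received - received.mean()) / std
    return [(int(i) + 1, sequence[i], float(z[i]))
            for i in np.flatnonzero(z > z_threshold)]


# Synthetic example: uniform attention with one boosted key position.
toy_sequence = "MKIAVIGAGS"          # hypothetical 10-residue fragment
L = len(toy_sequence)
toy_attn = np.full((2, 2, L, L), 1.0 / L)
toy_attn[..., 3] += 0.5              # boost attention paid to index 3
toy_attn /= toy_attn.sum(axis=-1, keepdims=True)  # rows sum to 1 again

toy_hits = high_attention_residues(toy_attn, toy_sequence)
print(toy_hits)
```

In a real workflow the flagged positions would then be compared against residues known to contact NAD(P) in a structure, which is the kind of check the abstract reports.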
Updated: 2024-11-20