CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions
Journal of Cheminformatics (IF 7.1), Pub Date: 2025-01-07, DOI: 10.1186/s13321-024-00944-8
Zishuo Zeng 1, Jin Guo 1, Jiao Jin 1, Xiaozhou Luo 2

Predicting EC numbers for chemical reactions enables efficient enzymatic annotation for computer-aided synthesis planning. However, conventional machine learning approaches are hampered by data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction’s EC), a novel framework that combines contrastive learning, reaction embeddings from a pre-trained language model, and data augmentation to address these limitations. CLAIRE achieved weighted average F1 scores of 0.861 on the testing set (n = 18,816) and 0.911 on an independent dataset (n = 1040) derived from a yeast metabolic model. On these two datasets, CLAIRE outperformed the state-of-the-art model by 3.65-fold and 1.18-fold, respectively. Its high accuracy positions CLAIRE as a promising tool for retrosynthesis planning, drug fate prediction, and synthetic biology applications. CLAIRE is freely available on GitHub (https://github.com/zishuozeng/CLAIRE).

Scientific contribution
This work employs contrastive learning to predict the EC numbers of enzymatic reactions, overcoming the challenges of data scarcity and class imbalance. The new model achieves state-of-the-art performance and may facilitate computer-aided synthesis planning.
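The abstract reports weighted average F1 scores. As a rough illustration of what that metric measures under class imbalance, the sketch below computes a support-weighted mean of per-class F1 values in plain Python. The function name `weighted_f1` and the toy EC labels are assumptions made for this example; they are not taken from the CLAIRE code base, and the paper's exact evaluation procedure may differ.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of true labels.
        score += f1 * n_true / total
    return score


# Toy labels, chosen only for illustration.
y_true = ["1.1.1.1", "1.1.1.1", "2.7.1.1", "3.1.1.1"]
y_pred = ["1.1.1.1", "2.7.1.1", "2.7.1.1", "3.1.1.1"]
print(weighted_f1(y_true, y_pred))
```

Because each class is weighted by its number of true examples, frequent EC classes dominate the score, while classes that appear only among the predictions contribute nothing directly.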

Updated: 2025-01-08