Reaction rebalancing: a novel approach to curating reaction databases
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-07-19, DOI: 10.1186/s13321-024-00875-4
Tieu-Long Phan 1, 2 , Klaus Weinbauer 1, 3 , Thomas Gärtner 3 , Daniel Merkle 2, 4 , Jakob L Andersen 2 , Rolf Fagerberg 2 , Peter F Stadler 1, 5, 6, 7, 8, 9

Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. However, unbalanced reactions, i.e., entries with missing co-reactants and co-products, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need. The SynRBL framework addresses this issue with a dual strategy: a rule-based method for non-carbon compounds, which uses atomic symbols and counts for prediction, and a Maximum Common Subgraph (MCS)-based technique for carbon compounds, which aligns reactants and products to infer missing entities. The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of the framework was assessed through its success rate and accuracy, which ranged from 89.83 to 99.75% and from 90.85 to 99.05%, respectively. The SynRBL framework offers a novel solution for rebalancing chemical reactions and significantly enhances reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvements, in particular of atom-atom mapping techniques and of downstream tasks such as automated synthesis planning.

SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. By combining heuristic rules for inferring non-carbon compounds with common subgraph searches to address carbon imbalance, SynRBL resolves most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open-source solution for this problem.
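To make the rule-based idea concrete, the sketch below compares atom counts on both sides of a reaction and looks up the difference in a small rule table. It is only an illustration of the general technique described in the abstract, not SynRBL's implementation: the `RULES` table, the function names, and the restriction to flat molecular formulas (no parentheses, charges, or isotopes) are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical rule table (illustrative only, not taken from SynRBL):
# maps an atom-count surplus to a candidate small molecule.
RULES = {
    frozenset({"H": 2, "O": 1}.items()): "H2O",
    frozenset({"H": 1, "Cl": 1}.items()): "HCl",
    frozenset({"H": 2}.items()): "H2",
}


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H4O2'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(formulas) -> Counter:
    """Sum atom counts over all compounds on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total.update(atom_counts(formula))
    return total


def suggest_missing(reactants, products):
    """Return (side, compound): which side lacks atoms and a rule-table guess.

    `compound` is None when no rule matches the surplus.
    """
    left = side_counts(reactants)
    right = side_counts(products)
    # Counter subtraction keeps only positive differences.
    missing_in_products = left - right
    missing_in_reactants = right - left
    if not missing_in_products and not missing_in_reactants:
        return ("balanced", None)
    if missing_in_products and not missing_in_reactants:
        return ("product", RULES.get(frozenset(missing_in_products.items())))
    if missing_in_reactants and not missing_in_products:
        return ("reactant", RULES.get(frozenset(missing_in_reactants.items())))
    # Atoms are missing on both sides; a single-compound rule cannot fix this.
    return ("both", None)


# Esterification recorded without its water co-product:
# acetic acid + ethanol -> ethyl acetate
print(suggest_missing(["C2H4O2", "C2H6O"], ["C4H8O2"]))
```

In this toy version a missing compound is only proposed when the atom surplus matches a rule exactly. An imbalance that involves carbon finds no match in this table, and the abstract assigns such cases to the MCS-based alignment of reactants and products instead.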

Updated: 2024-07-20