Data-Based Prediction of Redox Potentials via Introducing Chemical Features into the Transformer Architecture
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-11-08, DOI: 10.1021/acs.jcim.4c01299
Zhan Si, Deguang Liu, Wan Nie, Jingjing Hu, Chen Wang, Tingting Jiang, Haizhu Yu, Yao Fu

Rapid and accurate prediction of the basic physicochemical parameters of molecules would greatly accelerate the target-oriented design of novel reactions and materials, but it has long been challenging. Herein, a chemical language model-based deep learning method, TransChem, is developed for predicting the redox potentials of organic molecules. By embedding an effective molecular characterization (combining spatial and electronic features), a nonlinear molecular messaging approach (Mol-Attention), and a perturbation learning method, TransChem shows high accuracy in predicting the redox potentials of organic radicals on a data set of over 100,000 data points (R² > 0.97, MAE < 0.09 V). It also generalizes to the smaller 2,1,3-benzothiadiazole data set (<3000 data points) and an electron affinity data set (660 data points), with low MAEs of 0.07 V and 0.18 eV, respectively. In this context, the oxidation potential (OP) of a self-developed full-space disubstituted phenol data set (OPP-data set, 74,529 entries in total) has been predicted by TransChem using a high-throughput, active learning strategy. The rapid and reliable prediction of OP could accelerate the screening of plausible reagents for the highly selective cross-coupling of phenol derivatives. This study represents an important attempt to guide language modeling with chemical knowledge, and TransChem demonstrates state-of-the-art (SOTA) predictive performance on redox potential prediction benchmark data sets, owing to its better understanding of molecular design and conformational relationships.
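
The abstract reports accuracy as a coefficient of determination (R²) and a mean absolute error (MAE). As a reading aid, the minimal sketch below shows how these two metrics are conventionally computed. The function names and the four potential values are hypothetical and chosen only for illustration; they are not taken from the paper, and this is not the authors' evaluation code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up redox potentials in volts (illustrative only, not data from the paper).
reference = [0.50, 1.00, 1.50, 2.00]
predicted = [0.55, 0.95, 1.60, 1.90]

mae = mean_absolute_error(reference, predicted)
r2 = r_squared(reference, predicted)
print(f"MAE = {mae:.3f} V, R^2 = {r2:.3f}")
```

MAE carries the unit of the predicted quantity (V for potentials, eV for electron affinities), whereas R² is dimensionless, which is why the abstract quotes the two alongside each other.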

Updated: 2024-11-10