Chemical Graph-Based Transformer Models for Yield Prediction of High-Throughput Cross-Coupling Reaction Datasets
ACS Omega (IF 3.7), Pub Date: 2024-09-17, DOI: 10.1021/acsomega.4c06113
Akinori Sato, Ryosuke Asahara, Tomoyuki Miyao

The chemical reaction yield is an important factor in determining reaction conditions. Recently, many data-driven models for yield prediction using high-throughput experimentation datasets have been reported. In this study, we propose a neural network architecture based on the chemical graphs of the reaction components to predict the reaction yield. The proposed model is a sequential combination of a message-passing neural network and a transformer encoder (MPNN-Transformer). The first network converts the reaction components into molecular matrices; the second network then models the interplay among the reaction components after embeddings of the compound roles in the chemical reaction are added. The predictive ability of the proposed models was compared with that of state-of-the-art yield prediction models using two high-throughput experimental datasets: the Buchwald–Hartwig cross-coupling (BHC) and Suzuki–Miyaura cross-coupling (SMC) reaction datasets. Overall, the MPNN-Transformer models showed high prediction accuracy for the BHC reaction datasets and some of the extrapolation-oriented SMC reaction datasets. These models also performed well when the training dataset size was relatively large. Furthermore, analyzing the poorly predicted reactions in the BHC reaction dataset revealed a limitation of data-driven yield prediction approaches based on chemical structural similarity.
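
The data flow described in the abstract (per-component message passing, role embeddings, then attention across all reaction components) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation. The feature width `DIM`, the random weights, the role names, the mean pooling, and the sigmoid readout are all assumptions made for illustration, and a single self-attention layer stands in for a full transformer encoder (no residual connections, layer normalization, or feed-forward sublayer).

```python
import math
import random

random.seed(0)
DIM = 4  # hypothetical feature width, chosen only for this sketch


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def message_passing(atom_feats, bonds, weight, steps=2):
    """Toy MPNN: each atom adds its neighbours' features, then linear map + ReLU."""
    h = [row[:] for row in atom_feats]
    neighbours = {i: [] for i in range(len(h))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    for _ in range(steps):
        agg = [[h[i][k] + sum(h[j][k] for j in neighbours[i]) for k in range(DIM)]
               for i in range(len(h))]
        h = [[max(0.0, v) for v in row] for row in matmul(agg, weight)]
    return h  # "molecular matrix": one row per atom


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention over all atoms of all components."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    out = []
    for qi in q:
        scores = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(DIM) for kj in k])
        out.append([sum(s * vj[d] for s, vj in zip(scores, v)) for d in range(DIM)])
    return out


def predict_yield(reaction, params):
    tokens = []
    for role, (atom_feats, bonds) in reaction.items():
        mol = message_passing(atom_feats, bonds, params["mpnn"])
        role_vec = params["roles"][role]  # role embedding added to every atom row
        tokens += [[x + r for x, r in zip(atom, role_vec)] for atom in mol]
    mixed = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    pooled = [sum(col) / len(mixed) for col in zip(*mixed)]  # assumed mean pooling
    logit = sum(p * w for p, w in zip(pooled, params["out"]))
    return 100.0 / (1.0 + math.exp(-logit))  # assumed readout, squashed to 0-100 %


# Hypothetical two-component reaction; features and bonds are made up.
params = {
    "mpnn": rand_matrix(DIM, DIM),
    "wq": rand_matrix(DIM, DIM),
    "wk": rand_matrix(DIM, DIM),
    "wv": rand_matrix(DIM, DIM),
    "out": rand_matrix(1, DIM)[0],
    "roles": {"aryl_halide": rand_matrix(1, DIM)[0], "amine": rand_matrix(1, DIM)[0]},
}
reaction = {
    "aryl_halide": ([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], [(0, 1), (1, 2)]),
    "amine": ([[0, 0, 0, 1], [1, 0, 0, 0]], [(0, 1)]),
}
y = predict_yield(reaction, params)
```

Because the sketch uses no positional encoding and pools by averaging, the output does not depend on the order in which components are listed; only the role embeddings distinguish, say, the aryl halide from the amine. Whether the published model shares this property is not stated in the abstract.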

Updated: 2024-09-17