Combining a Chemical Language Model and the Structure-Activity Relationship Matrix Formalism for Generative Design of Potent Compounds with Core Structure and Substituent Modifications.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-11-15, DOI: 10.1021/acs.jcim.4c01781
Hengwei Chen, Jürgen Bajorath
In medicinal chemistry, compound optimization relies on the generation of analogue series (AS) for exploring structure-activity relationships (SARs). Potency progression is a critical criterion for advancing AS. During optimization, a key question is which analogues to synthesize next. We introduce a new computational methodology, reported here for the first time, for extending AS with potent compounds that contain both core structure and substituent modifications at multiple sites. The approach combines a transformer chemical language model (CLM) with a SAR matrix (SARM) methodology that identifies and organizes structurally related AS. For this purpose, the SARM approach was expanded to cover multisite AS. Consensus series extracted from SARMs and representing a potency gradient served as input for CLM training to extend test AS with potent analogues. Different model variants were derived and investigated. Both general and fine-tuned models correctly predicted known potent analogues at high positions in probability-based compound rankings. They also chemically diversified AS through core structure modifications of the generated candidate compounds and substituent replacements at multiple sites.
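As a rough illustration of two ideas mentioned in the abstract, the sketch below orders an analogue series into a potency gradient and looks up the position of a known analogue in a probability-based ranking. It is not the authors' implementation: the `Analogue` type, the `pki` field, the function names, the SMILES strings, and all numeric values are hypothetical, and SMILES are compared as plain strings on the assumption that they were canonicalized beforehand.

```python
from typing import Dict, List, NamedTuple, Optional


class Analogue(NamedTuple):
    smiles: str
    pki: float  # hypothetical potency value; higher means more potent


def potency_gradient(series: List[Analogue]) -> List[Analogue]:
    """Order an analogue series by increasing potency."""
    return sorted(series, key=lambda a: a.pki)


def rank_of_known(candidates: Dict[str, float], known_smiles: str) -> Optional[int]:
    """Return the 1-based rank of a known analogue among candidates sorted by
    descending model log-probability, or None if it was not generated."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (smiles, _) in enumerate(ranked, start=1):
        if smiles == known_smiles:
            return rank
    return None


# Hypothetical series and model scores, for illustration only.
series = [
    Analogue("Cc1ccccc1", 6.8),
    Analogue("c1ccccc1", 5.2),
    Analogue("Oc1ccccc1", 7.9),
]
gradient = potency_gradient(series)

# Hypothetical log-probabilities assigned by a model to sampled candidates.
scores = {"Nc1ccccc1": -1.2, "Oc1ccccc1": -2.5, "Clc1ccccc1": -4.0}
known_rank = rank_of_known(scores, "Oc1ccccc1")
```

In a setup like this, a small rank for a held-out potent analogue would correspond to what the abstract describes as a prediction at a high position in the ranking.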
Updated: 2024-11-15