Synergizing Machine Learning, Conceptual Density Functional Theory, and Biochemistry: No-Code Explainable Predictive Models for Mutagenicity in Aromatic Amines.
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-11-11, DOI: 10.1021/acs.jcim.4c01246
Andrés Halabi Diaz, Mario Duque-Noreña, Elizabeth Rincón, Eduardo Chamorro

This study combines machine learning (ML) with conceptual density functional theory (CDFT) to develop OECD-compliant predictive models for the mutagenic activity of aromatic amines (AAs). The fully No-Code methodology uses a comprehensive data set of 251 AAs, leave-one-out cross-validation (LOOCV), and three distinct data splits. Our research employs the GFN2-xTB method, known for its robustness and speed, to compute descriptors for procarcinogens and their activated metabolites in vacuum and aqueous phases. We evaluate the effectiveness of different theoretical definitions of electrophilicity within CDFT, namely the PSL, GCV, and CDP schemes, together with the newly introduced Log QP descriptor, which approximates Log P information. SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable predictive models with highly robust internal validation (Avg. Correct Classifications = 76% and Avg. Kappa = 0.29) and external validation (Avg. Correct Classifications = 79% and Avg. Kappa = 0.33) metrics, and the results were compared to those of a Multilayer Perceptron with two hidden layers. The results indicate that the second CDP definition of electrophilicity, in both vacuum and aqueous phases, and the newly presented Log QP descriptors are the most important ones for predicting the mutagenic activity of AAs (namely ω+VacCDP2+, ω+AqCDP2+, and LogQP1+Vac, respectively). They further indicate that metabolic activation, aqueous solvent properties, the CDP electrophilicity schemes, and Log QP should be considered when building predictive models for the mutagenic activity of AAs. This study offers a replicable, No-Code approach to QSAR research, making high-level ML and CDFT applications accessible to a broader audience. Future work will extend these methods to other compound families, enhancing predictive capabilities in the study of mutagenic activities and other biological phenomena.
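The abstract reports two validation metrics side by side: the share of correct classifications and Cohen's kappa. The following is a minimal Python sketch of how the two are commonly computed from actual and predicted class labels. It is not the authors' workflow (which is No-Code), and the function names and toy labels are hypothetical, invented only for illustration. It shows why kappa can sit well below the accuracy figure: kappa subtracts the agreement expected by chance from the class frequencies.

```python
def percent_correct(actual, predicted):
    """Share of samples whose predicted label equals the actual label, in percent."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by chance,
    computed from the marginal label frequencies of both sequences.
    """
    n = len(actual)
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    labels = set(actual) | set(predicted)
    p_e = sum((actual.count(lab) / n) * (predicted.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        # Both sequences contain a single identical label; kappa is undefined,
        # so this sketch returns 0.0 by convention.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (hypothetical, not taken from the paper's data set).
# A classifier that always answers "mutagenic" on an imbalanced set:
actual_a = ["mutagenic"] * 8 + ["non-mutagenic"] * 2
predicted_a = ["mutagenic"] * 10

# A classifier on a balanced set that makes two mistakes out of eight:
actual_b = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_b = [1, 1, 1, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    print(percent_correct(actual_a, predicted_a), cohen_kappa(actual_a, predicted_a))
    print(percent_correct(actual_b, predicted_b), cohen_kappa(actual_b, predicted_b))
```

In the first toy case the classifier is right 80% of the time yet earns a kappa of 0, because always guessing the majority class already yields that agreement by chance. This is why the two metrics are usually read together on class-imbalanced data.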

Updated: 2024-11-11