Chemical Engineering Journal (IF 13.3), Pub Date: 2023-08-28, DOI: 10.1016/j.cej.2023.145725
Rui Liu, Yuechuan Tang, Jie Tian, Jing Huang, Chaoyang Zhang, Linyuan Wang, Jian Liu
The sublimation enthalpy of energetic compounds is often predicted with quantitative structure–property relationships based on quantum chemistry (QC-QSPR), which are accurate but computationally expensive. Machine learning (ML) is a feasible alternative, but its applicability to energetic molecules is limited by the scarcity of experimental data for them. A new sublimation enthalpy data set is established by extending a commonly used one with data for energetic organic compounds collected from the literature. Four topological descriptors are proposed to construct QSPRs, which exhibit higher accuracy than the QC-based ones, and the descriptors are then used to build ML-based QSPRs with four algorithms individually. The Extreme Gradient Boosting (XGBoost) model exhibits the highest accuracy, with a mean absolute error of 2.7 kcal/mol, followed by the Particle Swarm Optimization (PSO) model. Still, the PSO model is more portable and is the recommended one, because it is fully interpretable. It predicts sublimation enthalpy accurately at negligible CPU cost and is expected to help find novel energetic molecules by further predicting detonation properties.
Title: QSPR models for sublimation enthalpy of energetic compounds
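The abstract describes an interpretable PSO-based QSPR built on four topological descriptors and scored by mean absolute error, but it does not give the functional form or the PSO settings. The sketch below is therefore only an illustration under stated assumptions: it assumes a linear formula (intercept plus one weight per descriptor), a basic global-best PSO that minimises MAE, and synthetic placeholder data. The names `predict`, `mean_absolute_error`, `fit_pso`, the coefficient values, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random


def predict(coeffs, descriptors):
    """Assumed linear QSPR: intercept plus a weighted sum of descriptor values."""
    return coeffs[0] + sum(c * d for c, d in zip(coeffs[1:], descriptors))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_pso(X, y, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise training MAE over the coefficients with a global-best PSO."""
    rng = random.Random(seed)
    dim = len(X[0]) + 1  # intercept + one weight per descriptor

    def loss(coeffs):
        return mean_absolute_error(y, [predict(coeffs, row) for row in X])

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=pbest_loss.__getitem__)
    gbest, gbest_loss = pbest[g][:], pbest_loss[g]
    history = [gbest_loss]  # best MAE found so far, one entry per iteration

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            current = loss(pos[i])
            if current < pbest_loss[i]:
                pbest[i], pbest_loss[i] = pos[i][:], current
                if current < gbest_loss:
                    gbest, gbest_loss = pos[i][:], current
        history.append(gbest_loss)
    return gbest, history


# Synthetic placeholder data, NOT real sublimation enthalpies:
# four made-up descriptor values per "molecule" and hypothetical coefficients.
data_rng = random.Random(42)
true_coeffs = [2.0, 1.0, -0.5, 0.3, 0.8]
X = [[data_rng.uniform(0, 10) for _ in range(4)] for _ in range(40)]
y = [predict(true_coeffs, row) for row in X]

coeffs, history = fit_pso(X, y)
print("fitted coefficients:", [round(c, 2) for c in coeffs])
print("training MAE:", round(history[-1], 3))
```

A model of this kind reduces to a handful of printed coefficients, which is one plausible reading of why the abstract calls the PSO model portable and fully interpretable: applying it needs only the descriptor values and a single arithmetic expression.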