Identifying uncertainty in physical–chemical property estimation with IFSQSAR
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-05-30, DOI: 10.1186/s13321-024-00853-w
Trevor N Brown 1 , Alessandro Sangion 1 , Jon A Arnot 1, 2, 3

This study describes the development and evaluation of six new models for predicting physical–chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility in water (SW) and in octanol (SO), vapor pressure (VP), and the octanol–water (KOW), octanol–air (KOA), and air–water (KAW) partition ratios. The models are implemented in the Iterative Fragment Selection Quantitative Structure–Activity Relationship (IFSQSAR) Python package, version 1.1.0, as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations, which combine experimentally calibrated system parameters with solute descriptors predicted by QSPRs. Two ancillary models have also been developed and implemented: a QSPR for molar volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing the applicability domain (AD) and for calculating uncertainty estimates, expressed as 95% prediction intervals (PIs) for predicted properties, are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. The measured data are external to the IFSQSAR training and validation datasets and are used to assess the predictivity of the models for “novel chemicals” in an unbiased manner. The 95% PIs calculated from validation datasets for partition ratios needed to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily because it is difficult to determine whether a chemical is a liquid or a solid at room temperature. The prediction accuracy of the models for log KOW, log KAW, and log KOA of novel, data-poor chemicals is estimated as a root mean squared error of prediction (RMSEP) in the range 0.7 to 1.4, with RMSEP in the range 1.7–1.8 for log VP and log SW.
Scientific contribution: New partitioning models integrate empirical PPLFER equations and QSARs, allowing for seamless integration of experimental data and model predictions. This work tests the real predictivity of the models for novel chemicals that are not in the model training or external validation datasets.
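
The two evaluation quantities named in the abstract, RMSEP and the share of external measurements captured by a 95% PI before and after scaling, can be illustrated with a minimal Python sketch. It does not use the IFSQSAR API. The function names `rmsep` and `coverage` are hypothetical, the intervals are assumed to be symmetric around the prediction, and the numbers are invented for illustration and are not data from the study.

```python
import math


def rmsep(predicted, measured):
    """Root mean squared error of prediction over paired log-unit values."""
    residuals = [p - m for p, m in zip(predicted, measured)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def coverage(predicted, half_widths, measured, scale=1.0):
    """Fraction of measured values inside predicted +/- scale * half_width.

    Assumes symmetric prediction intervals (an assumption of this sketch).
    """
    hits = sum(
        abs(p - m) <= scale * h
        for p, h, m in zip(predicted, half_widths, measured)
    )
    return hits / len(measured)


# Invented illustrative values (NOT from the study): predicted log KOW,
# 95% PI half-widths derived from a validation set, and external measurements.
predicted = [2.0, 3.5, 1.0, 4.2, 0.5]
half_widths = [1.0, 1.0, 0.8, 1.2, 1.0]
measured = [2.4, 4.6, 1.3, 3.9, 0.1]

print("RMSEP:", round(rmsep(predicted, measured), 3))
print("coverage, unscaled PI:", coverage(predicted, half_widths, measured))
print("coverage, PI x 1.25:", coverage(predicted, half_widths, measured, 1.25))
```

In this toy example, widening every interval by the factor 1.25 brings one additional measurement inside its interval. The abstract reports the same kind of adjustment for the partition ratio models, with the factor chosen so that 95% of the external data are captured.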

Updated: 2024-05-30