Integrated Transfer Learning and Multitask Learning Strategies to Construct Graph Neural Network Models for Predicting Bioaccumulation Parameters of Chemicals, Environmental Science & Technology

Environmental Science & Technology (IF 10.8), Pub Date: 2024-07-25, DOI: 10.1021/acs.est.4c02421
Zijun Xiao (1), Minghua Zhu (1, 2), Jingwen Chen (1), Zecang You (1)


Accurate prediction of parameters related to the environmental exposure of chemicals is crucial for the sound management of chemicals. However, the lack of large data sets for model training can lead to poor prediction accuracy and robustness. Herein, a strategy integrating transfer learning (TL) and multitask learning (MTL) was proposed for constructing a graph neural network (GNN) model (abbreviated as the TL-MTL-GNN model), with n-octanol/water partition coefficients as the source domain. The TL-MTL-GNN model was trained to predict three bioaccumulation parameters on enlarged data sets covering 2496 compounds, each having at least one bioaccumulation parameter value. Results show that the TL-MTL-GNN model outperformed single-task GNN models with and without TL, as well as conventional machine learning models trained on molecular descriptors or fingerprints. Applicability domains were characterized with a state-of-the-art structure–activity landscape-based methodology (abbreviated as AD_SAL). The TL-MTL-GNN model coupled with the optimal AD_SAL was employed to predict bioaccumulation parameters for around 60,000 chemicals, of which more than 13,000 were identified as bioaccumulative. The high predictive accuracy and robustness of the TL-MTL-GNN model demonstrate the feasibility of integrating TL and MTL for modeling small data sets, and the strategy holds significant potential for addressing small-data challenges in modeling environmental chemicals.
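Because each of the 2496 training compounds has at least one, but not necessarily all three, bioaccumulation parameters, a multitask model sees many missing labels. A common way to train one shared network on such sparse labels is to mask missing targets out of the loss. The sketch below assumes that approach (the abstract does not say how the authors handle missing labels), uses plain Python lists in place of a GNN's output tensors, and leaves out the transfer-learning step, in which the shared encoder would start from weights pretrained on n-octanol/water partition coefficients. The function name is hypothetical.

```python
from typing import List, Optional


def masked_multitask_mse(
    predictions: List[List[float]],
    targets: List[List[Optional[float]]],
) -> float:
    """Mean squared error over observed labels only (hypothetical helper).

    predictions[i][t] is the model output for compound i on task t.
    targets[i][t] is the measured value, or None if compound i has no
    measurement for task t. Missing labels add nothing to the loss, so a
    compound with a single measured parameter still trains the shared
    layers through that one task.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue  # unmeasured parameter: masked out
            total += (pred - target) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels in batch")
    return total / count


# Two hypothetical compounds, three tasks; None marks an unmeasured parameter.
preds = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
labels = [[1.5, None, 3.0], [None, None, 1.5]]
print(masked_multitask_mse(preds, labels))  # (0.25 + 0.0 + 1.0) / 3
```

Pooling over all observed labels weights each task by how many measurements it has; averaging per-task losses instead would give sparsely measured tasks equal influence. Which choice suits a given data set is a modeling decision the abstract does not address.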
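The abstract does not describe how AD_SAL is constructed. As background only, a widely used quantity in structure–activity landscape analysis is the structure–activity landscape index, SALI(i, j) = |A_i − A_j| / (1 − sim(i, j)), which is large for structurally similar pairs with very different activities ("activity cliffs"). The minimal sketch below computes SALI with Tanimoto similarity on fingerprints represented as sets of on-bit indices; the helper names and the set-based fingerprint format are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # treat two empty fingerprints as identical
    return len(fp_a & fp_b) / union


def sali(
    activity_a: float,
    activity_b: float,
    fp_a: AbstractSet[int],
    fp_b: AbstractSet[int],
) -> float:
    """Structure-activity landscape index for one compound pair (hypothetical helper).

    Returns |A_a - A_b| / (1 - sim). The ratio is undefined at sim == 1,
    so identical fingerprints give inf if the activities differ and 0.0
    if they agree.
    """
    diff = abs(activity_a - activity_b)
    sim = tanimoto(fp_a, fp_b)
    if sim >= 1.0:
        return math.inf if diff > 0 else 0.0
    return diff / (1.0 - sim)


# Two hypothetical fingerprints sharing 3 of 5 distinct on-bits (similarity 0.6)
# with activities that differ by 1.0 log unit.
print(sali(2.0, 3.0, {1, 2, 3, 4}, {1, 2, 3, 5}))  # about 2.5
```

How such pairwise values would be aggregated into an applicability-domain decision for a query chemical is not stated in the abstract, so the sketch stops at the pairwise index.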


Updated: 2024-07-25
