A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-08-02, DOI: 10.1186/s13321-024-00891-4
Run-Hsin Lin, Pinpin Lin, Chia-Chi Wang, Chun-Wei Tung
Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms, which leverage knowledge from relevant tasks, have shown potential for dealing with tasks that have limited data. However, current multitask methods mainly focus on learning from datasets in which task labels are available for most of the training samples. Because datasets are often generated for different purposes and cover distinct chemical spaces, conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method, MTForestNet, that can deal with data scarcity and learn from tasks with distinct chemical spaces. MTForestNet consists of random forest classifier nodes organized as a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of MTForestNet, 48 zebrafish toxicity datasets were collected and used as an example. Two of these tasks differ markedly from the rest, with only 1.3% of chemicals in common with the other tasks. In an independent test, MTForestNet achieved an area under the receiver operating characteristic curve (AUC) of 0.911, outperforming the single-task and multitask methods it was compared against. The overall toxicity derived from the developed zebrafish toxicity models correlates well with the experimentally determined overall toxicity. In addition, the outputs of the developed models can be used as features to improve the prediction of developmental toxicity. The developed models are effective for predicting zebrafish toxicity, and the proposed MTForestNet is expected to be applicable to other tasks with distinct chemical spaces.
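The abstract describes the architecture only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each layer holds one model per task and that the next layer is trained on the original features concatenated with every task's score from the previous layer; that wiring is an assumption. The names `fit_centroid_scorer`, `fit_progressive`, `augment`, and `predict` are hypothetical, and the base learner is a simple nearest-centroid scorer standing in for a random forest so that the example needs only the standard library.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Scorer = Callable[[Vector], float]      # feature vector -> score in [0, 1]
Task = Tuple[List[Vector], List[int]]   # (feature vectors, binary labels) of one task
Fitter = Callable[[Sequence[Vector], Sequence[int]], Scorer]


def fit_centroid_scorer(X: Sequence[Vector], y: Sequence[int]) -> Scorer:
    """Stand-in base learner (NOT a random forest); needs both classes in y."""
    def centroid(label: int) -> Vector:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def score(x: Vector) -> float:
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        # The closer x is to the class-1 centroid, the higher the score.
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return score


def augment(x: Vector, layers: List[Dict[str, Scorer]], names: List[str]) -> Vector:
    """Pass x through the layers; each layer appends all task scores to the ORIGINAL features."""
    z = list(x)
    for layer in layers:
        z = list(x) + [layer[name](z) for name in names]
    return z


def fit_progressive(tasks: Dict[str, Task], n_layers: int,
                    fit: Fitter = fit_centroid_scorer) -> List[Dict[str, Scorer]]:
    """Train n_layers layers, each containing one model per task."""
    names = sorted(tasks)
    layers: List[Dict[str, Scorer]] = []
    for _ in range(n_layers):
        layer = {}
        for name in names:
            X, y = tasks[name]
            # Every task's model can score any chemical, so tasks need not share samples.
            layer[name] = fit([augment(x, layers, names) for x in X], y)
        layers.append(layer)
    return layers


def predict(x: Vector, layers: List[Dict[str, Scorer]], task: str) -> float:
    """Score x for one task using the final layer's model for that task."""
    names = sorted(layers[-1])
    return layers[-1][task](augment(x, layers[:-1], names))


# Toy data (made up for illustration): two tasks with no chemicals in common.
tasks = {
    "task_a": ([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]], [0, 0, 1, 1]),
    "task_b": ([[0.1, 0.3], [0.3, 0.2], [0.8, 1.0], [1.1, 0.8]], [0, 0, 1, 1]),
}
layers = fit_progressive(tasks, n_layers=2)
high = predict([1.0, 1.0], layers, "task_a")
low = predict([0.0, 0.0], layers, "task_a")
```

Because each model is applied to all chemicals rather than only those labeled for its own task, this kind of stacking does not require the tasks to share training samples. A practical version would plug in a real random forest through the `fit` argument and would likely need held-out or out-of-bag predictions when building the next layer's inputs, since scores computed on a model's own training data are optimistic.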
Scientific contribution: A novel multitask learning algorithm, MTForestNet, was proposed to address the challenge of developing models from datasets with distinct chemical spaces, a common issue in cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet, which provides superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing.
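For readers unfamiliar with the AUC metric used to compare the methods, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 8 of the 9 positive/negative pairs are ordered correctly.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"{auc:.3f}")  # 0.889
```

This pairwise form is quadratic in the number of samples, which is fine for illustration; rank-based formulas give the same value more efficiently on large test sets.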
Updated: 2024-08-02