A comprehensive comparison of deep learning-based compound-target interaction prediction models to unveil guiding design principles
Journal of Cheminformatics (IF 7.1) Pub Date: 2024-10-28, DOI: 10.1186/s13321-024-00913-1
Sina Abdollahi, Darius P. Schaub, Madalena Barroso, Nora C. Laubach, Wiebke Hutwelker, Ulf Panzer, Søren W. Gersting, Stefan Bonn

The evaluation of compound-target interactions (CTIs) is at the heart of drug discovery efforts. Given the substantial time and monetary costs of classical experimental screening, significant efforts have been dedicated to developing deep learning-based models that can accurately predict CTIs. A comprehensive comparison of these models on a large, curated CTI dataset is, however, still lacking. Here, we perform an in-depth comparison of 12 state-of-the-art deep learning architectures that use different protein and compound representations. The models were selected for their reported performance and architectures. To reliably compare model performance, we curated over 300 thousand binding and non-binding CTIs and established several gold-standard datasets of varying size and information. Based on our findings, DeepConv-DTI consistently outperforms other models in CTI prediction performance across the majority of datasets. It achieves a Matthews correlation coefficient (MCC) of 0.6 or higher for most of the datasets and is one of the fastest models in training and inference. These results indicate that using convolution-based windows, as in DeepConv-DTI, to traverse trainable embeddings is a highly effective approach for capturing informative protein features. We also observed that physicochemical embeddings of targets increased model performance. We therefore modified DeepConv-DTI to include normalized physicochemical properties, which resulted in the overall best performing model, Phys-DeepConv-DTI. This work highlights how the systematic evaluation of input features of compounds and targets, as well as their corresponding neural network architectures, can serve as a roadmap for the future development of improved CTI models.

Scientific contribution
This work features comprehensive CTI datasets to allow for the objective comparison and benchmarking of CTI prediction algorithms. Based on these datasets, we gained insights into which embeddings of compounds and targets and which deep learning-based algorithms perform best, providing a blueprint for the future development of CTI algorithms. Using the insights gained from this screen, we provide a novel CTI algorithm with state-of-the-art performance.
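
The abstract reports model quality as an MCC (Matthews correlation coefficient) value. As a reading aid, the following is a minimal sketch of how MCC is computed for binary binding/non-binding labels, using the standard definition MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the toy labels are illustrative assumptions and are not taken from the paper or its code.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn) for binary labels, where 1 = binding, 0 = non-binding."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    chosen here as an assumption for the degenerate case).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Toy labels, made up for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
score = mcc(*confusion_counts(y_true, y_pred))
print(f"MCC = {score:.2f}")
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it a reasonable single-number summary when binding and non-binding examples are imbalanced.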

Updated: 2024-10-29