An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model
Journal of Cheminformatics ( IF 7.1 ) Pub Date : 2024-06-07 , DOI: 10.1186/s13321-024-00862-9
Yufang Zhang 1, 2, 3 , Jiayi Li 4 , Shenggeng Lin 4 , Jianwei Zhao 4 , Yi Xiong 4, 5 , Dong-Qing Wei 2, 3, 4

Identifying interactions between chemical compounds and proteins is crucial for many applications, including drug discovery, target identification, network pharmacology, and the elucidation of protein functions. Deep neural network-based approaches are increasingly popular for identifying compound-protein interactions efficiently and at high throughput, narrowing the pool of candidates that must be tested with traditional experimental techniques, which are labor-intensive, time-consuming, and expensive. In this study, we propose an end-to-end approach termed SPVec-SGCN-CPI, which uses a simplified graph convolutional network (SGCN) model to predict compound-protein interactions from graph topology information and from low-dimensional, continuous features generated by our previously developed model, SPVec. By separating local neighborhood aggregation from the nonlinear layer-wise propagation step, the SGCN technique effectively aggregates K-order neighbor information while avoiding neighbor explosion and speeding up training. The performance of SPVec-SGCN-CPI was assessed on three datasets and compared against four machine learning- and deep learning-based methods, as well as six state-of-the-art methods. Experimental results showed that SPVec-SGCN-CPI outperformed all of these competing methods, and that it performed particularly well in unbalanced data scenarios. By propagating node features and topological information to the feature space, SPVec-SGCN-CPI effectively incorporates interactions between compounds and proteins, enabling the fusion of heterogeneity. Furthermore, our method scored all unlabeled data in ChEMBL, and the five top-ranked compound-protein interactions were confirmed through molecular docking and existing evidence. These findings suggest that our model can reliably uncover compound-protein interactions among unlabeled compound-protein pairs, with substantial implications for drug re-profiling and discovery.
In summary, SPVec-SGCN demonstrates its efficacy in accurately predicting compound-protein interactions, showing potential to enhance target identification and streamline drug discovery processes.

Scientific contributions
The methodology presented in this work not only enables comparatively accurate prediction of compound-protein interactions but also, for the first time, takes both sample imbalance (which is very common in real-world data) and computational efficiency into consideration, accelerating target identification and the drug discovery process.
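
The decoupling of neighborhood aggregation from nonlinear propagation described in the abstract can be illustrated with a minimal sketch. It assumes the SGCN step follows the general simplified graph convolution pattern: the feature matrix is multiplied K times by a symmetrically normalized adjacency matrix with self-loops, and a single classifier is then trained on the result. The abstract does not give the exact formulation, so the function names, the toy graph, and the one-hot features standing in for SPVec embeddings are all hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)          # every degree is >= 1 because of the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate_features(adj: np.ndarray, features: np.ndarray, k: int) -> np.ndarray:
    """Compute S^k X with no nonlinearity between the k aggregation steps."""
    s = normalized_adjacency(adj)
    out = features.astype(float)
    for _ in range(k):
        out = s @ out
    return out


# Hypothetical toy graph: a path of four nodes, 0 - 1 - 2 - 3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

# One-hot placeholder features; a real pipeline would use learned embeddings here.
features = np.eye(4)

# Two aggregation steps: each node now mixes information from its 2-hop neighborhood.
smoothed = propagate_features(adj, features, k=2)
```

Because the K-step propagation has no trainable weights, it can be computed once before training; only a classifier on the propagated features then needs fitting, which is one plausible reading of how the approach speeds up training.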

Updated: 2024-06-08