Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-11-04, DOI: 10.1186/s13321-024-00912-2
Yiyu Hong, Junsu Ha, Jaemin Sim, Chae Jo Lim, Kwang-Seok Oh, Ramakrishnan Chandrasekaran, Bomin Kim, Jieun Choi, Junsu Ko, Woong-Hee Shin, Juyong Lee

We introduce an advanced model for predicting protein–ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein–ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules, and the scarcity of crystallographic data for protein–ligand complexes. To overcome the limitations of existing machine learning-based prediction models, we propose a novel approach that fuses three independent neural network models. One classification model is designed to perform binary prediction of a given protein–ligand complex pose. The other two regression models are trained to predict the binding affinity and root-mean-square deviation of a ligand conformation from an input complex structure. We trained the model to account for both deviations in experimental and predicted binding affinities and pose prediction uncertainties. By effectively integrating the outputs of the triplet neural networks with a physics-based scoring function, our model showed a significantly improved performance in hit identification. The benchmark results with three independent decoy sets demonstrate that our model outperformed existing models in forward screening. Our model achieved top 1% enrichment factors of 32.7 and 23.1 with the CASF2016 and DUD-E benchmark sets, respectively. The benchmark results using the LIT-PCBA set further confirmed its higher average enrichment factors, emphasizing the model’s efficiency and generalizability. The model’s efficiency was further validated by identifying 23 active compounds from 63 candidates in experimental screening for autotaxin inhibitors, demonstrating its practical applicability in hit discovery. 
Scientific contribution: Our work introduces a novel training strategy for a protein–ligand binding affinity prediction model by integrating the outputs of three independent sub-models and utilizing expertly crafted decoy sets. The model showcases exceptional performance across multiple benchmarks. The high enrichment factors in the LIT-PCBA benchmark demonstrate its potential to accelerate hit discovery.
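
For readers unfamiliar with the metric, the top 1% enrichment factor quoted above is commonly defined as the fraction of actives among the top-ranked 1% of compounds divided by the fraction of actives in the whole library. The following is a minimal sketch of that common definition, not the authors' evaluation code; the function name `enrichment_factor`, the rounding convention for the cutoff, and the toy data are assumptions made for illustration.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked library.

    scores: predicted scores, where a higher score means "more likely active"
            (an assumption; flip the sign for energy-like scores).
    labels: 1 for a known active, 0 for a decoy or inactive.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("the library must contain at least one active")

    # Size of the top slice; rounding to the nearest integer is one convention.
    n_top = max(1, round(n_total * fraction))

    # Rank compounds from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # (hits_top / n_top) / (n_actives / n_total), written with one division.
    return hits_top * n_total / (n_top * n_actives)


# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
toy_scores = [1000 - i for i in range(1000)]
toy_labels = [0] * 1000
for idx in (0, 1, 2, 3, 4, 500, 600, 700, 800, 900):
    toy_labels[idx] = 1

ef_1pct = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

Under this definition an enrichment factor of 1 corresponds to random ranking, and the maximum attainable value at 1% is 100 when the library holds enough actives to fill the entire top slice.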

Updated: 2024-11-05