Predicting the effects of mutations on protein solubility using graph convolution network and protein language model representation
Journal of Computational Chemistry (IF 3.4), Pub Date: 2023-11-07, DOI: 10.1002/jcc.27249
Jing Wang 1, 2 , Sheng Chen 2 , Qianmu Yuan 2 , Jianwen Chen 2 , Danping Li 3 , Lei Wang 4 , Yuedong Yang 2

Solubility is one of the most important properties of a protein. A single amino acid mutation can change protein solubility substantially, and reduced solubility can lead to disease. Because experimental methods for determining solubility are time-consuming and expensive, in silico methods have been developed to predict mutation-induced solubility changes, mostly from protein evolutionary information. However, these methods are slow, because obtaining evolutionary information through multiple sequence alignment takes a long time. They also perform poorly because they do not fully exploit protein 3D structures, since experimental structures are lacking for most proteins. Here, we propose DeepMutSol, a sequence-based method that predicts solubility changes caused by residue mutations using a graph convolutional neural network (GCN), in which the protein graph is constructed from the structure predicted by AlphaFold2 and the nodes (residues) are represented by protein language model embeddings. To compensate for the small amount of data on solubility changes, we further pretrained the model on absolute protein solubility. DeepMutSol outperformed state-of-the-art methods in benchmark tests. In addition, we applied the method to clinically relevant genes from the ClinVar database, and the predicted solubility changes were able to separate pathogenic mutations. All data sets and the source code are available at https://github.com/biomed-AI/DeepMutSol.
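The graph construction described in the abstract can be illustrated with a minimal sketch. This is not the DeepMutSol implementation (see the repository above for that). It only shows the general idea of a graph convolution over a residue contact graph. The 8 Å distance cutoff, the toy coordinates, the two-dimensional node features standing in for language model embeddings, and the weight matrix are all assumptions made for illustration.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Adjacency matrix with self-loops: residues i and j are linked when their
    coordinates lie within `cutoff` (an assumed threshold, in angstroms)."""
    n = len(coords)
    return [[1.0 if i == j or math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]

def normalize(adj):
    """Symmetric normalization D^-1/2 A D^-1/2 of an adjacency matrix."""
    deg = [sum(row) for row in adj]  # >= 1 because of the self-loops
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]

# Toy input (hypothetical): four residues on a line, the last one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
# Placeholder node features; a real model would use language model embeddings.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
# Placeholder weights; in practice these are learned.
weights = [[1.0, -1.0], [0.5, 1.0]]

adj_norm = normalize(contact_graph(coords))
out = gcn_layer(adj_norm, features, weights)
```

In this toy graph the first three residues are mutually in contact and therefore receive identical averaged features, while the isolated fourth residue only sees itself. A real pipeline would take coordinates from a predicted structure and stack several such layers before a regression head.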

Updated: 2023-11-09