Nature Machine Intelligence (IF 18.8), Pub Date: 2024-07-30, DOI: 10.1038/s42256-024-00875-x
Yu Zong, Yuxin Wang, Xipeng Qiu, Xuanjing Huang, Liang Qiao
Protein glycosylation, the post-translational modification of proteins by glycans, plays an important role in numerous physiological and pathological cellular functions. Glycoproteomics, the study of protein glycosylation on a proteome-wide scale, uses liquid chromatography coupled with tandem mass spectrometry (MS/MS) to obtain combined information on glycosylation sites, glycosylation levels and glycan structures. However, current database searching methods for glycoproteomics often struggle to determine glycan structures because structure-determining ions are rarely observed. Although spectral searching methods can leverage fragment intensities to facilitate the structural identification of glycopeptides, their application is hindered by the difficulty of constructing spectral libraries. In this work, we present DeepGP, a hybrid deep learning framework based on transformers and graph neural networks, for predicting the MS/MS spectra and retention times of glycopeptides. Two graph neural network modules are employed, one to capture the branched glycan structure and one to predict glycan ion intensities. Additionally, a pretraining strategy is used to compensate for the scarcity of glycoproteomics data. Tested on multiple biological datasets, DeepGP accurately predicts the MS/MS spectra and retention times of glycopeptides, in close agreement with experimental results. Comprehensive benchmarking of DeepGP on synthetic and biological datasets validates its effectiveness in distinguishing similar glycans. Across various decoy methods, combining DeepGP with database searching can increase glycopeptide detection sensitivity. We anticipate that DeepGP can inspire extensive future work in glycoproteomics.
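Spectral searching, as described in the abstract, scores a candidate glycopeptide by how closely its fragment-ion intensities match an experimental spectrum, which is what lets predicted spectra separate glycans with the same composition but different structures. The sketch below illustrates that comparison under stated assumptions: it uses the normalized spectral angle (a similarity metric common in spectrum-prediction work), which is not necessarily the score DeepGP uses, and the function names, ion labels and intensities are all hypothetical, not data from the paper.

```python
import math

# Hypothetical spectra: fragment-ion label -> relative intensity.
# Labels and values are illustrative only.
Spectrum = dict[str, float]


def align(pred: Spectrum, obs: Spectrum) -> tuple[list[float], list[float]]:
    """Align two spectra on the union of their ion labels; absent ions get 0."""
    labels = sorted(set(pred) | set(obs))
    return [pred.get(k, 0.0) for k in labels], [obs.get(k, 0.0) for k in labels]


def spectral_angle(pred: Spectrum, obs: Spectrum) -> float:
    """Normalized spectral angle in [0, 1]; 1 means identical relative intensities."""
    a, b = align(pred, obs)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    cos = max(-1.0, min(1.0, dot / norm))  # clamp float round-off before acos
    return 1.0 - 2.0 * math.acos(cos) / math.pi


def best_candidate(candidates: dict[str, Spectrum], obs: Spectrum):
    """Score each candidate's predicted spectrum against the observed one."""
    scores = {name: spectral_angle(pred, obs) for name, pred in candidates.items()}
    return max(scores, key=scores.get), scores


# Two hypothetical candidate structures with the same composition:
# only their predicted relative intensities differ.
observed = {"Y1": 100.0, "Y2": 40.0, "Y3": 10.0, "B2": 60.0}
cand_a = {"Y1": 95.0, "Y2": 45.0, "Y3": 12.0, "B2": 55.0}
cand_b = {"Y1": 30.0, "Y2": 100.0, "Y3": 70.0, "B2": 5.0}

best, scores = best_candidate({"A": cand_a, "B": cand_b}, observed)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

Because the angle depends only on relative intensities, the metric is insensitive to overall signal scale, which is why it suits comparing a predicted spectrum with an experimental one. Here candidate A wins because its intensity pattern tracks the observed spectrum, even though both candidates share the same ion labels.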
Title: Deep learning prediction of glycopeptide tandem mass spectra powers glycoproteomics