Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction
Genome Research ( IF 6.2 ) Pub Date : 2024-07-26 , DOI: 10.1101/gr.279132.124
Shuai Zeng 1 , Duolin Wang 1 , Lei Jiang 1 , Dong Xu 2

Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provides a new opportunity for SP prediction, especially for categories with limited annotated data. We present PEFT-SP, a parameter-efficient fine-tuning (PEFT) framework for SP prediction that effectively utilizes pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the protein sequence evolutionary knowledge captured by PLMs. Experiments show that PEFT-SP using LoRA improves on state-of-the-art results, with a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SP categories with small training samples and an overall MCC gain of 6.1%. We also applied two other PEFT methods, prompt tuning and adapter tuning, to ESM-2 for SP prediction. Further experiments show that PEFT-SP using adapter tuning also improves on state-of-the-art results, with up to a 28.1% MCC gain for SP categories with small training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than adapter tuning during training, making it feasible to adapt larger and more powerful protein models for SP prediction.
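The core idea behind LoRA, as used here, can be illustrated with a minimal sketch: a frozen pretrained weight matrix W is augmented with a trainable low-rank update B·A, so only the small A and B matrices are trained. This is a hypothetical standalone illustration in NumPy, not the PEFT-SP implementation; in the actual framework the frozen weights come from ESM-2 attention layers, and the dimension, rank, and scaling values below are assumed for demonstration.

```python
import numpy as np

class LoRALinear:
    """A linear layer with a frozen weight W plus a trainable
    low-rank LoRA update: y = x W^T + (alpha / r) * x A^T B^T.

    Illustrative sketch only: W stands in for a pretrained ESM-2
    weight; here it is random for self-containment.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pretrained" weight (never updated during fine-tuning).
        self.W = rng.standard_normal((d_out, d_in)) * 0.02
        # Trainable down-projection A (small random init).
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        # Trainable up-projection B, initialized to zero so the
        # adapted layer starts identical to the frozen layer.
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus scaled low-rank correction.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

    def trainable_params(self):
        # Only A and B are updated; W stays frozen.
        return self.A.size + self.B.size

    def total_params(self):
        return self.W.size + self.A.size + self.B.size

# Example: a 640-dim layer (ESM-2 150M uses 640-dim embeddings).
layer = LoRALinear(640, 640, rank=4)
frac = layer.trainable_params() / layer.total_params()
print(f"trainable fraction: {frac:.4f}")  # ~1% of the full layer
```

Because B starts at zero, the model's output is unchanged at the start of fine-tuning, and the number of trainable parameters scales with the rank r rather than with the full weight size, which is what makes adapting large PLMs tractable on limited hardware.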

Updated: 2024-07-26