Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction
Genome Research ( IF 6.2 ) Pub Date : 2024-07-26 , DOI: 10.1101/gr.279132.124
Shuai Zeng 1 , Duolin Wang 1 , Lei Jiang 1 , Dong Xu 2

Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provides a new opportunity for SP prediction, especially for categories with limited annotated data. We present PEFT-SP, a parameter-efficient fine-tuning (PEFT) framework for SP prediction that effectively utilizes pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the protein sequence evolutionary knowledge captured by PLMs. Experiments show that PEFT-SP using LoRA improves on state-of-the-art results, with a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SP categories with small training samples and an overall MCC gain of 6.1%. We also applied two other PEFT methods, prompt tuning and adapter tuning, to ESM-2 for SP prediction. Further experiments show that PEFT-SP using adapter tuning also improves on state-of-the-art results, with up to a 28.1% MCC gain for SP categories with small training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than adapter tuning during training, making it feasible to adapt larger and more powerful protein models for SP prediction.
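The core idea behind LoRA, as used here, can be illustrated with a minimal sketch: a frozen pretrained weight matrix W is augmented with a trainable low-rank update B·A, so only the small A and B matrices are trained. This is a hypothetical standalone illustration in NumPy, not the PEFT-SP implementation; in the actual framework the frozen weights come from ESM-2 attention layers, and the dimension, rank, and scaling values below are assumed for demonstration.

```python
import numpy as np

class LoRALinear:
    """A linear layer with a frozen weight W plus a trainable
    low-rank LoRA update: y = x W^T + (alpha / r) * x A^T B^T.

    Illustrative sketch only: W stands in for a pretrained ESM-2
    weight; here it is random for self-containment.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pretrained" weight (never updated during fine-tuning).
        self.W = rng.standard_normal((d_out, d_in)) * 0.02
        # Trainable down-projection A (small random init).
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        # Trainable up-projection B, initialized to zero so the
        # adapted layer starts identical to the frozen layer.
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus scaled low-rank correction.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

    def trainable_params(self):
        # Only A and B are updated; W stays frozen.
        return self.A.size + self.B.size

    def total_params(self):
        return self.W.size + self.A.size + self.B.size

# Example: a 640-dim layer (ESM-2 150M uses 640-dim embeddings).
layer = LoRALinear(640, 640, rank=4)
frac = layer.trainable_params() / layer.total_params()
print(f"trainable fraction: {frac:.4f}")  # ~1% of the full layer
```

Because B starts at zero, the model's output is unchanged at the start of fine-tuning, and the number of trainable parameters scales with the rank r rather than with the full weight size, which is what makes adapting large PLMs tractable on limited hardware.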

Updated: 2024-07-26