DeepLoc 2.0: multi-label subcellular localization prediction using protein language models.
Nucleic Acids Research (IF 16.6), Pub Date: 2022-07-05, DOI: 10.1093/nar/gkac278
Vineet Thumuluri 1 , José Juan Almagro Armenteros 2, 3 , Alexander Rosenberg Johansen 3, 4 , Henrik Nielsen 5 , Ole Winther 6, 7, 8

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

Updated: 2022-04-30