DR-BERT: A protein language model to annotate disordered regions
Structure (IF 5.7), Pub Date: 2024-05-02, DOI: 10.1016/j.str.2024.04.010
Ananthan Nambiar, John Malcolm Forsyth, Simon Liu, Sergei Maslov

Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT’s ability to use contextual information.




Updated: 2024-05-02