Protein language models trained on multiple sequence alignments learn phylogenetic relationships
Nature Communications (IF 14.7), Pub Date: 2022-10-22, DOI: 10.1038/s41467-022-34032-y
Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol

Self-supervised neural language models with attention have recently been applied to biological sequence data, advancing structure, function and mutational effect prediction. Some protein language models, including MSA Transformer and AlphaFold’s EvoFormer, take multiple sequence alignments (MSAs) of evolutionarily related proteins as inputs. Simple combinations of MSA Transformer’s row attentions have led to state-of-the-art unsupervised structural contact prediction. We demonstrate that similarly simple, and universal, combinations of MSA Transformer’s column attentions strongly correlate with Hamming distances between sequences in MSAs. Therefore, MSA-based language models encode detailed phylogenetic relationships. We further show that these models can separate coevolutionary signals encoding functional and structural constraints from phylogenetic correlations reflecting historical contingency. To assess this, we generate synthetic MSAs, either without or with phylogeny, from Potts models trained on natural MSAs. We find that unsupervised contact prediction is substantially more resilient to phylogenetic noise when using MSA Transformer versus inferred Potts models.
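The Hamming distances mentioned above can be illustrated with a minimal sketch. It assumes the MSA is given as a list of equal-length strings, that the distance is normalized by alignment length, and that a gap character counts as an ordinary symbol; the function name `hamming_distance_matrix` and the toy sequences are hypothetical and not taken from the paper.

```python
import numpy as np

def hamming_distance_matrix(msa):
    """Pairwise normalized Hamming distances between aligned sequences.

    Assumption: gaps ('-') are compared like any other symbol.
    """
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all sequences in an MSA must have the same length")
    # Encode the alignment as a (num_sequences, num_columns) character array.
    arr = np.array([list(s) for s in msa])
    # Compare every pair of rows column by column via broadcasting,
    # then average the mismatches over the alignment length.
    mismatches = arr[:, None, :] != arr[None, :, :]
    return mismatches.mean(axis=-1)

# Toy alignment (made up for illustration only).
toy_msa = ["MKV-LA", "MKVQLA", "MRV-IA", "TRVQIG"]
dist = hamming_distance_matrix(toy_msa)
print(np.round(dist, 3))
```

The resulting matrix is symmetric with a zero diagonal. A matrix of this kind is the quantity that, according to the abstract, simple combinations of column attentions correlate with.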




Updated: 2022-10-22