The Pfam protein families database: embracing AI/ML
Nucleic Acids Research (IF 16.6), Pub Date: 2024-11-14, DOI: 10.1093/nar/gkae997
Typhaine Paysan-Lafosse, Antonina Andreeva, Matthias Blum, Sara Rocio Chuguransky, Tiago Grego, Beatriz Lazaro Pinto, Gustavo A Salazar, Maxwell L Bileschi, Felipe Llinares-López, Laetitia Meng-Papaxanthos, Lucy J Colwell, Nick V Grishin, R Dustin Schaeffer, Damiano Clementel, Silvio C E Tosatto, Erik Sonnhammer, Valerie Wood, Alex Bateman

The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.

Updated: 2024-11-14