Nature Machine Intelligence (IF 18.8), Pub Date: 2024-11-15, DOI: 10.1038/s42256-024-00920-9
Zaixi Zhang, Wan Xiang Shen, Qi Liu, Marinka Zitnik
Designing protein-binding proteins is critical for drug discovery. However, artificial-intelligence-based design of such proteins is challenging due to the complexity of protein–ligand interactions, the flexibility of ligand molecules and amino acid side chains, and sequence–structure dependencies. We introduce PocketGen, a deep generative model that produces residue sequence and atomic structure of the protein regions in which ligand interactions occur. PocketGen promotes consistency between protein sequence and structure by using a graph transformer for structural encoding and a sequence refinement module based on a protein language model. The graph transformer captures interactions at multiple scales, including atom, residue and ligand levels. For sequence refinement, PocketGen integrates a structural adapter into the protein language model, ensuring that structure-based predictions align with sequence-based predictions. PocketGen can generate high-fidelity protein pockets with enhanced binding affinity and structural validity. It operates ten times faster than physics-based methods and achieves a 97% success rate, defined as the percentage of generated pockets with higher binding affinity than reference pockets. Additionally, it attains an amino acid recovery rate exceeding 63%.
Title: Efficient generation of protein pockets with PocketGen
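The two headline metrics in the abstract can be illustrated with a minimal Python sketch. It assumes binding affinity is reported as a docking-style score where a lower value means stronger binding, and that each generated pocket is compared position by position with a reference pocket of the same length. The abstract does not specify either detail, and the function names `success_rate` and `amino_acid_recovery` are hypothetical. The numbers in the usage lines are toy values for illustration, not results from the paper.

```python
from typing import Sequence


def success_rate(generated_scores: Sequence[float],
                 reference_scores: Sequence[float]) -> float:
    """Percentage of generated pockets that bind better than their reference.

    Assumption: scores are docking-style energies, so a LOWER score means
    HIGHER binding affinity.
    """
    if len(generated_scores) != len(reference_scores):
        raise ValueError("need exactly one reference score per generated pocket")
    if not generated_scores:
        raise ValueError("no pockets to evaluate")
    # Count pockets whose score is strictly lower (better) than the reference.
    wins = sum(g < r for g, r in zip(generated_scores, reference_scores))
    return 100.0 * wins / len(generated_scores)


def amino_acid_recovery(generated_seq: str, reference_seq: str) -> float:
    """Percentage of pocket positions whose residue matches the reference."""
    if len(generated_seq) != len(reference_seq):
        raise ValueError("sequences must cover the same pocket positions")
    if not reference_seq:
        raise ValueError("empty pocket sequence")
    # Compare one-letter residue codes position by position.
    matches = sum(a == b for a, b in zip(generated_seq, reference_seq))
    return 100.0 * matches / len(reference_seq)


# Toy inputs (hypothetical values, chosen only to exercise the functions).
toy_success = success_rate([-7.9, -6.1, -8.4, -5.0], [-7.0, -6.5, -8.0, -5.5])
toy_recovery = amino_acid_recovery("ACDEFGHK", "ACDQFGHR")
print(toy_success, toy_recovery)
```

With these toy inputs, two of the four generated pockets score better than their references and six of the eight residues match, so the sketch reports 50.0 and 75.0. Under the same definitions, the abstract's figures correspond to a success rate of 97% and a recovery rate above 63%.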