Nature (IF 50.5), Pub Date: 2024-09-25, DOI: 10.1038/s41586-024-07966-0
Andre J. Faure, Aina Martí-Aranda, Cristina Hidalgo-Carcedo, Antoni Beltran, Jörn M. Schmiedel, Ben Lehner
There are more ways to synthesize a 100-amino-acid (aa) protein (20^100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces [1]. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10^10, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
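To make the model structure described in the abstract concrete, here is a minimal sketch of an energy model with additive free energy changes, a sparse set of pairwise couplings, and a nonlinear map from free energy to phenotype. It is an illustration under stated assumptions, not the authors' implementation: the two-state Boltzmann form for the fraction of folded protein, the default temperature, the mutation labels, and the function names `total_free_energy` and `fraction_folded` are all hypothetical choices that the abstract does not specify.

```python
import math

# Gas constant in kcal/(mol*K).
R = 1.98720425864083e-3


def total_free_energy(variant, dg_wildtype, additive, couplings):
    """Folding free energy of a variant (kcal/mol).

    variant     : set of mutation labels (hypothetical labels such as "A1G")
    dg_wildtype : folding free energy of the wild type
    additive    : dict mapping each mutation to its free energy change
    couplings   : dict mapping a pair of mutations to a pairwise coupling
                  energy; sparse, so most pairs are simply absent
    """
    # First-order part: free energy changes add up independently.
    dg = dg_wildtype + sum(additive[m] for m in variant)
    # Second-order part: a coupling counts only if both mutations are present.
    for (m1, m2), ddg in couplings.items():
        if m1 in variant and m2 in variant:
            dg += ddg
    return dg


def fraction_folded(dg, temperature=298.15):
    """Nonlinear free-energy-to-phenotype map.

    ASSUMPTION: two-state folding equilibrium, with negative dg meaning a
    stable fold. The abstract only says the relationship is nonlinear.
    """
    return 1.0 / (1.0 + math.exp(dg / (R * temperature)))


# Hypothetical numbers, chosen only for illustration.
additive = {"A1G": 0.5, "L5P": 1.0}
couplings = {("A1G", "L5P"): 0.25}
double_mutant = {"A1G", "L5P"}

dg_double = total_free_energy(double_mutant, -2.0, additive, couplings)
phenotype = fraction_folded(dg_double)
```

In this sketch the energies combine linearly and only the final sigmoid is nonlinear, so every parameter keeps a direct physical reading (a free energy change or a coupling in kcal/mol). That is the sense in which such a model is interpretable compared with a deep neural network.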
Title (translated from the Chinese): The genetic architecture of protein stability