Nature Methods (IF 36.1), Pub Date: 2019-07-15, DOI: 10.1038/s41592-019-0496-6
Kevin K. Yang, Zachary Wu, Frances H. Arnold
Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence–function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.
Title: Machine-learning-guided directed evolution for protein engineering
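The loop summarized in the abstract (learn from characterized variants, then select sequences likely to show improved properties) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the review: the three-residue sequences and fitness values are invented toy data, the function names `fit_additive_model`, `predict`, and `propose` are hypothetical, and the crude site-wise averaging stands in for a real machine-learning model.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: characterized variants and their measured fitness.
characterized = {
    "AKL": 1.0,
    "AKV": 1.4,
    "GKL": 0.6,
    "ARL": 1.3,
}


def fit_additive_model(data):
    """Estimate a baseline plus one effect per (position, residue) pair.

    Each effect is the mean deviation from the baseline over the variants
    carrying that residue at that position. This is a simple heuristic,
    not a least-squares fit.
    """
    baseline = sum(data.values()) / len(data)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, fitness in data.items():
        for pos, residue in enumerate(seq):
            sums[(pos, residue)] += fitness - baseline
            counts[(pos, residue)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return baseline, effects


def predict(seq, baseline, effects):
    """Predict fitness; unseen (position, residue) pairs contribute zero."""
    return baseline + sum(
        effects.get((pos, residue), 0.0) for pos, residue in enumerate(seq)
    )


def propose(data, top_k=3):
    """Rank untested recombinations of observed residues by predicted fitness."""
    baseline, effects = fit_additive_model(data)
    length = len(next(iter(data)))
    # Residues seen at each position among the characterized variants.
    observed = [sorted({seq[pos] for seq in data}) for pos in range(length)]
    candidates = ("".join(combo) for combo in product(*observed))
    untested = [seq for seq in candidates if seq not in data]
    ranked = sorted(
        untested, key=lambda seq: predict(seq, baseline, effects), reverse=True
    )
    return ranked[:top_k]


if __name__ == "__main__":
    print(propose(characterized))
```

In a real campaign the proposed sequences would be synthesized and measured, added to `characterized`, and the model refit for the next round. The additive assumption ignores interactions between positions, which is one reason more expressive models are used in practice.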