Large language models design sequence-defined macromolecules via evolutionary optimization
npj Computational Materials (IF 9.4), Pub Date: 2024-11-18, DOI: 10.1038/s41524-024-01449-6
Wesley F. Reinhart, Antonia Statt

We demonstrate the ability of a large language model to perform evolutionary optimization for materials discovery. Anthropic’s Claude 3.5 model outperforms an active learning scheme with handcrafted surrogate models and an evolutionary algorithm in selecting monomer sequences to produce targeted morphologies in macromolecular self-assembly. Utilizing pre-trained language models can potentially reduce the need for hyperparameter tuning while offering new capabilities such as self-reflection. The model performs this task effectively with or without context about the task itself, but domain-specific context sometimes results in faster convergence to good solutions. Furthermore, when this context is withheld, the model infers an approximate notion of the task (e.g., calling it a protein folding problem). This work provides evidence of Claude 3.5’s ability to act as an evolutionary optimizer, a recently discovered emergent behavior of large language models, and demonstrates a practical use case in the study and design of soft materials.
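Below is a minimal sketch (not the authors' code) of the kind of LLM-driven evolutionary loop the abstract describes: the language model is shown the current population of monomer sequences with their fitness scores and asked to propose the next generation. The prompt wording, the two-letter monomer alphabet, the model identifier, and the evaluate_fitness() stub are all illustrative assumptions; the paper's actual morphology scoring comes from self-assembly simulations.

```python
import random
from anthropic import Anthropic

client = Anthropic()  # requires ANTHROPIC_API_KEY in the environment
ALPHABET = "AB"       # assumed two-monomer alphabet for a sequence-defined copolymer
SEQ_LEN = 20
POP_SIZE = 8

def evaluate_fitness(sequence: str) -> float:
    """Placeholder for the self-assembly morphology score used in the paper;
    here it is just a random stand-in."""
    return random.random()

def propose_next_generation(population: list[tuple[str, float]], n_children: int) -> list[str]:
    """Ask the LLM to act as the evolutionary optimizer: given scored parents,
    return new candidate sequences."""
    scored = "\n".join(f"{seq}: {fit:.3f}" for seq, fit in population)
    prompt = (
        f"You are optimizing sequences of length {SEQ_LEN} over the letters 'A' and 'B'. "
        "Higher score is better.\n"
        f"Current population:\n{scored}\n"
        f"Propose {n_children} new sequences, one per line, letters only."
    )
    msg = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model identifier
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    lines = [ln.strip() for ln in msg.content[0].text.splitlines()]
    return [ln for ln in lines if len(ln) == SEQ_LEN and set(ln) <= set(ALPHABET)]

# Evolutionary loop: score the population, let the model breed the next generation,
# and keep the best individuals across parents and children (elitist selection).
population = ["".join(random.choice(ALPHABET) for _ in range(SEQ_LEN)) for _ in range(POP_SIZE)]
scored_pop = [(s, evaluate_fitness(s)) for s in population]
for generation in range(5):
    children = propose_next_generation(scored_pop, n_children=POP_SIZE)
    scored_children = [(s, evaluate_fitness(s)) for s in children]
    scored_pop = sorted(scored_pop + scored_children, key=lambda t: t[1], reverse=True)[:POP_SIZE]
print("Best sequence:", scored_pop[0])
```

In this framing the LLM replaces the handcrafted mutation and crossover operators of a conventional evolutionary algorithm, which is why no operator-specific hyperparameters need to be tuned; domain context can be added or withheld simply by editing the prompt.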



Updated: 2024-11-19