Large Language Models as Molecular Design Engines, Journal of Chemical Information and Modeling


Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2024-09-04, DOI: 10.1021/acs.jcim.4c01396
Debjyoti Bhattacharya ₁, Harrison J Cassady ₂, Michael A Hickner ₂, Wesley F Reinhart ₁,₃


The design of small molecules is crucial for technological applications ranging from drug discovery to energy storage. Due to the vast design space available to modern synthetic chemistry, the community has increasingly sought to use data-driven and machine learning approaches to navigate this space. Although generative machine learning methods have recently shown potential for computational molecular design, their use is hindered by complex training procedures, and they often fail to generate valid and unique molecules. In this context, pretrained Large Language Models (LLMs) have emerged as potential tools for molecular design, as they appear to be capable of creating and modifying molecules based on simple instructions provided through natural language prompts. In this work, we show that the Claude 3 Opus LLM can read, write, and modify molecules according to prompts, with impressive 97% valid and unique molecules. By quantifying these modifications in a low-dimensional latent space, we systematically evaluate the model’s behavior under different prompting conditions. Notably, the model is able to perform guided molecular generation when asked to manipulate the electronic structure of molecules using simple, natural-language prompts. Our findings highlight the potential of LLMs as powerful and versatile molecular design engines.


Updated: 2024-09-04
