Nature Machine Intelligence (IF 18.8), Pub Date: 2024-10-21, DOI: 10.1038/s42256-024-00916-5
Zhenxing Wu, Odin Zhang, Xiaorui Wang, Li Fu, Huifeng Zhao, Jike Wang, Hongyan Du, Dejun Jiang, Yafeng Deng, Dongsheng Cao, Chang-Yu Hsieh, Tingjun Hou
Optimizing a candidate molecule’s physicochemical and functional properties has long been a critical task in drug and material design. Although the non-trivial task of balancing multiple (potentially conflicting) optimization objectives is considered ideal for artificial intelligence, several technical challenges, such as the scarcity of multiproperty-labelled training data, have long hindered the development of a satisfactory AI solution. Prompt-MolOpt is a tool for molecular optimization; it makes use of prompt-based embeddings, as used in large language models, to improve the transformer’s ability to optimize molecules for specific property adjustments. Notably, Prompt-MolOpt excels in working with limited multiproperty data (even under the zero-shot setting) by effectively generalizing causal relationships learned from single-property datasets. In comparative evaluations against established models such as JTNN, hierG2G and Modof, Prompt-MolOpt achieves a relative improvement of more than 15% in multiproperty optimization success rate over the leading Modof model. Furthermore, a variant of Prompt-MolOpt, named Prompt-MolOptP, can preserve the pharmacophores or any user-specified fragments under the structural transformation, further broadening its application scope. By constructing tailored optimization datasets with the protocol introduced in this work, Prompt-MolOpt steers molecular optimization towards domain-relevant chemical spaces, enhancing the quality of the optimized molecules. Real-world tests, such as those involving blood–brain barrier permeability optimization, underscore its practical relevance. Prompt-MolOpt offers a versatile approach for multiproperty and multi-site molecular optimizations, suggesting its potential utility in chemistry research and drug and material discovery.
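The headline comparison is stated as a relative improvement in success rate, which is easy to confuse with a gain in percentage points. The sketch below works through the arithmetic under stated assumptions. A molecule is counted as a success only if every targeted property improved; the abstract does not give the paper's exact criterion (for example, any similarity constraint), so this rule is an assumption. The outcome lists are made-up illustrative numbers, not results from the paper, and `success_rate` and `relative_improvement` are hypothetical helper names.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[Sequence[bool]]) -> float:
    """Fraction of molecules for which every targeted property improved.

    Each inner sequence holds one flag per property (True = improved).
    The all-properties criterion is an assumption made for illustration.
    """
    if not outcomes:
        raise ValueError("need at least one molecule")
    return sum(all(flags) for flags in outcomes) / len(outcomes)


def relative_improvement(new_rate: float, baseline_rate: float) -> float:
    """Relative gain of new_rate over baseline_rate (0.15 means 15%)."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate


# Made-up outcomes for 10 molecules and 2 properties (not data from the paper).
baseline = [(True, True)] * 4 + [(True, False)] * 6   # 4 of 10 succeed
candidate = [(True, True)] * 5 + [(False, True)] * 5  # 5 of 10 succeed

base_rate = success_rate(baseline)
cand_rate = success_rate(candidate)
gain = relative_improvement(cand_rate, base_rate)
print(f"baseline={base_rate:.2f} candidate={cand_rate:.2f} relative gain={gain:.0%}")
```

In this toy case the success rate rises by 10 percentage points (from 40% to 50%), which is a 25% relative gain. A reported relative improvement therefore cannot be read as an absolute difference without knowing the baseline rate.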
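Title (translated from Chinese): Leveraging language model for advanced multiproperty molecular optimization via prompt engineering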