Nature Machine Intelligence (IF 18.8), Pub Date: 2024-11-19, DOI: 10.1038/s42256-024-00928-1
Marko Njirjak, Lucija Žužić, Marko Babić, Patrizia Janković, Erik Otović, Daniela Kalafatovic, Goran Mauša
Supramolecular peptide-based materials have great potential for revolutionizing fields like nanotechnology and medicine. However, deciphering the intricate sequence-to-assembly pathway, essential for their real-life applications, remains a challenging endeavour. Their discovery relies primarily on empirical approaches that require substantial financial resources, impeding their disruptive potential. Consequently, despite the multitude of characterized self-assembling peptides and their demonstrated advantages, only a few peptide materials have found their way to the market. Machine learning trained on experimentally verified data presents a promising tool for quickly identifying sequences with a high propensity to self-assemble, thereby focusing resource expenditures on the most promising candidates. Here we introduce a framework that implements an accurate classifier in a metaheuristic-based generative model to navigate the search through the peptide sequence space of challenging size. For this purpose, we trained five recurrent neural networks among which the hybrid model that uses sequential information on aggregation propensity and specific physicochemical properties achieved a superior performance with 81.9% accuracy and 0.865 F1 score. Molecular dynamics simulations and experimental validation have confirmed the generative model to be 80–95% accurate in the discovery of self-assembling peptides, outperforming the current state-of-the-art models. The proposed modular framework efficiently complements human intuition in the exploration of self-assembling peptides and presents an important step in the development of intelligent laboratories for accelerated material discovery.
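The idea of placing a classifier inside a metaheuristic search can be illustrated with a minimal sketch. The abstract does not say which metaheuristic is used, so a simple elitist genetic algorithm is assumed here. The names `toy_score`, `mutate`, `crossover` and `search` are hypothetical, and `toy_score` (the fraction of aromatic residues) is only a placeholder for a trained classifier's predicted self-assembly probability, not the authors' model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AROMATIC = set("FWY")


def toy_score(seq):
    """Hypothetical stand-in for a trained classifier.

    Returns the fraction of aromatic residues, NOT a real
    self-assembly prediction.
    """
    return sum(res in AROMATIC for res in seq) / len(seq)


def mutate(seq, rng, rate=0.1):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else res for res in seq
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def search(score, length=6, pop_size=40, generations=30, seed=0):
    """Elitist genetic algorithm (an assumed metaheuristic) guided by `score`."""
    rng = random.Random(seed)
    pop = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 4]  # keep the top quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return sorted(pop, key=score, reverse=True)


candidates = search(toy_score)
```

In a setting like the one described, the placeholder score would be swapped for the classifier's output, and the top-ranked sequences in `candidates` would be the ones passed on to simulation or experimental checks.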
Title: Reshaping the discovery of self-assembling peptides with generative AI guided by hybrid deep learning