Molecular set representation learning
Nature Machine Intelligence (IF 18.8), Pub Date: 2024-07-05, DOI: 10.1038/s42256-024-00856-0
Maria Boulougouri, Pierre Vandergheynst, Daniel Probst

Computational representation of molecules can take many forms, including graphs, string encodings of graphs, binary vectors or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classification and regression tasks using a wide range of machine learning models. However, existing models come with limitations, such as the requirement for clearly defined chemical bonds, which often do not represent the true underlying nature of a molecule. Here we propose a framework for molecular machine learning tasks based on set representation learning. We show that learning on sets of atom invariants alone reaches the performance of state-of-the-art graph-based models on the most-used chemical benchmark datasets and that introducing a set representation layer into graph neural networks can surpass the performance of established methods in the domains of chemistry, biology and material science. We introduce specialized set representation-based neural network architectures for reaction-yield and protein–ligand binding-affinity prediction. Overall, we show that the technique we denote molecular set representation learning is both an alternative and an extension to graph neural network architectures for machine learning tasks on molecules, molecule complexes and chemical reactions.
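To make the idea of learning on a set of atom invariants concrete, the following is a minimal sketch of a generic permutation-invariant set encoder using sum pooling (a Deep Sets-style construction). It is not the architecture from the paper: the function name `encode_set`, the choice of four invariants per atom, the layer sizes and the random weights are all assumptions made for illustration. It only shows that a molecule given as an unordered set of per-atom feature vectors, with no bond information, yields the same output regardless of atom order.

```python
import numpy as np


def encode_set(atom_features, w_phi, w_rho):
    """Permutation-invariant encoding of the form rho(sum_i phi(x_i)).

    Hypothetical helper for illustration; not the authors' model.
    """
    # phi: the same linear map + ReLU applied to every atom independently
    hidden = np.maximum(atom_features @ w_phi, 0.0)
    # Sum pooling over atoms removes any dependence on atom order
    pooled = hidden.sum(axis=0)
    # rho: map the pooled vector to the prediction (here a single value)
    return pooled @ w_rho


rng = np.random.default_rng(0)

# Assumed invariants per heavy atom of ethanol (CH3-CH2-OH):
# [atomic number, heavy-atom degree, formal charge, attached hydrogens]
ethanol = np.array(
    [
        [6, 1, 0, 3],
        [6, 2, 0, 2],
        [8, 1, 0, 1],
    ],
    dtype=float,
)

# Random, untrained weights; a real model would learn these
w_phi = rng.normal(size=(4, 8))
w_rho = rng.normal(size=(8, 1))

out = encode_set(ethanol, w_phi, w_rho)
out_shuffled = encode_set(ethanol[[2, 0, 1]], w_phi, w_rho)

# The output does not change when the atoms are reordered
assert np.allclose(out, out_shuffled)
```

Sum pooling is used here only because it is the simplest order-independent aggregation; the set representation layers described in the abstract may differ.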




Updated: 2024-07-06