Evaluating the generalizability of graph neural networks for predicting collision cross section
Journal of Cheminformatics (IF 7.1), Pub Date: 2024-08-29, DOI: 10.1186/s13321-024-00899-w
Chloe Engler Hart, António José Preto, Shaurya Chanana, David Healey, Tobias Kind, Daniel Domingo-Fernández

Ion mobility coupled with mass spectrometry (IM-MS) is a promising analytical technique that enhances molecular characterization by measuring collision cross section (CCS) values, which reflect molecular size and shape. However, the use of CCS values in structural analysis is still constrained by the limited availability of experimental data, which motivates the development of accurate machine learning (ML) models for in silico prediction. In this study, we evaluated state-of-the-art graph neural networks (GNNs) trained to predict CCS values on the largest publicly available dataset to date. Although our results confirm that these models are highly accurate within chemical spaces similar to their training data, their performance declines significantly when they are applied to structurally novel regions. This discrepancy raises concerns about the reliability of in silico CCS predictions and underscores the need to release further publicly available CCS datasets. To mitigate this, we introduce Mol2CCS, which demonstrates how generalization can be partially improved by extending models with additional features such as molecular fingerprints, descriptors, and molecule types. Finally, we show how confidence models can help by increasing the reliability of the CCS estimates.

Scientific contribution
We have benchmarked state-of-the-art graph neural networks for predicting collision cross section. Our work highlights the accuracy of these models when they are trained and evaluated in similar chemical spaces, and also how their accuracy drops when they are evaluated in structurally novel regions. We conclude by presenting potential approaches to mitigate this issue.
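The abstract reports that CCS accuracy holds within familiar chemical space but drops in structurally novel regions, without stating here how novelty is defined. One common way to make such a comparison is sketched in the following Python, under explicit assumptions: fingerprints are precomputed elsewhere (for example Morgan fingerprints from RDKit) and stored as sets of on-bit indices; "novelty" is the nearest-neighbour Tanimoto similarity of a test molecule to the training set; the 0.4 cut-off is a hypothetical choice; and accuracy is summarized as median relative error. The function names and settings are illustrative only; they are not taken from the paper and this is not the Mol2CCS code.

```python
from statistics import median


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention (assumption): two empty fingerprints count as identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def max_similarity_to_train(fp, train_fps):
    """Highest similarity between one test fingerprint and any training fingerprint."""
    return max((tanimoto(fp, t) for t in train_fps), default=0.0)


def relative_error(pred, true):
    """Absolute relative error of a predicted CCS value."""
    return abs(pred - true) / true


def error_by_novelty(train_fps, test_records, threshold=0.4):
    """Bin test molecules into 'similar' and 'novel' by nearest-neighbour similarity
    to the training set and return the median relative error of each bin.

    test_records: iterable of (fingerprint, predicted_ccs, true_ccs) tuples.
    threshold: hypothetical similarity cut-off separating the two bins.
    """
    bins = {"similar": [], "novel": []}
    for fp, pred, true in test_records:
        sim = max_similarity_to_train(fp, train_fps)
        key = "similar" if sim >= threshold else "novel"
        bins[key].append(relative_error(pred, true))
    # An empty bin has no meaningful median, so report None for it.
    return {k: (median(v) if v else None) for k, v in bins.items()}


if __name__ == "__main__":
    # Toy, made-up fingerprints and CCS values (in square angstroms), for illustration only.
    train = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5, 6})]
    test = [
        (frozenset({1, 2, 3, 5}), 205.0, 200.0),  # shares most bits with training data
        (frozenset({7, 8, 9}), 180.0, 150.0),     # shares no bits with training data
    ]
    print(error_by_novelty(train, test))  # {'similar': 0.025, 'novel': 0.2}
```

Using the nearest-neighbour similarity rather than an average keeps a test molecule in the "similar" bin whenever at least one close analogue was seen during training, which matches the intuitive meaning of a structurally novel region. A scaffold-based split would be an equally reasonable alternative.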

Updated: 2024-08-29