Evaluating generalizability of artificial intelligence models for molecular datasets
Nature Machine Intelligence (IF 18.8), Pub Date: 2024-12-06, DOI: 10.1038/s42256-024-00931-6
Yasha Ektefaie, Andrew Shen, Daria Bykova, Maximillian G. Marin, Marinka Zitnik, Maha Farhat

Deep learning has made rapid advances in modelling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata- or sequence similarity-based train and test splits of input data before assessing model performance. Here we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, that is, similarity between train and test splits. We introduce SPECTRA, the spectral framework for model evaluation. Given a model and a dataset, SPECTRA plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We use SPECTRA with 18 sequencing datasets and phenotypes ranging from antibiotic resistance in tuberculosis to protein–ligand binding and evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models and convolutional neural networks. We show that sequence similarity- and metadata-based splits provide an incomplete assessment of model generalizability. Using SPECTRA, we find that as cross-split overlap decreases, deep learning models consistently show reduced performance, varying by task and model. Although no model consistently achieved the highest performance across all tasks, deep learning models can, in some cases, generalize to previously unseen sequences on specific tasks. SPECTRA advances our understanding of how foundation models generalize in biological applications.
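The abstract describes the summary metric only in words: performance is plotted against decreasing cross-split overlap, and the area under that curve is reported. A minimal sketch of that calculation follows. It is not the authors' implementation. The function name, the choice of a 0-to-1 axis where larger values mean less train-test overlap, and the trapezoidal rule are all assumptions made for illustration, and the numbers in the usage lines are synthetic.

```python
from typing import Sequence


def area_under_performance_curve(
    decreasing_overlap: Sequence[float],
    scores: Sequence[float],
) -> float:
    """Trapezoidal area under a performance-vs-overlap curve.

    decreasing_overlap: hypothetical x-axis values (assumed here to lie in
        [0, 1]), where larger values mean LESS similarity between the
        train and test splits.
    scores: model performance (e.g. accuracy or AUROC) measured on the
        split generated at each x-axis value.
    """
    if len(decreasing_overlap) != len(scores):
        raise ValueError("need one score per overlap level")
    if len(scores) < 2:
        raise ValueError("need at least two points to form a curve")

    # Sort by x so the trapezoids are accumulated left to right.
    points = sorted(zip(decreasing_overlap, scores))

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# Synthetic illustration only (not results from the paper):
# a model whose score falls linearly as overlap decreases,
# and a model whose score is unaffected by overlap.
degrading = area_under_performance_curve([0.0, 0.5, 1.0], [1.0, 0.5, 0.0])
robust = area_under_performance_curve([0.0, 0.5, 1.0], [0.8, 0.8, 0.8])
print(degrading, robust)
```

Under these assumptions, a model that holds its score as overlap shrinks gets a larger area than one that starts higher but degrades quickly, which is the sense in which the area acts as a generalizability measure.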




Updated: 2024-12-06