Nature Biomedical Engineering (IF 26.8), Pub Date: 2024-10-15, DOI: 10.1038/s41551-024-01268-6
Rui Yan, Md Tauhidual Islam, Lei Xing
Tabular data (rows of samples and columns of sample features) are used ubiquitously across disciplines. Yet the tabular representation makes it difficult to discover the associations underlying the data, which hinders their analysis and the discovery of useful patterns. Here we report a broadly applicable strategy for unravelling intertwined relationships in tabular data by reconfiguring each data sample into a spatially semantic 2D topographic map, which we refer to as TabMap. A TabMap preserves the original feature values as pixel intensities and spatially encodes the relationships among the features in the map: the strength of the relationship between two features is reflected in their distance on the map, with more strongly related features placed closer together. TabMap makes it possible to apply 2D convolutional neural networks to extract association patterns in the data to aid data analysis, and it offers interpretability by ranking features according to their importance. We show the superior predictive performance of TabMap by applying it to 12 datasets spanning a wide range of biomedical applications, including disease diagnosis, human activity recognition, microbial identification and the analysis of quantitative structure–activity relationships.
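The reconfiguration step can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' TabMap algorithm: it assumes absolute Pearson correlation as the feature-association measure and swaps in a simple greedy chaining heuristic with a serpentine (boustrophedon) grid fill in place of whatever layout optimization the paper actually uses. The function names pearson, feature_order, grid_positions and to_map are hypothetical. The sketch keeps the two properties described above: raw feature values become pixel intensities, and features that the greedy chain places consecutively (each chosen as the one most associated with its predecessor) end up in adjacent pixels.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either has ~zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx <= 1e-12 or syy <= 1e-12:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def feature_order(rows):
    """Hypothetical greedy layout: start at feature 0, then repeatedly append the
    unplaced feature with the highest |correlation| to the most recently placed one."""
    n_feat = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_feat)]
    assoc = [[abs(pearson(cols[i], cols[j])) for j in range(n_feat)]
             for i in range(n_feat)]
    order = [0]
    remaining = set(range(1, n_feat))
    while remaining:
        last = order[-1]
        # Highest association wins; ties go to the lowest feature index.
        nxt = max(remaining, key=lambda j: (assoc[last][j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def grid_positions(order):
    """Fill a side x side grid row by row, reversing every other row (serpentine),
    so features adjacent in the chain are also adjacent on the grid."""
    side = math.ceil(math.sqrt(len(order)))
    pos = {}
    for k, feat in enumerate(order):
        r, c = divmod(k, side)
        if r % 2 == 1:
            c = side - 1 - c
        pos[feat] = (r, c)
    return side, pos


def to_map(sample, side, pos, pad=0.0):
    """Write the raw feature values of one sample into their assigned pixels;
    cells with no feature keep the padding value."""
    grid = [[pad] * side for _ in range(side)]
    for feat, (r, c) in pos.items():
        grid[r][c] = sample[feat]
    return grid
```

In this sketch the layout is computed once (for example from training data) and shared by every sample, so a given pixel always holds the same feature across all maps; that consistency is what would let a 2D convolutional network learn local spatial patterns. For example, side, pos = grid_positions(feature_order(train_rows)) followed by to_map(sample, side, pos) for each sample.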
Title: Interpretable discovery of patterns in tabular data via spatially semantic topographic maps