Nature Biomedical Engineering (IF 26.8), Pub Date: 2024-12-17, DOI: 10.1038/s41551-024-01290-8
Wei Qiu, Ayse B. Dincer, Joseph D. Janizek, Safiye Celik, Mikael J. Pittet, Kamila Naxerova, Su-In Lee
Clinical and biological information in large datasets of gene expression across cancers could be tapped with unsupervised deep learning. However, difficulties associated with biological interpretability and methodological robustness have made this impractical. Here we describe an unsupervised deep-learning framework for the generation of low-dimensional latent spaces for gene-expression data from 50,211 transcriptomes across 18 human cancers. The framework, which we named DeepProfile, outperformed dimensionality-reduction methods with respect to biological interpretability and allowed us to unveil that genes that are universally important in defining latent spaces across cancer types control immune cell activation, whereas cancer-type-specific genes and pathways define molecular disease subtypes. By linking latent variables in DeepProfile to secondary characteristics of tumours, we discovered that mutation burden is closely associated with the expression of cell-cycle-related genes, and that the activity of biological pathways for DNA-mismatch repair and MHC class II antigen presentation is consistently associated with patient survival. We also found that tumour-associated macrophages are a source of survival-correlated MHC class II transcripts. Unsupervised learning can facilitate the discovery of biological insight from gene-expression data.
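The abstract does not say how latent variables were linked to tumour characteristics such as mutation burden. As a rough illustration of the general idea only, the sketch below ranks latent variables by the strength of their Pearson correlation with a per-sample covariate. The function names, the choice of Pearson correlation, and the toy numbers are all assumptions made for this example; none of them come from the paper or from DeepProfile itself.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no linear association
    return cov / math.sqrt(var_x * var_y)


def rank_latent_variables(embedding, covariate):
    """Rank latent variables by |Pearson r| with a per-sample covariate.

    embedding: list of samples, each a list of latent values
               (samples x latent dimensions).
    covariate: one value per sample, e.g. a hypothetical mutation-burden score.
    Returns (latent_index, r) pairs, strongest association first.
    """
    n_latent = len(embedding[0])
    scores = []
    for j in range(n_latent):
        column = [sample[j] for sample in embedding]
        scores.append((j, pearson(column, covariate)))
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up toy data (NOT from the paper): 5 samples x 3 latent variables.
toy_embedding = [
    [0.1, 2.0, -1.0],
    [0.4, 1.0, 0.5],
    [0.5, 3.0, -0.2],
    [0.9, 2.5, 0.8],
    [1.2, 1.5, -0.6],
]
toy_mutation_burden = [1.0, 2.1, 2.4, 3.9, 5.0]

ranking = rank_latent_variables(toy_embedding, toy_mutation_burden)
for index, r in ranking:
    print(f"latent variable {index}: r = {r:.3f}")
```

In a real analysis the embedding would come from the trained model and the covariate from clinical or genomic annotations, and a correlation of this kind would normally be followed by multiple-testing correction before any biological interpretation.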
Title: Deep profiling of gene expression across 18 human cancers