Nature Biotechnology (IF 33.1), Pub Date: 2024-11-26, DOI: 10.1038/s41587-024-02463-1
Authors: Zhaojun Zhang, Divij Mathew, Tristan L. Lim, Kaishu Mason, Clara Morral Martinez, Sijia Huang, E. John Wherry, Katalin Susztak, Andy J. Minn, Zongming Ma, Nancy R. Zhang
Data integration to align cells across batches has become a cornerstone of single-cell data analysis, critically affecting downstream results. Currently, there are no guidelines for when the biological differences between samples are separable from batch effects. Here we show that current paradigms for single-cell data integration remove biologically meaningful variation and introduce distortion. We present a statistical model and computationally scalable algorithm, CellANOVA (cell state space analysis of variance), that harnesses experimental design to explicitly recover biological signals that are erased during single-cell data integration. CellANOVA uses a ‘pool-of-controls’ design concept, applicable across diverse settings, to separate unwanted variation from biological variation of interest and allow the recovery of subtle biological signals. We apply CellANOVA to diverse contexts and validate the recovered biological signals by orthogonal assays. In particular, we show that CellANOVA is effective in the challenging case of single-cell and single-nucleus data integration, where it recovers subtle biological signals that can be validated and replicated by external data.
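As a rough illustration of the pool-of-controls idea, here is a toy sketch. It is not CellANOVA's statistical model or code. It assumes three things: batch effects are additive shifts; the unwanted variation spans a single direction that can be estimated from control samples alone; and the treated sample's batch shift lies along that same direction. Every name (`make_sample`, `fit_control_pool`, `pool_of_controls_integrate`) and every number is hypothetical. The naive baseline centers each sample on its own mean, which removes the treatment effect along with the batch shift. The control-based variant removes only the offset component along the direction learned from controls, so the gene-1 treatment effect survives.

```python
import math

# Hypothetical toy space with 2 "genes"; all numbers are invented for illustration.
# Shared cell-state profiles present in every sample.
CELL_STATES = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]


def make_sample(batch_shift, treatment_effect):
    """Cells of one sample: shared states + batch shift on gene 0 + treatment on gene 1."""
    return [[x + batch_shift, y + treatment_effect] for x, y in CELL_STATES]


def mean_vector(cells):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]


def naive_integrate(cells):
    """Baseline: center each sample on its own mean, erasing ALL between-sample differences."""
    m = mean_vector(cells)
    return [[x - mx for x, mx in zip(cell, m)] for cell in cells]


def top_direction(vectors, iters=200):
    """Leading principal direction of centered vectors (power iteration on the covariance)."""
    d = len(vectors[0])
    cov = [[sum(v[i] * v[j] for v in vectors) / len(vectors) for j in range(d)] for i in range(d)]
    u = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * u[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # controls do not vary: no unwanted direction to remove
            return [0.0] * d
        u = [x / norm for x in w]
    return u


def fit_control_pool(control_samples):
    """Estimate a center and a 1-D 'unwanted variation' direction from control sample means only."""
    means = [mean_vector(cells) for cells in control_samples]
    center = mean_vector(means)
    centered = [[x - c for x, c in zip(m, center)] for m in means]
    return center, top_direction(centered)


def pool_of_controls_integrate(cells, center, u):
    """Remove only the part of this sample's mean offset that lies along u; keep the rest."""
    offset = [x - c for x, c in zip(mean_vector(cells), center)]
    proj = sum(o * ui for o, ui in zip(offset, u))
    shift = [proj * ui for ui in u]
    return [[x - s for x, s in zip(cell, shift)] for cell in cells]


# Three control samples differing only by batch shift, and one treated sample that has
# both a batch shift and a true treatment effect of 3.0 on gene 1.
controls = [make_sample(-1.0, 0.0), make_sample(0.5, 0.0), make_sample(1.5, 0.0)]
treated = make_sample(2.0, 3.0)

center, u = fit_control_pool(controls)
trt_gap = (mean_vector(pool_of_controls_integrate(treated, center, u))[1]
           - mean_vector(pool_of_controls_integrate(controls[0], center, u))[1])
naive_gap = mean_vector(naive_integrate(treated))[1] - mean_vector(naive_integrate(controls[0]))[1]
print(f"gene-1 treatment gap: pool-of-controls {trt_gap:.2f}, naive {naive_gap:.2f}")
# prints: gene-1 treatment gap: pool-of-controls 3.00, naive 0.00
```

The toy exposes the key trade-off. Any true biological signal that happens to lie along the direction estimated from controls would also be removed. The approach therefore depends on the controls capturing the unwanted variation and nothing else.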
Title: Recovery of biological signals lost in single-cell batch integration with CellANOVA