Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios
Genome Biology (IF 10.1), Pub Date: 2024-06-03, DOI: 10.1186/s13059-024-03290-y
Hongrui Duo 1 , Yinghong Li 2 , Yang Lan 3 , Jingxin Tao 1 , Qingxia Yang 4 , Yingxue Xiao 1 , Jing Sun 1 , Lei Li 1 , Xiner Nie 5 , Xiaoxi Zhang 1 , Guizhao Liang 5 , Mingwei Liu 6 , Youjin Hao 1 , Bo Li 1

Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. Data simulation has been widely adopted for developing bioinformatics tools for scRNA-seq and SRT data and for performing unbiased benchmarks, because it provides explicit ground truth and generates customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 achieve the best accuracy across various platforms. Unexpectedly, some methods tailored to scRNA-seq data are potentially compatible with simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under their corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores, but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimation and the appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline, Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool, Simsite (https://www.ciblab.net/software/simshiny/), for data simulation. No method performs best on all criteria, so a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.

Updated: 2024-06-03