Distributed Computing and Inference for Big Data
Annual Review of Statistics and Its Application (IF 7.4), Pub Date: 2023-11-17, DOI: 10.1146/annurev-statistics-040522-021241
Ling Zhou, Ziyang Gong, Pengcheng Xiang

Data are distributed across different sites due to computing facility limitations or data privacy considerations. Conventional centralized methods, in which all datasets are stored and processed in a central computing facility, are often not applicable in practice. It has therefore become necessary to develop distributed learning approaches that achieve good inference or predictive accuracy while remaining free of individual-level data or complying with policies and regulations that protect privacy. In this article, we introduce the basic idea of distributed learning and conduct a selective review of various distributed learning methods, categorized by their statistical accuracy, computational efficiency, handling of heterogeneity, and privacy protection. This categorization can help in evaluating newly proposed methods from multiple perspectives. Moreover, we provide up-to-date descriptions of existing theoretical results covering statistical equivalence and computational efficiency under different statistical learning frameworks. Finally, we list existing software implementations and benchmark datasets, and we discuss future research opportunities.
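To make the basic idea concrete, below is a minimal sketch (not taken from the article) of the classical one-shot averaging, or divide-and-conquer, scheme for linear regression: each site fits its own model and transmits only the fitted coefficients, never the raw data, to a central node that averages them. The data sizes, simulation setup, and variable names are illustrative assumptions; under mild conditions the averaged estimator is known to be statistically close to the centralized, pooled-data estimator.

```python
# Minimal sketch of one-shot averaging (divide-and-conquer), assuming
# hypothetical site sizes and simulated Gaussian data.
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_per_site, p = 10, 1000, 5          # hypothetical sizes
beta_true = rng.normal(size=p)

# Simulate data already partitioned across sites (e.g., hospitals).
site_data = []
for _ in range(n_sites):
    X = rng.normal(size=(n_per_site, p))
    y = X @ beta_true + rng.normal(size=n_per_site)
    site_data.append((X, y))

# Local step: each site computes its own least-squares estimate.
local_betas = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in site_data]

# Aggregation step: the central node averages the local estimates;
# only p numbers per site are communicated, in a single round.
beta_avg = np.mean(local_betas, axis=0)

# Centralized benchmark (only possible if raw data could be pooled).
X_all = np.vstack([X for X, _ in site_data])
y_all = np.concatenate([y for _, y in site_data])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print("distance averaged vs. pooled:", np.linalg.norm(beta_avg - beta_pooled))
print("distance averaged vs. truth :", np.linalg.norm(beta_avg - beta_true))
```

The single round of communication and the small message size (one coefficient vector per site) illustrate why such schemes are attractive when data cannot leave their sites; the methods reviewed in the article refine this idea to improve statistical accuracy, handle heterogeneity across sites, and strengthen privacy guarantees.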

Updated: 2023-11-17