On Efficient Training of Large-Scale Deep Learning Models
ACM Computing Surveys (IF 23.8), Pub Date: 2024-10-11, DOI: 10.1145/3700439
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao

The field of deep learning has witnessed significant progress in recent years, particularly in areas such as computer vision (CV), natural language processing (NLP), and speech. Large-scale models trained on vast amounts of data hold immense promise for practical applications, enhancing industrial productivity and facilitating social development. However, training such models suffers from an unstable optimization process and stringent computational-resource requirements. With the increasing demand for computational capacity, and although numerous studies have explored efficient training to some extent, a comprehensive summary of and guideline on general acceleration techniques for training large-scale deep learning models is still much anticipated. In this survey, we present a detailed review of general techniques for training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) "data-centric": dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of the data samples; (2) "model-centric": acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which focus on accelerating training by reducing the computation over parameters and by providing better initialization; (3) "optimization-centric": the selection of the learning rate, the use of large batch sizes, the design of efficient objectives, and model averaging techniques, which focus on the training policy and on improving the generality of large-scale models; (4) "budgeted training": acceleration methods tailored to resource-constrained settings, e.g., a limit on the total number of training iterations; (5) "system-centric": efficient distributed frameworks and open-source libraries that provide adequate hardware support for implementing the above acceleration algorithms. This taxonomy allows us to review the general mechanisms within each component and their joint interactions. We further provide a detailed analysis and discussion of future directions for the development of general acceleration techniques, which may inspire the community to rethink and design novel efficient paradigms. Overall, we hope that this survey will serve as a valuable guideline for general efficient training.
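The abstract organizes the survey around a "fundamental update formulation" without writing it out. The following is a minimal sketch, in notation of our own choosing rather than the paper's, of the stochastic gradient step to which the five perspectives can be mapped:

\theta_{t+1} = \theta_t - \eta_t \cdot \frac{1}{|B_t|} \sum_{x \in B_t} \nabla_{\theta}\, \ell\big(f(x; \theta_t)\big), \qquad t = 0, 1, \dots, T - 1

Under this reading, the construction of the mini-batch B_t (sampling, regularization, curricula) is the data-centric component; the model f and its initialization \theta_0 are the model-centric component; the learning rate \eta_t, the batch size |B_t|, the objective \ell, and any averaging of the iterates \theta_t are the optimization-centric component; the total step budget T corresponds to budgeted training; and how each step is executed across hardware is the system-centric component.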

Updated: 2024-10-11