A novel iteration scheme with conjugate gradient for faster pruning on transformer models
Complex & Intelligent Systems (IF 5.0) Pub Date: 2024-08-07, DOI: 10.1007/s40747-024-01595-w
Jun Li , Yuchen Zhu , Kexue Sun

Pre-trained models based on the Transformer architecture have significantly advanced research in Natural Language Processing (NLP) owing to their superior performance and broad applicability across multiple technological sectors. Despite these advantages, optimizing these models for more efficient deployment remains a significant challenge. Concretely, existing post-training pruning frameworks for Transformer models suffer from inefficiencies in the crucial stage of pruning accuracy recovery, which degrades overall pruning efficiency. To address this issue, this paper introduces a novel and efficient iteration scheme with conjugate gradient for the pruning recovery stage. By constructing a series of conjugate iterative directions, this approach ensures each optimization step is conjugate to the previous ones, which effectively reduces redundant exploration of the search space. Consequently, each iteration progresses effectively towards the global optimum, thereby significantly enhancing search efficiency. The conjugate gradient-based faster-pruner reduces the time expenditure of the pruning process while maintaining accuracy, demonstrating a high degree of solution stability and exceptional model acceleration effects. In pruning experiments conducted on the BERT-base and DistilBERT models, the faster-pruner exhibited outstanding performance on the GLUE benchmark dataset, achieving a reduction of up to 36.27% in pruning time and a speedup of up to 1.45× on an RTX 3090 GPU.
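The abstract's core idea is that conjugate directions avoid re-exploring parts of the search space that earlier steps already optimized. The paper's pruner applies this during accuracy recovery; as a generic illustration only (not the authors' implementation), the classic linear conjugate gradient method for minimizing a quadratic f(x) = ½xᵀAx − bᵀx shows the mechanism: each new direction is A-conjugate to all previous ones, so exact convergence takes at most n steps in n dimensions.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=100):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive-definite A.

    Each search direction is A-conjugate to the previous ones, so no
    direction is explored twice; this is the property the faster-pruner
    exploits to cut redundant search during accuracy recovery.
    """
    x = x0.astype(float)
    r = b - A @ x          # residual = negative gradient at x
    d = r.copy()           # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)        # exact line search along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves coefficient
        d = r_new + beta * d              # new direction, A-conjugate to d
        r = r_new
    return x

# Minimizing the quadratic is equivalent to solving A x = b.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))
```

In a 2-D problem like this, CG converges in two iterations, whereas plain gradient descent would zigzag; the same redundancy reduction is what the paper credits for its faster pruning recovery.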



