PID Controller-Based Stochastic Optimization Acceleration for Deep Neural Networks, IEEE Transactions on Neural Networks and Learning Systems

IEEE Transactions on Neural Networks and Learning Systems (IF 10.2), Pub Date: 2020-01-28, DOI: 10.1109/tnnls.2019.2963066
Haoqian Wang, Yi Luo, Wangpeng An, Qingyun Sun, Jun Xu, Lei Zhang

PID Controller-Based Stochastic Optimization Acceleration for Deep Neural Networks

Deep neural networks (DNNs) are widely used and have demonstrated their power in many applications, such as computer vision and pattern recognition. However, training these networks can be time-consuming. This problem can be alleviated by using efficient optimizers. Stochastic gradient descent with momentum (SGD-M), one of the most commonly used optimizers, uses past and present gradients for parameter updates. During network training, however, SGD-M may suffer from drawbacks such as the overshoot phenomenon, in which the accumulated past gradients push the parameters beyond the optimum, and this slows training convergence. To alleviate this problem and accelerate the convergence of DNN optimization, we propose a proportional-integral-derivative (PID) approach. Specifically, we first investigate the intrinsic relationships between the PID-based controller and SGD-M. We then propose a PID-based optimization algorithm that updates the network parameters by exploiting the past gradients, the current gradient, and the change of gradients. Consequently, our proposed PID-based optimization alleviates the overshoot problem suffered by SGD-M. When tested on popular DNN architectures, it also obtains up to 50% acceleration with competitive accuracy. Extensive experiments on computer vision and natural language processing demonstrate the effectiveness of our method on benchmark data sets, including CIFAR10, CIFAR100, Tiny-ImageNet, and PTB. We have released the code at https://github.com/tensorboy/PIDOptimizer.
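
The P/I/D reading of SGD-M described above can be illustrated with a small numerical sketch. It assumes a PyTorch-style SGD-M buffer I_t = m * I_{t-1} + g_t (the current gradient plays the proportional role, the decayed sum of past gradients the integral role) and adds a hypothetical derivative buffer D_t = m * D_{t-1} + (1 - m) * (g_t - g_{t-1}), updating theta <- theta - lr * (I_t + kd * D_t). The toy loss L(theta) = 0.5 * theta^2, the gains lr, momentum and kd, and the helper names run and settle_step are illustrative assumptions, not the paper's exact formulation or the released PIDOptimizer code.

```python
"""PID-style parameter update versus SGD-M on a 1-D toy quadratic.

Hypothetical sketch: the objective, the gains (lr, momentum, kd) and the
exact form of the integral and derivative buffers are illustrative
assumptions, not the released PIDOptimizer implementation.
"""


def grad(theta: float, h: float = 1.0) -> float:
    """Gradient of the toy loss L(theta) = 0.5 * h * theta**2 (minimum at 0)."""
    return h * theta


def run(theta0: float, lr: float, momentum: float, kd: float, steps: int) -> list[float]:
    """Iterate theta <- theta - lr * (I + kd * D); kd = 0 reduces to SGD-M."""
    theta = theta0
    i_buf = 0.0  # I: decayed sum of past and current gradients (SGD-M's buffer)
    d_buf = 0.0  # D: decayed average of gradient changes g_t - g_{t-1}
    g_prev = None
    path = [theta]
    for _ in range(steps):
        g = grad(theta)
        if g_prev is None:
            g_prev = g  # no gradient change is defined on the first step
        i_buf = momentum * i_buf + g
        d_buf = momentum * d_buf + (1.0 - momentum) * (g - g_prev)
        theta -= lr * (i_buf + kd * d_buf)
        g_prev = g
        path.append(theta)
    return path


def settle_step(path: list[float], tol: float = 1e-3) -> int:
    """First index from which |theta| stays below tol until the end of the run."""
    last_outside = max((k for k, x in enumerate(path) if abs(x) >= tol), default=-1)
    return last_outside + 1


if __name__ == "__main__":
    sgdm = run(1.0, lr=0.1, momentum=0.9, kd=0.0, steps=200)
    pid = run(1.0, lr=0.1, momentum=0.9, kd=50.0, steps=200)
    for name, path in (("SGD-M", sgdm), ("PID-style", pid)):
        print(f"{name:9s} min theta = {min(path):+.4f}, settles at step {settle_step(path)}")
```

With kd = 0 the loop is plain SGD-M, which swings well below the minimum at 0 before settling; with kd = 50 the iterate approaches 0 from one side and settles in fewer steps. On this quadratic the derivative term acts as pure damping: because both buffers decay at the same rate, the update equals heavy-ball momentum with an effective momentum of 0.9 - 0.1 * 50 * (1 - 0.9) = 0.4. On real networks the gain kd would presumably need tuning.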

Updated: 2020-01-28