Limitations of neural network training due to numerical instability of backpropagation
Advances in Computational Mathematics (IF 1.7) Pub Date: 2024-02-11, DOI: 10.1007/s10444-024-10106-x
Clemens Karner , Vladimir Kazeev , Philipp Christian Petersen

We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. Virtually all approximation-theoretic arguments that yield high-order polynomial rates of approximation rely on sequences of ReLU neural networks with exponentially many affine pieces relative to their numbers of layers. As a consequence, we conclude that the approximating sequences of ReLU neural networks obtained by gradient descent in practice differ substantially from the theoretically constructed sequences. The assumptions and the theoretical results are compared to a numerical study, which yields concurring results.
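To make the notion of "affine pieces" concrete, the following is a minimal sketch (not the authors' experimental setup) that estimates the number of affine pieces of a small, randomly initialized ReLU network with scalar input by scanning a fine grid and counting changes in the pattern of active ReLU units. The architecture, initialization, and grid resolution are illustrative assumptions only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hidden_activation_patterns(x_grid, weights, biases):
    """Boolean pattern of active ReLU units at each scalar input grid point.
    Only the hidden layers are passed in; the final affine output layer does
    not create additional pieces."""
    patterns = []
    a = x_grid.reshape(-1, 1)            # shape (n, 1): scalar inputs
    for W, b in zip(weights, biases):
        z = a @ W.T + b                  # pre-activations of this layer
        patterns.append(z > 0)           # which units are active
        a = relu(z)
    return np.concatenate(patterns, axis=1)

def count_affine_pieces(weights, biases, lo=-1.0, hi=1.0, n=200_000):
    """Estimate the number of affine pieces of the network on [lo, hi] by
    counting how often the activation pattern changes along a fine grid.
    This is a lower bound: pieces narrower than the grid spacing are missed."""
    x = np.linspace(lo, hi, n)
    p = hidden_activation_patterns(x, weights, biases)
    changed = np.any(p[1:] != p[:-1], axis=1)
    return int(changed.sum()) + 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    widths = [1, 16, 16, 16]             # scalar input, three hidden layers (hypothetical choice)
    weights = [rng.standard_normal((widths[i + 1], widths[i])) / np.sqrt(widths[i])
               for i in range(len(widths) - 1)]
    biases = [0.1 * rng.standard_normal(widths[i + 1]) for i in range(len(widths) - 1)]
    print("estimated affine pieces:", count_affine_pieces(weights, biases))
```

Tracking this count over the course of gradient descent training (rather than at random initialization, as above) is the kind of quantity the paper's comparison between practical and theoretically constructed network sequences concerns; the grid-scanning estimate here is only one simple way to measure it.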


