Predict globally, correct locally: Parallel-in-time optimization of neural networks
Automatica (IF 4.8) · Pub Date: 2024-10-30 · DOI: 10.1016/j.automatica.2024.111976 · Panos Parpas, Corey Muir
The training of neural networks can be formulated as an optimal control problem of a dynamical system. The initial conditions of the dynamical system are given by the data. The objective of the control problem is to transform the initial conditions into a form that can be easily classified or regressed using linear methods. This link between the optimal control of dynamical systems and neural networks has proved beneficial from both a theoretical and a practical point of view. Several researchers have exploited this link to investigate the stability of different neural network architectures and to develop memory-efficient training algorithms. In this paper, we also adopt the dynamical systems view of neural networks, but our aim differs from that of earlier works: we develop a novel distributed optimization algorithm. The proposed algorithm addresses the most significant obstacle for distributed algorithms for neural network optimization: the network weights cannot be updated until the forward propagation of the data and the backward propagation of the gradients are complete. Using the dynamical systems point of view, we interpret the layers of a (residual) neural network as the discretized dynamics of a dynamical system and exploit the relationship between the co-states (adjoints) of the optimal control problem and backpropagation. We then develop a parallel-in-time method that updates the parameters of the network without waiting for the forward or backward propagation to complete in full. We establish the convergence of the proposed algorithm. Preliminary numerical results suggest that the algorithm is competitive with, and more efficient than, the state of the art.
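To make the dynamical-systems correspondence in the abstract concrete, the following is a minimal sketch in standard notation; the symbols x_k, θ_k, h, Φ, and λ_k are illustrative and are not taken from the paper. A residual block can be read as one forward-Euler step of a controlled ODE, and the costate (adjoint) recursion of the resulting discrete optimal control problem coincides with backpropagation:

x_{k+1} = x_k + h\, f(x_k, \theta_k), \quad k = 0, \dots, N-1, \qquad x_0 = \text{data},

\min_{\theta_0, \dots, \theta_{N-1}} \; \Phi(x_N) \quad \text{(loss on the final representation)},

\lambda_N = \nabla \Phi(x_N), \qquad \lambda_k = \lambda_{k+1} + h\, \partial_x f(x_k, \theta_k)^{\top} \lambda_{k+1},

\nabla_{\theta_k} \Phi = h\, \partial_{\theta} f(x_k, \theta_k)^{\top} \lambda_{k+1}.

Because both recursions are sequential in the layer index k, a conventional update of θ_k must wait for the full forward and backward sweeps. A parallel-in-time scheme, broadly in the spirit of the title, instead partitions the layers into blocks along k and updates each block from predicted, then locally corrected, states and costates, so that different blocks can be optimized concurrently.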
Updated: 2024-10-30