A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Annual Review of Statistics and Its Application (IF 7.4), Pub Date: 2024-11-21, DOI: 10.1146/annurev-statistics-040522-013920. Namjoon Suh, Guang Cheng.
In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates for the excess risk. Nonetheless, their underlying analysis applies only to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that generalizes well to unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel paradigm and the mean-field paradigm. Finally, we review the most recent theoretical advances on generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models, from the first two of these perspectives: approximation and training dynamics.
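To anchor the first two perspectives, the following is a minimal LaTeX sketch of the standard definitions involved; the notation ($f_0$, $\hat{f}_n$, $\beta$, $d$, $\theta_0$) follows common convention rather than the survey's own text.

% Nonparametric regression: data $(X_i, Y_i)$, $i = 1, \dots, n$, drawn as
%   $Y_i = f_0(X_i) + \varepsilon_i$,
% with unknown regression function $f_0$ and noise $\varepsilon_i$.
% The excess risk of a neural-network estimator $\hat{f}_n$ is
\[
  \mathcal{R}(\hat{f}_n) = \mathbb{E}\,\bigl\| \hat{f}_n - f_0 \bigr\|_{L^2}^2 ,
\]
% and a "fast convergence rate" means a bound such as
% $\mathcal{R}(\hat{f}_n) \lesssim n^{-2\beta/(2\beta + d)}$ (up to log factors)
% when $f_0$ is $\beta$-smooth on a $d$-dimensional domain.
%
% Neural tangent kernel: for a network $f(x; \theta)$ with initial parameters
% $\theta_0$, the NTK is the Gram kernel of the parameter gradients,
\[
  K(x, x') = \bigl\langle \nabla_\theta f(x; \theta_0),\, \nabla_\theta f(x'; \theta_0) \bigr\rangle ,
\]
% and in the infinite-width limit, gradient-descent training of the network
% behaves like kernel regression with the fixed kernel $K$.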
Updated: 2024-11-21