Nature Machine Intelligence (IF 18.8), Pub Date: 2024-10-03, DOI: 10.1038/s42256-024-00902-x
Guruprasad Raghavan, Bahey Tharwat, Surya Narayanan Hari, Dhruvil Satani, Rex Liu, Matt Thomson
Contemporary machine learning algorithms train artificial neural networks by setting network weights to a single optimized configuration through gradient descent on task-specific training data. The resulting networks can achieve human-level performance on natural language processing, image analysis and agent-based tasks, but lack the flexibility and robustness characteristic of human intelligence. Here we introduce a differential geometry framework, functionally invariant paths, that provides flexible and continuous adaptation of trained neural networks so that secondary tasks can be achieved beyond the main machine learning goal, including increased network sparsification and adversarial robustness. We formulate the weight space of a neural network as a curved Riemannian manifold equipped with a metric tensor whose spectrum defines low-rank subspaces in weight space that accommodate network adaptation without loss of prior knowledge. We formalize adaptation as movement along a geodesic path in weight space while searching for networks that accommodate secondary objectives. With modest computational resources, the functionally invariant path algorithm achieves performance comparable with or exceeding state-of-the-art methods including low-rank adaptation on continual learning, sparsification and adversarial robustness tasks for large language models (bidirectional encoder representations from transformers, BERT), vision transformers (ViT and DeiT) and convolutional neural networks.
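To make the geometric idea in the abstract concrete, here is a minimal NumPy sketch on a toy network. It is not the authors' algorithm: the model `tanh(W x)`, the function names `forward`, `metric_tensor` and `invariant_step`, the eigenvalue tolerance, and the use of weight shrinkage as a stand-in for a sparsification objective are all assumptions made for illustration. The sketch builds a metric tensor from the output Jacobian, takes the near-zero part of its spectrum as the subspace in which the outputs do not change, and moves the weights only inside that subspace while lowering the secondary objective.

```python
import numpy as np

def forward(w, X):
    """Toy network (assumed for illustration): y = tanh(W x), W stored flat."""
    W = w.reshape(2, 3)
    return np.tanh(X @ W.T)  # shape (n_samples, 2)

def metric_tensor(w, X):
    """g = mean over inputs of J^T J, with J = d(output) / d(weights)."""
    W = w.reshape(2, 3)
    g = np.zeros((w.size, w.size))
    for x in X:
        slope = 1.0 - np.tanh(W @ x) ** 2        # derivative of tanh per output unit
        J = np.kron(np.diag(slope), x[None, :])  # Jacobian, shape (2, 6)
        g += J.T @ J
    return g / len(X)

def invariant_step(w, X, secondary_grad, step=0.1, rel_tol=1e-10):
    """One step that lowers the secondary objective inside the flat subspace of g."""
    eigvals, eigvecs = np.linalg.eigh(metric_tensor(w, X))
    # Eigenvectors with (near-)zero eigenvalue: moving along them leaves
    # the outputs on X unchanged to first order.
    flat = eigvecs[:, eigvals < rel_tol * eigvals.max()]
    direction = -flat @ (flat.T @ secondary_grad)  # projected descent direction
    return w + step * direction

# Two 3-dimensional inputs, so each weight row has one direction the data cannot see.
X = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
w0 = np.array([0.5, -0.3, 0.8, 0.2, 0.7, -0.4])

w = w0.copy()
for _ in range(50):
    # Secondary objective (assumed): 0.5 * ||w||^2, whose gradient is w itself.
    w = invariant_step(w, X, secondary_grad=w)

print("max output change:", np.abs(forward(w, X) - forward(w0, X)).max())
print("weight norm before/after:", np.linalg.norm(w0), np.linalg.norm(w))
```

In this toy case the flat subspace is exact, so the outputs on `X` stay fixed while the weight norm drops. For real networks the spectrum only decays rather than vanishing, and the abstract describes a geodesic path search rather than the plain projection used here.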
Title: Engineering flexible machine learning systems by traversing functionally invariant paths