International Journal of Computer Vision (IF 11.6), Pub Date: 2024-12-11, DOI: 10.1007/s11263-024-02296-0
Yulin Wang, Zanlin Ni, Yifan Pu, Cai Zhou, Jixuan Ying, Shiji Song, Gao Huang
End-to-end (E2E) training has become the de facto standard for training modern deep networks, e.g., ConvNets and vision Transformers (ViTs). Typically, a global error signal is generated at the end of a model and back-propagated layer by layer to update the parameters. This paper shows that relying on back-propagated global errors may not be necessary for deep learning. More precisely, deep networks with competitive or even better performance can be obtained purely through locally supervised learning, i.e., splitting a network into gradient-isolated modules and training them with local supervision signals. However, such an extension is non-trivial. Our experimental and theoretical analysis demonstrates that simply training local modules with an E2E objective tends to be short-sighted: it collapses task-relevant information at early layers and hurts the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible while progressively discarding task-irrelevant information. As the InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. We evaluate InfoPro extensively with ConvNets and ViTs on twelve computer vision benchmarks organized into five tasks (i.e., image/video recognition, semantic/instance segmentation, and object detection). InfoPro is more efficient than E2E training in terms of GPU memory footprint, convergence speed, and training data scale. Moreover, InfoPro enables the effective training of more parameter- and computation-efficient models (e.g., much deeper networks), which suffer from inferior performance when trained end-to-end. Code: https://github.com/blackfeather-wang/InfoPro-Pytorch.
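The idea of gradient-isolated modules trained with local supervision signals can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the authors' implementation: the names `LocalModule` and `train`, the scalar tanh blocks, and the target `sin(2x)` are all hypothetical. It also uses a plain squared-error local loss rather than the InfoPro loss, whose exact form this abstract does not give. It therefore shows only the mechanics of gradient isolation, i.e., the greedy baseline that the abstract describes as short-sighted.

```python
import math
import random


class LocalModule:
    """Hypothetical gradient-isolated block: h = tanh(w * x + b) plus its own linear head."""

    def __init__(self, rng):
        self.w = rng.uniform(-1.0, 1.0)  # block weight
        self.b = 0.0                     # block bias
        self.v = rng.uniform(-1.0, 1.0)  # local head weight
        self.c = 0.0                     # local head bias

    def local_step(self, x, y, lr):
        """One SGD step on the local loss 0.5 * (v * h + c - y) ** 2.

        The input x is treated as a constant, so no gradient flows back
        into whichever module produced it.
        """
        h = math.tanh(self.w * x + self.b)
        err = self.v * h + self.c - y        # d(loss) / d(prediction)
        dz = err * self.v * (1.0 - h * h)    # gradient at the tanh pre-activation
        self.v -= lr * err * h
        self.c -= lr * err
        self.w -= lr * dz * x
        self.b -= lr * dz
        return 0.5 * err * err, h            # h was computed before the update


def train(num_modules=2, steps=3000, lr=0.05, seed=0):
    """Train a stack of modules, each updated only by its own local loss."""
    init_rng = random.Random(seed)       # parameter initialisation
    data_rng = random.Random(seed + 1)   # toy data stream
    modules = [LocalModule(init_rng) for _ in range(num_modules)]
    history = []
    for _ in range(steps):
        x = data_rng.uniform(-1.0, 1.0)
        y = math.sin(2.0 * x)            # toy regression target (assumption)
        h, losses = x, []
        for module in modules:
            # Each module sees the previous output as plain data.
            loss, h = module.local_step(h, y, lr)
            losses.append(loss)
        history.append(losses)
    return modules, history


if __name__ == "__main__":
    modules, history = train()
    print("first-step local losses:", history[0])
    print("last-step local losses: ", history[-1])
```

Because each update uses only the module's own input and local loss, the first module ends up with the same parameters whether or not later modules are stacked on top of it. In an autograd framework, the same isolation is typically obtained by detaching each module's output before passing it on.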
InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation