On the Generalization and Causal Explanation in Self-Supervised Learning, International Journal of Computer Vision


International Journal of Computer Vision (IF 11.6), Pub Date: 2024-10-19, DOI: 10.1007/s11263-024-02263-9
Wenwen Qiang, Zeen Song, Ziyin Gu, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong

Self-supervised learning (SSL) methods learn from unlabeled data and achieve high generalization performance on downstream tasks. However, they may also overfit their training data and lose the ability to adapt to new tasks. To investigate this phenomenon, we conduct experiments on various SSL methods and datasets and make two observations: (1) overfitting occurs abruptly in later layers and later epochs, while generalizing features are learned in early layers at all epochs; (2) coding rate reduction can be used as an indicator of the degree of overfitting in SSL models. Based on these observations, we propose the Undoing Memorization Mechanism (UMM), a plug-and-play method that mitigates overfitting of the pre-trained feature extractor by aligning the feature distributions of the early and the last layers so as to maximize the coding rate reduction of the last-layer output. The learning process of UMM is a bi-level optimization process. We provide a causal analysis of UMM to explain how it helps the pre-trained feature extractor overcome overfitting and recover generalization. We also demonstrate that UMM significantly improves the generalization performance of SSL methods on various downstream tasks. The source code is to be released at https://github.com/ZeenSong/UMM.
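
The abstract names coding rate reduction as the overfitting indicator but does not define it. The following is a minimal sketch, assuming the standard definition from the maximal coding rate reduction (MCR²) literature: the coding rate of n feature vectors of dimension d is R(Z, ε) = ½ log det(I + d/(n ε²) ZᵀZ), and the reduction is the rate of the whole set minus the size-weighted rates of its groups. The function names, the default `eps`, and the use of integer group labels are illustrative assumptions, not the paper's implementation, and the paper may group or normalize features differently.

```python
import numpy as np


def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """Coding rate of features Z with shape (n_samples, dim).

    Assumed MCR^2 form: 0.5 * logdet(I + dim / (n * eps^2) * Z^T Z).
    """
    n, d = Z.shape
    scaled_gram = (d / (n * eps**2)) * (Z.T @ Z)  # (dim, dim)
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, logdet = np.linalg.slogdet(np.eye(d) + scaled_gram)
    return 0.5 * float(logdet)


def coding_rate_reduction(Z: np.ndarray, groups, eps: float = 0.5) -> float:
    """Rate of the whole set minus the size-weighted rates of each group.

    `groups` is a hypothetical array of integer labels, one per row of Z
    (for example, an index shared by augmented views of the same image).
    """
    groups = np.asarray(groups)
    n = Z.shape[0]
    within = 0.0
    for g in np.unique(groups):
        Zg = Z[groups == g]
        within += (Zg.shape[0] / n) * coding_rate(Zg, eps)
    return coding_rate(Z, eps) - within
```

Under this definition the reduction is non-negative and grows as the groups occupy more distinct directions in feature space, while groups with identical features give a reduction of zero. This is one way to read the abstract's use of the quantity as an indicator, but the exact protocol for measuring it across layers and epochs is not stated here.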


Updated: 2024-10-20
