Robust Semi-Supervised Learning by Wisely Leveraging Open-Set Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2024-05-22, DOI: 10.1109/tpami.2024.3403994
Yang Yang 1, Nan Jiang 2, Yi Xu 3, De-Chuan Zhan 2

Open-set semi-supervised learning (OSSL) addresses a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which can degrade the performance of conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically use the entire open-set dataset during training, and it may contain data that are unfriendly to the OSSL task and harm model performance. This motivates us to develop a robust open-set data selection strategy for OSSL. Guided by an analysis from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's ID classification capability. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen that adopt low-frequency updates and loss-based selection, respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with state-of-the-art methods.
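
The abstract does not spell out the selection rule, so the following is only a minimal sketch of what a loss-based, low-frequency selection step might look like. It is not the authors' implementation. The function names `select_friendly_subset` and `should_refresh`, the `keep_ratio` and `interval` parameters, and the "keep the lowest-loss fraction" criterion are all assumptions made for illustration.

```python
from typing import List, Sequence


def select_friendly_subset(losses: Sequence[float], keep_ratio: float = 0.5) -> List[int]:
    """Return indices of the lowest-loss open-set samples.

    Hypothetical rule: the paper's actual criterion is not given in the
    abstract; "keep the lowest-loss fraction" is an illustrative stand-in.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(losses) == 0:
        return []
    # Keep at least one sample so that training always sees some open-set data.
    n_keep = max(1, int(len(losses) * keep_ratio))
    # Rank sample indices by ascending loss, then keep the first n_keep.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:n_keep])


def should_refresh(epoch: int, interval: int) -> bool:
    """Low-frequency update (assumed form): re-select every `interval` epochs."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return epoch % interval == 0


if __name__ == "__main__":
    per_sample_losses = [0.9, 0.1, 0.5, 0.3]  # made-up values for the demo
    for epoch in range(4):
        if should_refresh(epoch, interval=2):
            kept = select_friendly_subset(per_sample_losses, keep_ratio=0.5)
            print(f"epoch {epoch}: selected indices {kept}")
```

Re-selecting only every few epochs trades freshness of the subset for fewer scoring passes over the open-set data, which is the kind of cost reduction the abstract attributes to the two variants.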


