Traditional class-imbalanced learning algorithms require labeled training data, while semi-supervised learning algorithms assume a balanced class distribution. In practice, however, class imbalance and a shortage of labeled data often occur together. Most existing class-imbalanced semi-supervised learning methods address the two problems separately, which biases the trained model toward the majority classes that have more samples. In this study, we propose a deep metric learning based pseudo-labeling (DML-PL) framework that addresses both problems at once. The proposed DML-PL framework consists of three modules: Deep Metric Learning, Pseudo-Labeling, and Network Fine-tuning. An iterative self-training strategy trains the model over multiple rounds. In each round, Deep Metric Learning trains a deep metric network to learn compact feature representations of the labeled and unlabeled data. Pseudo-Labeling then generates reliable pseudo-labels for the unlabeled data through labeled data clustering and nearest-neighbor selection. Finally, Network Fine-tuning fine-tunes the deep metric network so that it produces better pseudo-labels in subsequent rounds. Training ends when all unlabeled data have been pseudo-labeled. Compared with baseline models, the proposed framework achieves state-of-the-art performance on the long-tailed CIFAR-10, CIFAR-100, and ImageNet127 benchmark datasets.
DML-PL: Deep metric learning based pseudo-labeling framework for class imbalanced semi-supervised learning
Traditional class-imbalanced learning algorithms require labeled training data, whereas semi-supervised learning algorithms assume that the class distribution is balanced. However, class imbalance and insufficient labeled data often coexist in real-world applications. Most existing class-imbalanced semi-supervised learning methods tackle these two problems separately, which leaves the trained model biased towards the majority classes that have more data samples. In this study, we propose a deep metric learning based pseudo-labeling (DML-PL) framework that tackles both problems simultaneously for class-imbalanced semi-supervised learning. The proposed DML-PL framework comprises three modules: Deep Metric Learning, Pseudo-Labeling, and Network Fine-tuning. An iterative self-training strategy is used to train the model over multiple rounds. In each round, Deep Metric Learning trains a deep metric network to learn compact feature representations of labeled and unlabeled data. Pseudo-Labeling then generates reliable pseudo-labels for unlabeled data through labeled data clustering with nearest-neighbor selection. Finally, Network Fine-tuning fine-tunes the deep metric network so that it generates better pseudo-labels in the subsequent round. Training ends when all the unlabeled data are pseudo-labeled. The proposed framework achieved state-of-the-art performance on the long-tailed CIFAR-10, CIFAR-100, and ImageNet127 benchmark datasets compared with baseline models.
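The abstract does not give implementation details, so the following is only a minimal sketch of how an iterative pseudo-labeling loop of this kind might look; it is not the authors' method. It assumes fixed, precomputed embeddings in place of the deep metric network, treats "labeled data clustering" as one mean centroid per class, and selects the same number `k` of nearest unlabeled points for every class so that minority classes are not crowded out. The function names `pseudo_label_round` and `self_train`, the parameter `k`, and the use of Euclidean distance are all hypothetical choices made for illustration.

```python
import numpy as np


def pseudo_label_round(lab_x, lab_y, unl_x, k):
    """One hypothetical pseudo-labeling round on fixed embeddings.

    Returns (indices, labels) for the unlabeled points selected this round.
    """
    classes = np.unique(lab_y)
    # Assumption: each class is summarized by the mean of its labeled embeddings.
    centroids = np.stack([lab_x[lab_y == c].mean(axis=0) for c in classes])
    # Euclidean distance from every unlabeled point to every class centroid.
    dists = np.linalg.norm(unl_x[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest centroid per point
    chosen_idx, chosen_lab = [], []
    for j, c in enumerate(classes):
        members = np.flatnonzero(nearest == j)
        # Keep at most k points per class, closest to the centroid first.
        order = members[np.argsort(dists[members, j])][:k]
        chosen_idx.extend(order.tolist())
        chosen_lab.extend([c] * len(order))
    return np.array(chosen_idx, dtype=int), np.array(chosen_lab, dtype=lab_y.dtype)


def self_train(lab_x, lab_y, unl_x, k=2):
    """Repeat rounds until every unlabeled point has a pseudo-label."""
    # -1 marks "not yet labeled"; this assumes signed integer class labels.
    pseudo = np.full(len(unl_x), -1, dtype=lab_y.dtype)
    remaining = np.arange(len(unl_x))
    while len(remaining) > 0:
        idx, labs = pseudo_label_round(lab_x, lab_y, unl_x[remaining], k)
        pseudo[remaining[idx]] = labs
        # Newly pseudo-labeled points join the labeled pool for the next round.
        lab_x = np.vstack([lab_x, unl_x[remaining[idx]]])
        lab_y = np.concatenate([lab_y, labs])
        remaining = np.delete(remaining, idx)
        # A full pipeline would fine-tune the embedding network here and
        # recompute the embeddings; this sketch keeps them fixed.
    return pseudo
```

Each round labels at least one remaining point (for `k >= 1`), so the loop stops once all unlabeled data carry a pseudo-label, mirroring the stopping rule stated in the abstract. The per-class cap is one simple way to keep selection balanced; the paper's actual selection rule may differ.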