Heterogeneous Semantic Transfer for Multi-label Recognition with Partial Labels
International Journal of Computer Vision (IF 11.6), Pub Date: 2024-07-15, DOI: 10.1007/s11263-024-02127-2
Tianshui Chen, Tao Pu, Lingbo Liu, Yukai Shi, Zhijing Yang, Liang Lin

Multi-label image recognition with partial labels (MLR-PL), in which some labels are known while others are unknown for each image, may greatly reduce the cost of annotation and thus facilitate large-scale MLR. We find that strong semantic correlations exist within each image and across different images, and that these correlations can help transfer the knowledge possessed by the known labels to retrieve the unknown labels, thereby improving performance on the MLR-PL task. In this work, we propose a novel heterogeneous semantic transfer (HST) framework consisting of two complementary transfer modules that explore both within-image and cross-image semantic correlations to transfer the knowledge possessed by known labels and generate pseudo labels for the unknown labels. Specifically, an intra-image semantic transfer (IST) module learns an image-specific label co-occurrence matrix for each image and maps the known labels to complement the unknown labels based on these matrices. Additionally, a cross-image transfer (CST) module learns category-specific feature-prototype similarities and then helps complement the unknown labels that have high degrees of similarity with the corresponding prototypes. It is worth noting that the HST framework requires appropriate thresholds on the co-occurrence and similarity scores to generate pseudo labels in the IST and CST modules, respectively. To avoid time-consuming and resource-intensive manual tuning, we introduce a differentiable threshold learning algorithm that replaces the nondifferentiable indicator function with a differentiable formulation so that appropriate thresholds can be learned automatically. Finally, both the known labels and the generated pseudo labels are used to train MLR models. Extensive experiments conducted on the Microsoft COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed HST framework achieves superior performance to that of current state-of-the-art algorithms. Specifically, it obtains mean average precision (mAP) improvements of 1.4%, 3.3%, and 0.4% on the three datasets over the results of the best-performing previously developed algorithm.
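
The threshold-learning idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact differentiable formulation, so the code below assumes a common relaxation: the hard indicator 1[score >= theta] is replaced by a sigmoid with a temperature, which makes the threshold trainable by gradient descent. The function names, the temperature `tau`, the binary cross-entropy objective, and the toy scores are all hypothetical and are not taken from the paper.

```python
import math

def hard_indicator(score, theta):
    # Nondifferentiable rule: emit a pseudo label when the score reaches theta.
    return 1.0 if score >= theta else 0.0

def soft_indicator(score, theta, tau=0.1):
    # Assumed relaxation: a sigmoid centred at theta; a smaller tau is sharper.
    return 1.0 / (1.0 + math.exp(-(score - theta) / tau))

def learn_threshold(scores, labels, theta=0.2, tau=0.1, lr=0.05, steps=500):
    # Gradient descent on the mean binary cross-entropy between the soft
    # indicator and the known labels. With p = sigmoid((s - theta) / tau),
    # the derivative of the per-sample loss w.r.t. theta is (y - p) / tau.
    n = len(scores)
    for _ in range(steps):
        grad = 0.0
        for s, y in zip(scores, labels):
            p = soft_indicator(s, theta, tau)
            grad += (y - p) / tau
        theta -= lr * grad / n
    return theta

# Toy co-occurrence or similarity scores with known labels (made-up values).
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

theta = learn_threshold(scores, labels)
pseudo_labels = [hard_indicator(s, theta) for s in scores]
```

On this symmetric toy data the learned threshold settles midway between the two groups of scores, and applying the hard indicator at that threshold reproduces the known labels. In the HST framework the thresholds are learned alongside the IST and CST modules; this sketch isolates only the relaxation step.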



