Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning
IEEE Transactions on Image Processing (IF 10.8), Pub Date: 2024-10-21, DOI: 10.1109/tip.2024.3480701
Shengsheng Qian, Dizhan Xue, Jun Hu, Huaiwen Zhang, Changsheng Xu

With the increasing availability of multi-view data, multi-view representation learning has emerged as a prominent research area. However, collecting strictly view-aligned data is usually expensive, and learning from both aligned and unaligned data is often more practical. Therefore, Partially View-aligned Representation Learning (PVRL) has recently attracted increasing attention. After multi-view representations are aligned based on their semantic similarity, the aligned representations can be utilized to facilitate downstream tasks, such as clustering. However, existing methods may be constrained by the following limitations: 1) They learn cross-view semantic relations from the known correspondences, which are incomplete, and the existence of false negative pairs (FNP) can significantly impair learning effectiveness; 2) Existing strategies for alleviating the impact of FNP are too intuitive and lack a theoretical explanation of their applicable conditions; 3) They attempt to identify FNP based on distances in the common space and fail to explore the semantic relations between multi-view data. In this paper, we propose Nonparametric Clustering-guided Cross-view Contrastive Learning (NC3L) for PVRL to address the above issues. Firstly, we propose to estimate the similarity matrix between multi-view data in the marginal cross-view contrastive loss to approximate the similarity matrix of supervised contrastive learning (CL). Secondly, we establish the theoretical foundation of the proposed method by bounding the errors of the loss function and its derivatives between our method and supervised CL. Thirdly, we propose Deep Variational Nonparametric Clustering (DeepVNC), which designs a deep reparameterized variational inference for Dirichlet process Gaussian mixture models to construct cluster-level similarities between multi-view data and discover FNP. Additionally, we propose a reparameterization trick to improve the robustness and performance of the proposed CL method. Extensive experiments on four widely used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
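The abstract does not spell out the loss or the DeepVNC posterior, so the following PyTorch sketch only illustrates the general ideas it describes: a truncated stick-breaking construction of Dirichlet process mixture weights (the kind of quantity a reparameterized variational scheme would produce) and a cross-view contrastive loss whose target matrix is softened by cluster-level similarity so that likely false negatives are not pushed apart. The function names, tensor shapes, and the exact way cluster assignments enter the target matrix are illustrative assumptions, not the paper's NC3L/DeepVNC formulation.

```python
import torch
import torch.nn.functional as F


def stick_breaking_weights(v):
    """Truncated stick-breaking construction for a Dirichlet process mixture:
    given v_k in (0, 1), the weight of cluster k is
        pi_k = v_k * prod_{j < k} (1 - v_j).
    In a reparameterized variational scheme, v would be sampled from a
    differentiable posterior so that gradients flow through the sampling step.
    """
    cumprod = torch.cumprod(1.0 - v, dim=-1)
    shifted = torch.cat([torch.ones_like(v[..., :1]), cumprod[..., :-1]], dim=-1)
    return v * shifted


def cluster_guided_contrastive_loss(z1, z2, cluster_probs1, cluster_probs2,
                                    temperature=0.5, eps=1e-8):
    """Cross-view InfoNCE-style loss with a soft target matrix built from
    cluster-level similarity, so cross-view pairs that likely share a cluster
    (potential false negatives) are treated as soft positives.

    z1, z2:            (N, d) embeddings of the two views
    cluster_probs1/2:  (N, K) soft cluster assignments, e.g. mixture posteriors
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Instance-level cross-view similarities.
    logits = z1 @ z2.t() / temperature                  # (N, N)

    # Cluster-level similarity as a row-normalized soft target matrix.
    cluster_sim = cluster_probs1 @ cluster_probs2.t()   # (N, N)
    targets = cluster_sim / (cluster_sim.sum(dim=1, keepdim=True) + eps)

    # Soft cross-entropy between the contrastive distribution and the target.
    log_probs = F.log_softmax(logits, dim=1)
    return -(targets * log_probs).sum(dim=1).mean()
```

As a design note, the soft target matrix reduces to the identity used by standard cross-view InfoNCE when every sample occupies its own cluster, which is one way to see how such a loss interpolates between unsupervised and (approximately) supervised contrastive learning.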

Updated: 2024-10-21