Contrastive hashing with vision transformer for image retrieval
International Journal of Intelligent Systems (IF 5.0) Pub Date: 2022-09-19, DOI: 10.1002/int.23082
Xiuxiu Ren 1,2, Xiangwei Zheng 1,2, Huiyu Zhou 3, Weilong Liu 4, Xiao Dong 5

Hashing techniques have attracted considerable attention owing to their efficient computation and economical storage. However, generating compact binary codes that deliver strong retrieval performance remains a challenging problem. In this paper, we propose a novel contrastive vision transformer hashing method, which seamlessly integrates contrastive learning and vision transformers (ViTs) with hashing in a well-designed model that learns informative features and compact binary codes simultaneously. First, we modify the basic contrastive learning framework by designing several hash layers to meet the specific requirements of hash learning. In our hash network, ViTs serve as the backbone for feature learning, which is rare among existing hashing methods. Then, we design a multiobjective loss function in which a contrastive loss learns discriminative features by maximizing agreement between different augmented views of the same image, a similarity preservation loss enforces pairwise semantic consistency to strengthen the representative capability of the hash codes, and a quantization loss controls the quantization error. The model thus supports end-to-end joint training, which improves retrieval performance. Encouraging experimental results on three widely used benchmark databases demonstrate the superiority of our algorithm over several state-of-the-art hashing algorithms.
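The three loss terms described above can be sketched as follows. This is not the authors' implementation; it is a minimal NumPy illustration of the general form of each term, assuming an NT-Xent-style contrastive loss, an inner-product similarity preservation loss against a ±1 similarity matrix S, and a sign-based quantization penalty. The weights alpha, beta, and gamma are hypothetical trade-off parameters.

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    """Contrastive (NT-Xent-style) loss between two augmented views.
    z1, z2: (N, d) feature matrices; row i of z1 and row i of z2
    come from the same image and form the positive pair."""
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # cosine features
    sim = z @ z.T / tau                                  # scaled similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

def similarity_preservation_loss(h, S):
    """Pairwise semantic preservation: scaled inner products of
    continuous hash outputs should match S with entries in {-1, +1}."""
    k = h.shape[1]                                       # code length
    inner = h @ h.T / k
    return ((inner - S) ** 2).mean()

def quantization_loss(h):
    """Penalize the gap between continuous outputs and binary codes."""
    return ((h - np.sign(h)) ** 2).mean()

def total_loss(z1, z2, h, S, alpha=1.0, beta=1.0, gamma=0.1):
    """Weighted multiobjective loss combining the three terms."""
    return (alpha * nt_xent_loss(z1, z2)
            + beta * similarity_preservation_loss(h, S)
            + gamma * quantization_loss(h))
```

In an end-to-end setting, z1 and z2 would be ViT features of two augmentations of each image, and h the continuous output of the hash layers before binarization; the quantization term keeps h close to sign(h) so the final codes lose little information.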

Updated: 2022-09-19