Hierarchical Graph Interaction Transformer With Dynamic Token Clustering for Camouflaged Object Detection
IEEE Transactions on Image Processing (IF 10.8), Pub Date: 2024-10-15, DOI: 10.1109/tip.2024.3475219
Siyuan Yao, Hao Sun, Tian-Zhu Xiang, Xiao Wang, Xiaochun Cao

Camouflaged object detection (COD) aims to identify objects that blend seamlessly into their surrounding backgrounds. Because camouflaged objects are intrinsically similar to the background, existing approaches struggle to distinguish them precisely. In this paper, we propose a hierarchical graph interaction network, termed HGINet, for camouflaged object detection, which discovers imperceptible objects through effective graph interaction among hierarchical tokenized features. Specifically, we first design a region-aware token focusing attention (RTFA) module with dynamic token clustering to mine potentially distinguishable tokens within local regions. Next, we propose a hierarchical graph interaction transformer (HGIT) that builds bi-directional, aligned communication between hierarchical features in a latent interaction space to enhance visual semantics. Furthermore, we propose a decoder network with confidence aggregated feature fusion (CAFF) modules, which progressively fuses the hierarchical interacted features to refine local details in ambiguous regions. Extensive experiments on the prevalent datasets COD10K, CAMO, NC4K, and CHAMELEON demonstrate that HGINet outperforms existing state-of-the-art methods. Our code is available at https://github.com/Garyson1204/HGINet.
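The abstract names dynamic token clustering inside RTFA but does not describe its mechanics. Purely as a generic illustration, and not as HGINet's implementation (the official repository is the authority on that), the Python sketch below groups token embeddings with plain k-means and then restricts scaled dot-product attention to tokens that share a cluster, so each token aggregates features only from its own group. The function names `kmeans_tokens` and `clustered_attention`, the first-k center initialisation, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_tokens(tokens: List[Vector], k: int, iters: int = 10) -> List[int]:
    """Hypothetical helper: cluster token embeddings with plain k-means.

    Returns one cluster id per token. Centers start at the first k tokens
    so the result is deterministic (an assumption, not the paper's scheme).
    """
    centers = [list(t) for t in tokens[:k]]
    labels = [0] * len(tokens)
    for _ in range(iters):
        # Assignment step: each token joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(t, centers[c])) for t in tokens]
        # Update step: each center moves to the mean of its member tokens.
        for c in range(k):
            members = [t for t, lab in zip(tokens, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustered_attention(tokens: List[Vector], labels: List[int]) -> List[Vector]:
    """Hypothetical helper: self-attention restricted to same-cluster tokens.

    Tokens are used directly as queries, keys, and values (no learned
    projections), with softmax(q . k / sqrt(d)) weights.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        # Only tokens sharing the query's cluster id are attended to.
        idx = [j for j, lab in enumerate(labels) if lab == labels[i]]
        scores = [dot(q, tokens[j]) / math.sqrt(d) for j in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * tokens[j][c] for w, j in zip(weights, idx)) / total
                    for c in range(d)])
    return out


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    ids = kmeans_tokens(toy, k=2)
    print(ids)  # [0, 0, 1, 1]
    print(clustered_attention(toy, ids))
```

On the toy input, the two tokens near `[1, 0]` and the two near `[0, 1]` fall into separate clusters, and each attended output is a weighted mix of its own cluster's tokens only. Restricting attention this way is one common way to make attention "region-aware"; a learned, differentiable clustering inside a transformer would look quite different.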

Last updated: 2024-10-15