IF-USOD: Multimodal information fusion interactive feature enhancement architecture for underwater salient object detection
Information Fusion ( IF 14.7 ) Pub Date : 2024-11-23 , DOI: 10.1016/j.inffus.2024.102806
Genji Yuan, Jintao Song, Jinjiang Li

Underwater salient object detection (USOD) has garnered increasing attention due to its superior performance in various underwater visual tasks. Despite the growing interest, research on USOD remains in its nascent stages, with existing methods often struggling to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. Multimodal information fusion can render previously indiscernible objects detectable, as capturing complementary features from diverse source images enables a more accurate depiction of objects. In this work, we explore an innovative approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from source images, employing a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the relevant features. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy that facilitates multi-scale feature fusion, improving detection accuracy by generating both coarse and fine salient predictions. Extensive experimental evaluations demonstrate that our proposed method achieves state-of-the-art performance.
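The self-attention and cross-attention step described in the abstract can be sketched roughly as follows. This is a minimal single-head, projection-free NumPy illustration: the token shapes, the additive fusion, and the absence of learned Q/K/V projections, multi-head attention, and the two-stage training are all simplifying assumptions for clarity, not the paper's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def cross_modal_enhance(rgb_tokens, depth_tokens):
    """Model intra-modal correlations with self-attention, then
    exchange complementary cues with cross-attention (RGB queries
    attend to depth keys/values and vice versa), and fuse by addition."""
    rgb_self = attention(rgb_tokens, rgb_tokens, rgb_tokens)
    depth_self = attention(depth_tokens, depth_tokens, depth_tokens)
    rgb_cross = attention(rgb_self, depth_self, depth_self)
    depth_cross = attention(depth_self, rgb_self, rgb_self)
    return rgb_cross + depth_cross  # fused multimodal features

# Toy example: 16 spatial tokens per modality, 64-dim features.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 64))
depth = rng.standard_normal((16, 64))
fused = cross_modal_enhance(rgb, depth)
print(fused.shape)  # (16, 64)
```

In a real network each `attention` call would use separate learned linear projections per modality, and the fused tokens would feed the cross-scale decoder that produces the coarse and fine saliency maps.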

Updated: 2024-11-23