Neurocomputing (IF 5.5), Pub Date: 2023-01-12, DOI: 10.1016/j.neucom.2023.01.033
Jun Chen, Jianfeng Ding, Yang Yu, Wenping Gong
Infrared and visible image fusion aims to integrate complementary information from different types of images into one image. Existing image fusion methods are primarily based on convolutional neural networks (CNNs), which ignore long-range dependencies in images, so the fusion network cannot generate images with good complementarity. Inspired by the importance of global information, we introduce the transformer technique into the CNN-based fusion network to improve image-level perception in complex fusion scenarios. In this paper, we propose an end-to-end image fusion framework based on a transformer and a hybrid feature extractor, which enables the network to focus on both global and local information, using the characteristics of the transformer to compensate for the shortcomings of the CNN. In our network, a dual-branch CNN module extracts the shallow features of the images, and a vision transformer module then captures the global channel and spatial relationships in those features. Finally, the fusion result is obtained through an image reconstruction module. We compute the loss on features of different depths, depending on the type of source image, using a pre-trained VGG19 network. The experimental results show the effectiveness of adding the vision transformer module. Compared with other traditional and deep learning methods, our method achieves state-of-the-art performance in both qualitative and quantitative experiments.
Title: THFuse: An infrared and visible image fusion network using transformer and hybrid feature extractor
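The data flow described in the abstract (two local feature branches, a global attention step, then reconstruction) can be illustrated with a toy sketch. This is not the authors' implementation. It rests on several assumptions: a 3x3 mean filter stands in for each CNN branch, a single-head attention with identity projections stands in for the vision transformer module (spatial mixing only, no channel attention), and a plain average stands in for the reconstruction module. The VGG19-based loss is omitted entirely. The names `local_features`, `self_attention`, and `fuse` are hypothetical.

```python
import math


def local_features(img):
    """3x3 mean filter with clamped borders (assumed stand-in for one CNN branch)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for q in tokens:
        # Every token attends to every other token: this is the global step.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        mixed.append([
            sum(wt * v[c] for wt, v in zip(weights, tokens)) / norm
            for c in range(dim)
        ])
    return mixed


def fuse(ir, vis):
    """Toy fusion of two same-sized grayscale images given as lists of rows."""
    h, w = len(ir), len(ir[0])
    f_ir, f_vis = local_features(ir), local_features(vis)
    # One token per pixel, holding the features of both branches.
    tokens = [[f_ir[y][x], f_vis[y][x]] for y in range(h) for x in range(w)]
    attended = self_attention(tokens)
    # Assumed "reconstruction": average local and global features over both channels.
    flat = [0.25 * (t[0] + t[1] + a[0] + a[1]) for t, a in zip(tokens, attended)]
    return [flat[y * w:(y + 1) * w] for y in range(h)]
```

Calling `fuse(ir, vis)` on two equally sized images returns an image of the same size. The attention step costs time quadratic in the number of pixels, so the sketch is only practical for very small inputs.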