LFDT-Fusion: A latent feature-guided diffusion Transformer model for general image fusion
Information Fusion (IF 14.7) | Pub Date: 2024-08-16 | DOI: 10.1016/j.inffus.2024.102639
Bo Yang, Zhaohui Jiang, Dong Pan, Haoyang Yu, Gui Gui, Weihua Gui

For image fusion tasks, it is inefficient for a diffusion model to run many denoising iterations over the original-resolution image for feature mapping. To address this issue, this paper proposes an efficient latent feature-guided diffusion model for general image fusion. The model consists of a pixel-space autoencoder and a compact Transformer-based diffusion network. Specifically, the pixel-space autoencoder implements a novel UNet-based latent diffusion strategy that compresses the inputs into a low-resolution latent space through downsampling, while skip connections transfer multiscale intermediate features from the encoder to the decoder, preserving the high-resolution information of the original input. Compared with the existing VAE-GAN-based latent diffusion strategy, the proposed UNet-based strategy is significantly more stable and generates highly detailed images without adversarial optimization. The Transformer-based diffusion network consists of a denoising network and a fusion head: the former captures long-range diffusion dependencies and learns hierarchical diffusion representations, while the latter facilitates interactions among diffusion features to comprehend complex cross-domain information. Moreover, improvements to the diffusion model in noise level, number of denoising steps, and sampler selection yield superior fusion performance across six image fusion tasks. Experimental results on public datasets and in industrial environments demonstrate the qualitative and quantitative advantages of the proposed method. The code is available at: https://github.com/BOYang-pro/LFDT-Fusion.
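To make the latent diffusion strategy concrete, below is a minimal PyTorch sketch (not the authors' implementation; see the repository above for that) of a UNet-style pixel-space autoencoder: the encoder downsamples two stacked source images into a low-resolution latent on which diffusion would run, and skip connections carry multiscale encoder features to the decoder to recover high-resolution detail. Channel widths, depths, and the two-source single-channel input are illustrative assumptions; the Transformer denoiser that operates on the latent is omitted.

```python
# Minimal sketch of a UNet-based latent autoencoder for image fusion.
# Assumptions: two single-channel sources (e.g. infrared + visible),
# a 4x-downsampled latent, and toy channel widths.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.GELU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.GELU(),
    )

class UNetAutoencoder(nn.Module):
    def __init__(self, in_ch=2, widths=(32, 64, 128)):
        super().__init__()
        self.enc1 = conv_block(in_ch, widths[0])
        self.enc2 = conv_block(widths[0], widths[1])
        self.enc3 = conv_block(widths[1], widths[2])
        self.down = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec2 = conv_block(widths[2] + widths[1], widths[1])
        self.dec1 = conv_block(widths[1] + widths[0], widths[0])
        self.head = nn.Conv2d(widths[0], 1, 1)  # single-channel fused output

    def encode(self, x):
        s1 = self.enc1(x)              # full-resolution features
        s2 = self.enc2(self.down(s1))  # 1/2-resolution features
        z = self.enc3(self.down(s2))   # 1/4-resolution latent (diffusion space)
        return z, (s1, s2)

    def decode(self, z, skips):
        s1, s2 = skips                 # skip connections restore detail
        h = self.dec2(torch.cat([self.up(z), s2], dim=1))
        h = self.dec1(torch.cat([self.up(h), s1], dim=1))
        return self.head(h)

x = torch.cat([torch.rand(1, 1, 64, 64),   # source A (e.g. infrared)
               torch.rand(1, 1, 64, 64)],  # source B (e.g. visible)
              dim=1)
model = UNetAutoencoder()
z, skips = model.encode(x)    # the diffusion Transformer would denoise z
fused = model.decode(z, skips)
print(z.shape, fused.shape)   # [1, 128, 16, 16], [1, 1, 64, 64]
```

Because the latent is 4x smaller per side, each denoising iteration touches 16x fewer spatial positions than full-resolution diffusion, while the skip connections let the decoder recover detail without the adversarial training a VAE-GAN latent stage would require.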

Updated: 2024-08-16