Bridging CNN and Transformer With Cross-Attention Fusion Network for Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing (IF 7.5), Pub Date: 2024-06-26, DOI: 10.1109/tgrs.2024.3419266
Fulin Xu, Shaohui Mei, Ge Zhang, Nan Wang, Qian Du

Feature representation is crucial for hyperspectral image (HSI) classification. However, existing convolutional neural network (CNN)-based methods are limited by the convolution kernel and focus only on local features, causing them to ignore the global properties of HSIs. Transformer-based networks can make up for this limitation of CNNs because they emphasize the global features of HSIs. How to combine the advantages of these two networks in feature extraction is therefore of great importance for improving classification accuracy. To this end, a cross-attention fusion network bridging CNN and Transformer (CAF-Former) is proposed, which fully exploits the CNN's strength in local feature extraction and the Transformer's strength in learning long-range dependencies for hyperspectral classification. To fully explore the local and global information within an HSI, a Dynamic-CNN branch is proposed to effectively encode the local features of pixels, while a Gaussian Transformer branch is constructed to accurately model global features and long-range dependencies. Moreover, so that local and global features can fully interact, a cross-attention fusion (CAF) module is proposed as a bridge that fuses the features extracted by the two branches. Experiments on several benchmark datasets demonstrate that the proposed CAF-Former significantly outperforms state-of-the-art CNN-based and Transformer-based networks for HSI classification.
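To make the fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of a bidirectional cross-attention fusion block in the spirit of the CAF module described in the abstract: tokens from the CNN branch attend to tokens from the Transformer branch and vice versa, and the two enriched streams are pooled and concatenated. All class, parameter, and tensor names here are illustrative assumptions; this is not the authors' implementation, whose exact query/key/value assignment and token handling are not specified in the abstract.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Illustrative bidirectional cross-attention fusion of two feature streams."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Local (CNN) tokens query the global (Transformer) tokens, and vice versa.
        self.local_to_global = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_to_local = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_local = nn.LayerNorm(dim)
        self.norm_global = nn.LayerNorm(dim)
        self.proj = nn.Linear(2 * dim, dim)  # fuse the two enriched streams

    def forward(self, local_feats: torch.Tensor, global_feats: torch.Tensor) -> torch.Tensor:
        # local_feats:  (B, N, dim) tokens flattened from the CNN branch
        # global_feats: (B, M, dim) tokens from the Transformer branch
        local_enriched, _ = self.local_to_global(
            query=self.norm_local(local_feats), key=global_feats, value=global_feats
        )
        global_enriched, _ = self.global_to_local(
            query=self.norm_global(global_feats), key=local_feats, value=local_feats
        )
        # Pool each enriched stream, concatenate, and project to a single fused vector.
        fused = torch.cat(
            [local_enriched.mean(dim=1), global_enriched.mean(dim=1)], dim=-1
        )
        return self.proj(fused)  # (B, dim)


if __name__ == "__main__":
    caf = CrossAttentionFusion(dim=64)
    cnn_tokens = torch.randn(2, 49, 64)          # e.g. 7x7 spatial patch features from a CNN branch
    transformer_tokens = torch.randn(2, 50, 64)  # e.g. patch tokens (plus class token) from a Transformer branch
    print(caf(cnn_tokens, transformer_tokens).shape)  # torch.Size([2, 64])
```

The bidirectional design reflects the stated goal of letting local and global features "fully interact": each branch's representation is refined by attending to the other before fusion, rather than simply concatenating the two branch outputs.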

Updated: 2024-06-26