Foundation Model-Based Spectral–Spatial Transformer for Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing (IF 7.5), Pub Date: 2024-09-11, DOI: 10.1109/tgrs.2024.3456129
Lingbo Huang, Yushi Chen, Xin He

Recently, deep learning models have dominated hyperspectral image (HSI) classification. Deep learning is now undergoing a paradigm shift with the rise of transformer-based foundation models. In this study, the potential of transformer-based foundation models, including the vision foundation model (VFM) and the language foundation model (LFM), for HSI classification is investigated. First, to improve the performance of traditional HSI classification tasks, a spectral-spatial VFM-based transformer (SS-VFMT) is proposed, which inserts spectral-spatial information into the pretrained foundation transformer. Specifically, a given pretrained transformer receives HSI patch tokens and performs long-range feature extraction that benefits from the prelearned weights. Meanwhile, two kinds of enhancement modules, i.e., spatial and spectral enhancement modules (SpaEMs and SpeEMs), use spatial and spectral information to steer the behavior of the transformer. In addition, a patch relationship distillation strategy is designed for SS-VFMT to better exploit the pretrained knowledge, leading to the proposed SS-VFMT-D. Second, based on SS-VFMT, a spectral-spatial vision-language-based transformer (SS-VLFMT) is proposed to address a new HSI classification task, i.e., generalized zero-shot classification. This task is to recognize novel classes not seen during training, which is more meaningful because the real world is usually open. The SS-VLFMT leverages SS-VFMT to extract spectral-spatial features and the corresponding hash codes, while integrating a pretrained language model to extract text features from class names. Experimental results on HSI datasets show that the proposed methods are competitive with state-of-the-art methods. Moreover, the foundation model-based methods open a new window for HSI classification tasks, especially for HSI zero-shot classification.
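The generalized zero-shot setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' SS-VLFMT: it assumes that a visual feature and per-class text features already exist as plain vectors in a shared space, and it assigns the class whose text feature has the highest cosine similarity. The class names, the vector values, and the helper names `cosine` and `zero_shot_predict` are hypothetical, and the hash codes mentioned in the abstract are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(visual_feature, class_text_features):
    """Return the class name whose text feature is most similar to the visual feature."""
    return max(
        class_text_features,
        key=lambda name: cosine(visual_feature, class_text_features[name]),
    )


# Toy embeddings with made-up values (assumption: image and text features
# live in the same 3-dimensional space).
class_text_features = {
    "corn": [1.0, 0.0, 0.0],   # class seen during training
    "wheat": [0.0, 1.0, 0.0],  # class seen during training
    "water": [0.0, 0.0, 1.0],  # class NOT seen during training
}

# Stand-in for the spectral-spatial feature of one HSI patch.
visual_feature = [0.1, 0.2, 0.9]

prediction = zero_shot_predict(visual_feature, class_text_features)
print(prediction)  # water
```

Because the candidate set contains both seen and unseen class names, the same similarity search covers the generalized case: a patch can be assigned to a class that had no training samples, as long as its name has a text feature.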

Updated: 2024-09-11