Foundation Model-Based Spectral–Spatial Transformer for Hyperspectral Image Classification, IEEE Transactions on Geoscience and Remote Sensing

IEEE Transactions on Geoscience and Remote Sensing (IF 7.5), Pub Date: 2024-09-11, DOI: 10.1109/tgrs.2024.3456129
Lingbo Huang, Yushi Chen, Xin He

Recently, deep learning models have dominated hyperspectral image (HSI) classification. Deep learning is now undergoing a paradigm shift with the rise of transformer-based foundation models. In this study, the potential of transformer-based foundation models, including the vision foundation model (VFM) and the language foundation model (LFM), for HSI classification is investigated. First, to improve performance on traditional HSI classification tasks, a spectral-spatial VFM-based transformer (SS-VFMT) is proposed, which inserts spectral-spatial information into the pretrained foundation transformer. Specifically, a given pretrained transformer receives HSI patch tokens for long-range feature extraction, benefiting from the prelearned weights. Meanwhile, two kinds of enhancement modules, i.e., spatial and spectral enhancement modules (SpaEMs and SpeEMs), use spatial and spectral information to steer the behavior of the transformer. In addition, a patch relationship distillation strategy is designed for SS-VFMT to better exploit the pretrained knowledge, leading to the proposed SS-VFMT-D. Second, building on SS-VFMT, a spectral-spatial vision-language-based transformer (SS-VLFMT) is proposed to address a new HSI classification task, i.e., generalized zero-shot classification. This task is to recognize novel classes not seen during training, which is more meaningful because the real world is usually open. SS-VLFMT leverages SS-VFMT to extract spectral-spatial features and the corresponding hash codes, while integrating a pretrained language model to extract text features from class names. Experimental results on HSI datasets show that the proposed methods are competitive with state-of-the-art methods. Moreover, the foundation model-based methods open a new window for HSI classification tasks, especially for HSI zero-shot classification.
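The zero-shot idea in the abstract, matching visual features against text features derived from class names, can be illustrated with a minimal sketch. This is not the authors' SS-VLFMT: the abstract does not say how features are compared, and the hash-code step is omitted here. The function names `cosine` and `zero_shot_predict`, the three-dimensional embeddings, and the class names are all hypothetical, and nearest-neighbor matching by cosine similarity is an assumed, generic choice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(feature, class_embeddings):
    """Return the class whose text embedding is most similar to the feature.

    The candidate set holds both seen and unseen classes, which is what
    makes the setting "generalized" zero-shot classification.
    """
    return max(class_embeddings,
               key=lambda name: cosine(feature, class_embeddings[name]))


# Hypothetical text embeddings. In practice these would come from a
# pretrained language model applied to the class names.
class_embeddings = {
    "grass": [1.0, 0.0, 0.0],    # seen during training
    "asphalt": [0.0, 1.0, 0.0],  # seen during training
    "water": [0.0, 0.0, 1.0],    # unseen during training
}

# Hypothetical spectral-spatial feature for one HSI pixel, assumed to
# live in the same space as the text embeddings.
pixel_feature = [0.1, 0.2, 0.9]
prediction = zero_shot_predict(pixel_feature, class_embeddings)
```

Because a class only needs a text embedding to be a candidate, a class with no training samples (here "water") can still be predicted, provided the visual features are aligned with the text embedding space.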

Updated: 2024-09-11
