ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification
ISPRS Journal of Photogrammetry and Remote Sensing (IF 10.6) | Pub Date: 2024-10-02 | DOI: 10.1016/j.isprsjprs.2024.09.025
Authors: Pan Zhang, Baochai Peng, Chaoran Lu, Quanjin Huang, Dongsheng Liu
Synthetic Aperture Radar (SAR) images have proven to be valuable cues for multimodal Land Cover Classification (LCC) when combined with RGB images. Most existing studies on cross-modal fusion assume that consistent feature information is necessary between the two modalities, and as a result, they construct networks without adequately addressing the unique characteristics of each modality. In this paper, we propose a novel architecture, named the Asymmetric Semantic Aligning Network (ASANet), which introduces asymmetry at the feature level to address the issue that multi-modal architectures frequently fail to fully utilize complementary features. The core of this network is the Semantic Focusing Module (SFM), which explicitly calculates differential weights for each modality to account for the modality-specific features. Furthermore, ASANet incorporates a Cascade Fusion Module (CFM), which delves deeper into channel and spatial representations to efficiently select features from the two modalities for fusion. Through the collaborative effort of these two modules, the proposed ASANet effectively learns feature correlations between the two modalities and eliminates noise caused by feature differences. Comprehensive experiments demonstrate that ASANet achieves excellent performance on three multimodal datasets. Additionally, we have established a new RGB-SAR multimodal dataset, on which our ASANet outperforms other mainstream methods with improvements ranging from 1.21% to 17.69%. The ASANet runs at 48.7 frames per second (FPS) when the input image is 256 × 256 pixels.
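The abstract specifies the roles of the SFM and CFM but not their internal design. The sketch below is a minimal PyTorch illustration of one way such an asymmetric two-module fusion stage could be wired: an SFM-style block derives separate differential weights for each modality from the feature difference, and a CFM-style block cascades channel attention and then spatial attention over the fused features. All class names, layer choices, and shapes here are assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only: the paper's actual SFM/CFM designs are not
# given in the abstract. Layer choices and the weighting scheme below
# are assumptions for exposition.
import torch
import torch.nn as nn


class SemanticFocusingModule(nn.Module):
    """Hypothetical SFM: derives per-modality differential weights from
    the feature difference, so each branch is re-weighted asymmetrically."""

    def __init__(self, channels: int):
        super().__init__()
        self.weight_rgb = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid()
        )
        self.weight_sar = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, f_rgb: torch.Tensor, f_sar: torch.Tensor):
        diff = f_rgb - f_sar  # modality-specific residual signal
        # Asymmetric: each branch learns its own weights from the difference.
        f_rgb = f_rgb + self.weight_rgb(diff) * f_rgb
        f_sar = f_sar + self.weight_sar(-diff) * f_sar
        return f_rgb, f_sar


class CascadeFusionModule(nn.Module):
    """Hypothetical CFM: channel attention followed by spatial attention,
    applied in cascade to select features from the concatenated branches."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )

    def forward(self, f_rgb: torch.Tensor, f_sar: torch.Tensor) -> torch.Tensor:
        fused = self.reduce(torch.cat([f_rgb, f_sar], dim=1))
        fused = fused * self.channel_att(fused)  # stage 1: channel selection
        fused = fused * self.spatial_att(fused)  # stage 2: spatial selection
        return fused


if __name__ == "__main__":
    sfm, cfm = SemanticFocusingModule(64), CascadeFusionModule(64)
    f_rgb, f_sar = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64)
    fused = cfm(*sfm(f_rgb, f_sar))
    print(fused.shape)  # torch.Size([1, 64, 64, 64])
```

In this sketch the asymmetry comes from each branch learning its own weighting of the shared difference signal; the cascade order (channel first, then spatial) mirrors the abstract's mention of "channel and spatial representations" but is otherwise a guess at the actual design.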