Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval
IEEE Transactions on Geoscience and Remote Sensing (IF 7.5) Pub Date: 2024-06-21, DOI: 10.1109/tgrs.2024.3417421
Lilu Zhu 1, Yang Wang 1, Yanfeng Hu 2, Xiaolu Su 1, Kun Fu 2

Optical satellites are the most widely used platforms for observing Earth. Driven by rapidly developing multisource optical remote sensing technology, content-based remote sensing image retrieval (CBRSIR), which aims to retrieve images of interest using extracted visual features, faces new challenges arising from large data volumes, complex feature information, and diverse spatiotemporal resolutions. Most previous works focus on representing optical images and transforming them into the semantic space of retrieval via supervised or unsupervised learning. These methods fail to fully leverage geospatial information, especially spatiotemporal features, which could improve both retrieval accuracy and efficiency. In this article, we propose a cross-modal contrastive learning method (CCLS2T) that maximizes the mutual information across multisource remote sensing platforms for correlation-aware retrieval. Specifically, we develop an asymmetric dual-encoder architecture: a vision encoder that operates on multiscale visual inputs and a lightweight text encoder that reconstructs spatiotemporal embeddings, with an intermediate contrastive objective applied to the representations from the two unimodal encoders. We then add a hash layer that transforms the deep fusion features into compact hash index codes. In addition, CCLS2T exploits a prompt template (R2STFT) for multisource remote sensing retrieval to address the text heterogeneity of metadata files, and a hierarchical semantic tree (RSHST) to address the feature sparsification of semantic-aware indexing structures. Experimental results on three optical remote sensing datasets substantiate that CCLS2T improves retrieval performance by 11.64% over many existing hash learning methods and by 9.91% over server-side retrieval engines in typical optical remote sensing retrieval scenarios.
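To make the architecture described in the abstract concrete, the sketch below illustrates the general pattern of an asymmetric dual encoder with an InfoNCE-style contrastive objective between the unimodal representations, followed by a tanh hash layer that yields binary codes at inference. This is a minimal illustration in PyTorch under assumed module names, dimensions, and toy data; it is not the authors' implementation of CCLS2T, R2STFT, or RSHST.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisionEncoder(nn.Module):
    """Heavier image branch; a real system would extract multiscale CNN/ViT features."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return self.net(x)

class SpatiotemporalTextEncoder(nn.Module):
    """Lightweight branch over tokenized metadata (e.g., an R2STFT-style prompt)."""
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)
    def forward(self, tokens):
        # Mean-pool token embeddings, then project into the shared space.
        return self.proj(self.embed(tokens).mean(dim=1))

def info_nce(z_img, z_txt, temperature=0.07):
    """Symmetric contrastive loss: matched image/metadata pairs are positives."""
    z_img, z_txt = F.normalize(z_img, dim=-1), F.normalize(z_txt, dim=-1)
    logits = z_img @ z_txt.t() / temperature
    targets = torch.arange(z_img.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

class HashLayer(nn.Module):
    """Maps fused features into [-1, 1]; sign() at inference yields binary codes."""
    def __init__(self, dim=256, bits=64):
        super().__init__()
        self.fc = nn.Linear(dim, bits)
    def forward(self, fused):
        return torch.tanh(self.fc(fused))

# Toy forward pass on random data.
imgs = torch.randn(8, 3, 64, 64)
tokens = torch.randint(0, 1000, (8, 16))
v, t = VisionEncoder()(imgs), SpatiotemporalTextEncoder()(tokens)
loss = info_nce(v, t)                                   # intermediate contrastive objective
codes = HashLayer()(torch.cat([v, t], dim=-1)).sign()   # compact 64-bit retrieval codes
print(loss.item(), codes.shape)

Relaxing the binary codes with tanh during training and binarizing with sign() afterward is a standard deep-hashing device; the paper's actual fusion and indexing (e.g., the RSHST structure) are more involved than this cat-and-hash stand-in.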

Updated: 2024-08-19