AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing
International Journal of Computer Vision (IF 11.6), Pub Date: 2024-10-21, DOI: 10.1007/s11263-024-02252-y
Hanbo Bi, Yingchao Feng, Yongqiang Mao, Jianning Pei, Wenhui Diao, Hongqi Wang, Xian Sun

Few-shot Segmentation aims to segment the objects of interest in a query image using only a handful of labeled samples (i.e., support images). Previous schemes leverage the similarity between support-query pixel pairs to construct pixel-level semantic correlation. However, in remote sensing scenarios with extreme intra-class variations and cluttered backgrounds, such pixel-level correlations may produce severe mismatches, resulting in semantic ambiguity between the query foreground (FG) and background (BG) pixels. To tackle this problem, we propose a novel Agent Mining Transformer (AgMTR), which adaptively mines a set of local-aware agents to construct agent-level semantic correlation. Compared with pixel-level semantics, these agents carry local-contextual information and possess a broader receptive field. Different query pixels can therefore selectively aggregate the fine-grained local semantics of different agents, enhancing the semantic clarity between query FG and BG pixels. Concretely, the Agent Learning Encoder is first proposed to build an optimal transport plan that assigns different agents to aggregate support semantics from different local regions. Then, to further optimize the agents, the Agent Aggregation Decoder and the Semantic Alignment Decoder are constructed to break through the limited support set by mining valuable class-specific semantics from unlabeled data sources and the query image itself, respectively. Extensive experiments on the remote sensing benchmark iSAID indicate that the proposed method achieves state-of-the-art performance. Surprisingly, our method remains quite competitive when extended to more common natural scenarios, i.e., PASCAL-\(5^i\) and COCO-\(20^{i}\).
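
The abstract does not give the exact formulation of the optimal transport step, so the following is only an illustrative sketch of the general idea: a set of agents is matched to support foreground features through an entropy-regularized transport plan, and each agent then aggregates the features assigned to it. The Sinkhorn solver, the cosine cost, the uniform marginals, and every name here (`sinkhorn_plan`, `mine_agents`, `eps`) are assumptions made for illustration, not the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan for a K x N cost matrix.

    Assumption: uniform marginals, i.e. each agent (row) should receive
    mass 1/K and each support pixel (column) should send mass 1/N.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the target marginals.
        u = [(1.0 / n_rows) / sum(kernel[i][j] * v[j] for j in range(n_cols))
             for i in range(n_rows)]
        v = [(1.0 / n_cols) / sum(kernel[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n_cols)]
            for i in range(n_rows)]


def mine_agents(init_agents, fg_feats, eps=0.1):
    """Assign support foreground features to agents, then aggregate.

    init_agents: K initial agent vectors (hypothetical initialization).
    fg_feats:    N support foreground feature vectors.
    Returns the K x N plan and the K updated agents.
    """
    cost = [[1.0 - cosine(a, f) for f in fg_feats] for a in init_agents]
    plan = sinkhorn_plan(cost, eps=eps)
    dim = len(fg_feats[0])
    agents = []
    for row in plan:
        mass = sum(row)
        # Each agent becomes the plan-weighted mean of the features it receives.
        agents.append([
            sum(w * f[d] for w, f in zip(row, fg_feats)) / mass
            for d in range(dim)
        ])
    return plan, agents


if __name__ == "__main__":
    # Toy data: two loose clusters of 2-D "support foreground" features.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    plan, agents = mine_agents([[1.0, 0.0], [0.0, 1.0]], feats)
    for agent in agents:
        print([round(x, 3) for x in agent])
```

In this toy setting each agent ends up dominated by the cluster of features closest to it, which is the sense in which different agents cover different local regions. The uniform marginals stop all features from collapsing onto a single agent.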




Updated: 2024-10-22