Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 7-1-2024, DOI: 10.1109/tpami.2024.3421340
Yue Han, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Yong Liu, Lu Qi, Xiangtai Li, Ming-Hsuan Yang

Few-Shot Instance Segmentation (FSIS) requires detecting and segmenting novel classes with limited support examples. Existing methods based on Region Proposal Networks (RPNs) face two issues: 1) overfitting suppresses novel class objects; 2) dual-branch models require complex spatial correlation strategies to prevent spatial information loss when generating class prototypes. We introduce a unified framework, Reference Twice (RefT), to exploit the relationship between support and query features for FSIS and related tasks. Our three main contributions are: 1) a novel transformer-based baseline that avoids overfitting, offering a new direction for FSIS; 2) a demonstration that support object queries encode key factors after base training, so that query features can be enhanced twice, at both the feature and query levels, using simple cross-attention, thus avoiding complex spatial correlation interaction; 3) a class-enhanced base knowledge distillation loss that addresses the difficulty DETR-like models have with incremental settings due to the input projection layer, enabling easy extension to incremental FSIS. Extensive experimental evaluations on the COCO dataset under three FSIS settings demonstrate that our method performs favorably against existing approaches across different shots, e.g., +8.2/+9.4 performance gain over state-of-the-art methods with 10/30 shots. Source code and models will be available on GitHub.
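
As an illustration of the "enhanced twice ... using simple cross-attention" idea mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes single-head attention with a residual connection; the function `cross_attention`, the array names, the token counts, and the random projection weights are all hypothetical. In particular, what exactly serves as the support tokens at each level is not specified in the abstract, so a single generic `support_tokens` array is used for both passes.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, support_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: query tokens attend to support tokens.

    Returns the residual-enhanced query tokens and the attention weights.
    """
    q = query_tokens @ w_q                      # (n_query, d)
    k = support_tokens @ w_k                    # (n_support, d)
    v = support_tokens @ w_v                    # (n_support, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (n_query, n_support)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return query_tokens + weights @ v, weights


rng = np.random.default_rng(0)
d = 16  # hypothetical embedding size


def random_projections():
    # Hypothetical stand-ins for learned projection matrices.
    return [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]


# Hypothetical inputs: a flattened query-image feature map, a set of
# decoder object queries, and tokens derived from the support examples.
image_features = rng.normal(size=(100, d))
object_queries = rng.normal(size=(10, d))
support_tokens = rng.normal(size=(5, d))

# First reference (feature level): image features attend to the support.
enhanced_features, feature_weights = cross_attention(
    image_features, support_tokens, *random_projections()
)

# Second reference (query level): object queries attend to the support.
enhanced_queries, query_weights = cross_attention(
    object_queries, support_tokens, *random_projections()
)
```

The two calls differ only in which query-side tokens are enhanced, which is one plausible reading of "at both feature and query levels"; the actual model may use different support representations and multi-head attention at each stage.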

Updated: 2024-08-22
