MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation, International Journal of Computer Vision

International Journal of Computer Vision (IF 11.6), Pub Date: 2024-10-08, DOI: 10.1007/s11263-024-02223-3
Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation. Our method is training-free and does not rely on any label supervision. Two key designs enable us to employ an off-the-shelf text-to-image diffusion model as a useful dataset generator for object instances and mask annotations. First, we divide an image canvas into several regions and perform a single round of diffusion process to generate multiple instances simultaneously, conditioning on different text prompts. Second, we obtain corresponding instance masks by aggregating cross-attention maps associated with object prompts across layers and diffusion time steps, followed by simple thresholding and edge-aware refinement processing. Without bells and whistles, our MosaicFusion can produce a significant amount of synthetic labeled data for both rare and novel categories. Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion can significantly improve the performance of existing instance segmentation models, especially for rare and novel categories. Code: https://github.com/Jiahao000/MosaicFusion.
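
The two designs described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that "dividing the canvas" means tiling it into a grid, that "aggregating" means averaging attention maps after upsampling them to a common resolution, and that the threshold is 0.5. The function names `split_canvas`, `aggregate_attention`, and `threshold_mask` are hypothetical, and the edge-aware refinement step is omitted.

```python
import numpy as np


def split_canvas(height, width, rows, cols):
    """Tile a canvas into a rows x cols grid of (top, left, bottom, right) boxes.

    Assumption: a regular grid; the abstract only says "several regions".
    """
    ys = np.linspace(0, height, rows + 1).astype(int)
    xs = np.linspace(0, width, cols + 1).astype(int)
    return [
        (int(ys[r]), int(xs[c]), int(ys[r + 1]), int(xs[c + 1]))
        for r in range(rows)
        for c in range(cols)
    ]


def aggregate_attention(attn_maps, out_size):
    """Average square attention maps from different layers and time steps.

    Assumption: each map is square and its side divides out_size, so it can
    be upsampled by simple pixel repetition before averaging.
    """
    acc = np.zeros((out_size, out_size), dtype=np.float64)
    for m in attn_maps:
        scale = out_size // m.shape[0]
        # Nearest-neighbour upsampling: repeat each pixel scale x scale times.
        acc += np.kron(m, np.ones((scale, scale)))
    acc /= len(attn_maps)
    # Rescale to [0, 1] so that a fixed threshold is meaningful.
    lo, hi = acc.min(), acc.max()
    if hi > lo:
        return (acc - lo) / (hi - lo)
    return np.zeros_like(acc)


def threshold_mask(attn, thresh=0.5):
    """Binarize an aggregated attention map (threshold value is an assumption)."""
    return attn >= thresh


# Toy usage: a 2 x 2 mosaic and two attention maps for one object prompt.
boxes = split_canvas(4, 6, rows=2, cols=2)
low_res = np.array([[0.0, 1.0], [0.0, 0.0]])   # e.g. from a coarse layer
high_res = np.zeros((4, 4))
high_res[:2, 2:] = 1.0                          # e.g. from a finer layer
mask = threshold_mask(aggregate_attention([low_res, high_res], out_size=4))
```

In this toy case both maps agree that the object sits in the top-right quadrant, so the resulting mask covers exactly those pixels. In the actual method each region would be conditioned on its own text prompt, and the mask would be refined further before being used as an annotation.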

Updated: 2024-10-08
