International Journal of Computer Vision (IF 11.6), Pub Date: 2024-10-08, DOI: 10.1007/s11263-024-02223-3
Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation. Our method is training-free and does not rely on any label supervision. Two key designs enable us to employ an off-the-shelf text-to-image diffusion model as a useful dataset generator for object instances and mask annotations. First, we divide an image canvas into several regions and run a single round of the diffusion process to generate multiple instances simultaneously, conditioning each region on a different text prompt. Second, we obtain the corresponding instance masks by aggregating the cross-attention maps associated with object prompts across layers and diffusion time steps, followed by simple thresholding and edge-aware refinement. Without bells and whistles, MosaicFusion can produce a large amount of synthetic labeled data for both rare and novel categories. Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion significantly improves the performance of existing instance segmentation models, especially for rare and novel categories. Code: https://github.com/Jiahao000/MosaicFusion.
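The first design (one diffusion run over a canvas split into prompt-specific regions) can be illustrated with a toy sketch. It assumes a simple grid layout with optional overlap and a MultiDiffusion-style fusion step in which each region's crop is denoised with its own prompt and overlapping outputs are averaged; the abstract does not state the exact layout or fusion rule, so both are assumptions. The names `grid_regions`, `fuse_step`, and the `denoise` callback are hypothetical, and the "latent" here is a plain 2D list rather than a real diffusion latent.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Canvas = List[List[float]]


def grid_regions(height: int, width: int, rows: int, cols: int,
                 overlap: int = 0) -> List[Box]:
    """Split a height x width canvas into a rows x cols grid of boxes.

    Each box is grown by `overlap` pixels on every side (clipped to the
    canvas), so neighbouring regions share a border strip.
    (Assumed layout; the actual method may sample regions differently.)
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            top = max(0, r * height // rows - overlap)
            bottom = min(height, (r + 1) * height // rows + overlap)
            left = max(0, c * width // cols - overlap)
            right = min(width, (c + 1) * width // cols + overlap)
            boxes.append((top, left, bottom, right))
    return boxes


def fuse_step(latent: Canvas, boxes: List[Box], prompts: List[str],
              denoise: Callable[[Canvas, str], Canvas]) -> Canvas:
    """One assumed MultiDiffusion-style step: denoise every crop with its own
    prompt, then average the outputs wherever crops overlap."""
    h, w = len(latent), len(latent[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for (top, left, bottom, right), prompt in zip(boxes, prompts):
        crop = [row[left:right] for row in latent[top:bottom]]
        out = denoise(crop, prompt)  # hypothetical per-region denoiser
        for i, row in enumerate(out):
            for j, v in enumerate(row):
                total[top + i][left + j] += v
                count[top + i][left + j] += 1
    # Pixels not covered by any region keep their previous value.
    return [[total[i][j] / count[i][j] if count[i][j] else latent[i][j]
             for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    boxes = grid_regions(4, 4, rows=1, cols=2, overlap=1)
    # Toy "denoiser" that fills its crop with the prompt length.
    toy = lambda crop, p: [[float(len(p))] * len(crop[0]) for _ in crop]
    fused = fuse_step([[0.0] * 4 for _ in range(4)], boxes,
                      ["a cat", "a dog house"], toy)
    print(boxes)
    print(fused[0])
```

With two side-by-side regions that overlap by one pixel each way, the shared middle columns receive the average of both region outputs, which is how a single denoising pass can serve several prompts at once in this sketch. In a real pipeline, `denoise` would be the text-conditioned noise prediction of the diffusion model and the loop would repeat over all time steps.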
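The second design (turning cross-attention maps into instance masks) is sketched below for a single object token. It assumes the maps have already been collected from several layers and time steps as square 2D lists at possibly different resolutions, upsamples them with nearest-neighbour interpolation (a simplification; bilinear resizing would be more typical), averages them, min-max normalises the result, and applies a fixed threshold. The threshold value of 0.5 and the helper names `upsample`, `aggregate_attention`, and `to_mask` are hypothetical, and the edge-aware refinement step mentioned in the abstract is not sketched.

```python
from typing import List, Sequence

Map = List[List[float]]


def upsample(m: Map, size: int) -> Map:
    """Nearest-neighbour upsample a square map to size x size.

    Assumes `size` is an integer multiple of the map's side length.
    """
    f = size // len(m)
    return [[m[i // f][j // f] for j in range(size)] for i in range(size)]


def aggregate_attention(maps: Sequence[Map], size: int) -> Map:
    """Average one object token's cross-attention maps gathered across
    layers and diffusion time steps, after resizing them to a common size."""
    ups = [upsample(m, size) for m in maps]
    n = len(ups)
    return [[sum(u[i][j] for u in ups) / n for j in range(size)]
            for i in range(size)]


def to_mask(agg: Map, threshold: float = 0.5) -> List[List[int]]:
    """Min-max normalise the aggregated map to [0, 1] and threshold it
    into a binary mask (threshold value is an assumption)."""
    lo = min(min(r) for r in agg)
    hi = max(max(r) for r in agg)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[1 if (v - lo) / span >= threshold else 0 for v in r] for r in agg]


if __name__ == "__main__":
    low_res = [[1.0, 0.0], [0.0, 0.0]]           # e.g. a coarse layer
    high_res = [[0.8, 0.6, 0.0, 0.0],
                [0.6, 0.4, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0]]             # e.g. a finer layer
    agg = aggregate_attention([low_res, high_res], size=4)
    for row in to_mask(agg):
        print(row)
```

Averaging over layers and time steps smooths out maps that are individually noisy or coarse, so the object's region stands out after normalisation. In the mosaic setting, a mask computed this way for one region's prompt would then be placed back into canvas coordinates at that region's box.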