Weakly Supervised Driving Scene Semantic Segmentation Based on Few Annotated Pixels and Point Clouds
International Journal of Computer Vision (IF 11.6). Pub Date: 2024-11-04. DOI: 10.1007/s11263-024-02275-5. Huimin Ma, Sheng Yi, Shijie Chen, Jiansheng Chen, Yu Wang
Previous weakly supervised semantic segmentation (WSSS) methods mainly start from segmentation seeds produced by the CAM method. Because driving scene images are highly complex, these frameworks perform poorly on driving scene datasets. In this paper, we propose a new kind of WSSS annotation for complex driving scene datasets, with only one or a few labeled points per category. This annotation is more lightweight than image-level annotation and provides critical localization information for prototypes. We propose a framework that addresses the WSSS task under this annotation by generating prototype feature vectors from the labeled points and then producing 2D pseudo labels. In addition, we find that point cloud data is useful for distinguishing different objects. Our framework extracts rich semantic information from unlabeled point cloud data and generates instance masks, without requiring extra annotation resources. We combine the pseudo labels and the instance masks to correct erroneous regions and thus obtain more accurate supervision for training the semantic segmentation network. We evaluate this framework on the KITTI dataset, and experiments show that the proposed method achieves state-of-the-art performance.
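To make the prototype step concrete, the following is a minimal sketch (not the authors' implementation) of how per-class prototype feature vectors could be built from a few labeled points on a backbone feature map and then used to assign pixel-wise pseudo labels. The cosine-similarity assignment rule, the confidence threshold, the ignore label 255, and all names such as build_prototypes are illustrative assumptions.

```python
import numpy as np

def build_prototypes(features, point_labels):
    """Average the feature vectors at the labeled points of each class.

    features:     (H, W, C) feature map from a backbone network (assumed).
    point_labels: list of (row, col, class_id) for the few annotated pixels.
    Returns a dict {class_id: (C,) prototype vector}.
    """
    grouped = {}
    for r, c, cls in point_labels:
        grouped.setdefault(cls, []).append(features[r, c])
    return {cls: np.mean(vecs, axis=0) for cls, vecs in grouped.items()}

def pseudo_labels(features, prototypes, threshold=0.5):
    """Assign each pixel to the most similar prototype (cosine similarity).

    Pixels whose best similarity falls below `threshold` are left as
    255 (a common 'ignore' convention; an assumption here).
    """
    H, W, C = features.shape
    flat = features.reshape(-1, C)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)

    classes = sorted(prototypes)
    protos = np.stack([prototypes[c] for c in classes])           # (K, C)
    protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)

    sim = flat @ protos.T                                          # (H*W, K)
    best = sim.argmax(axis=1)
    labels = np.array(classes)[best]
    labels[sim.max(axis=1) < threshold] = 255                      # ignore uncertain pixels
    return labels.reshape(H, W)

if __name__ == "__main__":
    feats = np.random.rand(8, 8, 16).astype(np.float32)            # toy feature map
    points = [(1, 1, 0), (6, 6, 1)]                                # one labeled point per class
    protos = build_prototypes(feats, points)
    print(pseudo_labels(feats, protos).shape)                      # (8, 8)
```

In the paper's pipeline, such pseudo labels would then be refined with the instance masks derived from the unlabeled point cloud before training the segmentation network; that fusion step is not shown here.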