Detecting floating litter in freshwater bodies with semi-supervised deep learning, Water Research

Water Research (IF 11.4), Pub Date: 2024-09-11, DOI: 10.1016/j.watres.2024.122405
Tianlong Jia₁, Rinze de Vries₂, Zoran Kapelan₁, Tim H M van Emmerik₃, Riccardo Taormina₁

Researchers and practitioners have extensively used supervised deep learning methods to quantify floating litter in rivers and canals. These methods require large amounts of labeled data for training. Labeling is expensive and laborious, so the open datasets available in the field are small compared with comprehensive computer vision datasets such as ImageNet. Fine-tuning models pre-trained on these larger datasets helps improve litter detection performance and reduces data requirements. Yet the effectiveness of features learned from generic datasets is limited in large-scale monitoring, where automated detection must adapt across different locations, environmental conditions, and sensor settings. To address this issue, we propose a two-stage semi-supervised learning method to detect floating litter based on Swapping Assignments between multiple Views of the same image (SwAV). SwAV is a self-supervised learning approach that learns the underlying feature representation from unlabeled data. In the first stage, we used SwAV to pre-train a ResNet50 backbone on about 100k unlabeled images. In the second stage, we added new layers to the pre-trained ResNet50 to create a Faster R-CNN architecture and fine-tuned it with a limited number of labeled images (≈1.8k images with 2.6k annotated litter items). We developed and validated our semi-supervised floating litter detection methodology on images collected in canals and waterways of Delft (the Netherlands) and Jakarta (Indonesia). We tested out-of-domain generalization performance in a zero-shot fashion using additional data from Ho Chi Minh City (Vietnam) and from Amsterdam and Groningen (the Netherlands). We benchmarked our results against the same Faster R-CNN architecture trained via supervised learning alone by fine-tuning ImageNet pre-trained weights.
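To make the first stage more concrete, the following is a minimal, pure-Python sketch of the swapped-prediction objective that gives SwAV its name, as it is commonly described: two augmented views of each image are scored against a set of prototypes, the scores are converted into balanced soft cluster assignments with a few Sinkhorn-Knopp iterations, and each view is trained to predict the assignment of the other view. The function names, the toy score matrices, and the hyperparameters (`eps`, `n_iters`, `temperature`) are illustrative assumptions and are not taken from the paper; a real pipeline would compute the scores with the ResNet50 backbone and backpropagate through the loss.

```python
import math


def softmax(row, temperature):
    """Softmax over one row of prototype scores, scaled by a temperature."""
    scaled = [s / temperature for s in row]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Convert prototype scores (samples x prototypes) into soft assignments.

    Columns and rows are normalized alternately so that the prototypes are
    used roughly equally across the batch. Each returned row sums to 1.
    Scores are assumed to be cosine similarities in [-1, 1].
    """
    n_samples, n_protos = len(scores), len(scores[0])
    q = [[math.exp(s / eps) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        # Give every prototype (column) the same total mass.
        col_sums = [sum(row[k] for row in q) for k in range(n_protos)]
        q = [[row[k] / (col_sums[k] * n_protos) for k in range(n_protos)]
             for row in q]
        # Give every sample (row) the same total mass.
        q = [[v / (sum(row) * n_samples) for v in row] for row in q]
    return [[v * n_samples for v in row] for row in q]


def swapped_prediction_loss(scores_a, scores_b, temperature=0.1):
    """Cross-entropy between each view's prediction and the other view's code."""
    q_a, q_b = sinkhorn(scores_a), sinkhorn(scores_b)
    loss = 0.0
    for i in range(len(scores_a)):
        p_a = softmax(scores_a[i], temperature)
        p_b = softmax(scores_b[i], temperature)
        # View A predicts the assignment of view B, and vice versa.
        loss -= sum(q * math.log(p) for q, p in zip(q_b[i], p_a))
        loss -= sum(q * math.log(p) for q, p in zip(q_a[i], p_b))
    return loss / (2 * len(scores_a))


# Toy prototype scores for 4 images, 3 prototypes, 2 augmented views each.
scores_view_a = [[0.9, 0.1, -0.3], [0.2, 0.8, 0.0],
                 [-0.1, 0.3, 0.7], [0.6, -0.2, 0.5]]
scores_view_b = [[0.8, 0.2, -0.2], [0.1, 0.9, 0.1],
                 [0.0, 0.2, 0.8], [0.5, 0.0, 0.6]]
loss = swapped_prediction_loss(scores_view_a, scores_view_b)
print(f"swapped-prediction loss: {loss:.4f}")
```

Because the assignments are computed from the unlabeled images themselves, no annotations are needed at this stage; labels only enter in the second stage, when the detector is fine-tuned.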
The findings indicate that the semi-supervised learning method matches or surpasses the supervised learning benchmark when tested on new images from the same training locations. It performed better when little data (≈200 images with about 300 annotated litter items) was available for fine-tuning, and it produced fewer false positive predictions. More importantly, the proposed approach demonstrates clear superiority in generalizing to the unseen locations, with improvements in average precision of up to 12.7%. We attribute this superior performance to more effective high-level feature extraction resulting from SwAV pre-training on relevant unlabeled images. Our findings highlight a promising direction for leveraging semi-supervised learning to develop foundation models, which have revolutionized artificial intelligence applications in most fields. By scaling our proposed approach with more data and compute, we can make significant strides in monitoring to address the global challenge of litter pollution in water bodies.
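The results above are reported as average precision, with an emphasis on fewer false positives. As a rough illustration of how such a detection metric is typically computed, the sketch below matches score-sorted detections to ground-truth boxes by intersection over union (IoU) and integrates the interpolated precision-recall curve. The box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, the single-image scope, and the toy boxes are all assumptions made for illustration; the abstract does not state the exact evaluation protocol used in the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Average precision for one class on one image.

    detections: list of (score, box) pairs; ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    matched = set()
    tp_flags = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth item can be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp_flags.append(True)   # true positive
        else:
            tp_flags.append(False)  # false positive
    precisions, recalls, true_positives = [], [], 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        true_positives += 1 if is_tp else 0
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))
    # Interpolate: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


# Toy example: two litter items, three detections (the second is spurious).
ground_truth = [(10, 10, 50, 50), (60, 60, 100, 100)]
detections = [
    (0.9, (12, 12, 48, 48)),
    (0.8, (200, 200, 240, 240)),
    (0.6, (58, 62, 98, 102)),
]
ap_with_fp = average_precision(detections, ground_truth)
ap_without_fp = average_precision([detections[0], detections[2]], ground_truth)
print(f"AP with false positive: {ap_with_fp:.3f}")
print(f"AP without false positive: {ap_without_fp:.3f}")
```

In this toy case the single confident false positive lowers the score even though both litter items are found, which is why reducing false positives and improving average precision tend to go together. Benchmark evaluations usually pool detections over a whole test set rather than scoring one image at a time.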

Updated: 2024-09-11
