Click-level supervision for online action detection extended from SCOAD
Future Generation Computer Systems (IF 6.2), Pub Date: 2024-12-19, DOI: 10.1016/j.future.2024.107668
Xing Zhang, Yuhan Mei, Ye Na, Xia Ling Lin, Genqing Bian, Qingsen Yan, Ghulam Mohi-ud-din, Chen Ai, Zhou Li, Wei Dong
Data-driven, fully supervised online action detection algorithms rely heavily on manual annotations, which are difficult to obtain in real-world applications. Current research addresses this issue with weakly supervised online action detection (WOAD) methods that use video-level annotations. However, because video-level labels carry no explicit temporal information, these approaches frequently produce blurred temporal boundaries. In this work, we revisit WOAD and propose an algorithm for weakly supervised online action detection using click-level annotations, which we call Single-frame Click Supervision for Online Action Detection (SCOAD). SCOAD significantly improves prediction accuracy without substantially increasing the annotation cost. This improvement is achieved through a set of loss functions designed to exploit the limited temporal information provided by click labels. Additionally, we present an enhanced version of SCOAD called SCOAD++. It introduces a novel mechanism that improves the model’s ability to use historical information and to distinguish fine details, addressing the limitation of traditional fully connected frameworks, which neglect temporal variations. Furthermore, to study how the inherent randomness of click-level annotation affects accuracy, we have constructed a human fitness video dataset. We also use this dataset to reveal the limitations of video-level labels for action detection. We perform extensive experiments on numerous benchmark datasets and demonstrate that our approach outperforms state-of-the-art methods.
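The abstract does not spell out the loss functions, so the following is only a minimal sketch of the general idea of combining a video-level term with a click-level term. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`click_loss`, `video_loss`, `combined_loss`), the top-k pooling, the `weight` parameter, and the use of plain cross-entropy. It is not the SCOAD objective.

```python
import math


def softmax(row):
    """Numerically stable softmax over one frame's class logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def click_loss(frame_logits, clicks):
    """Mean cross-entropy over the clicked frames only.

    frame_logits: list of per-frame logit lists, shape [T][C].
    clicks: dict mapping a frame index to its clicked class index
            (hypothetical format; one click per annotated action).
    """
    if not clicks:
        return 0.0
    losses = [-math.log(softmax(frame_logits[t])[c]) for t, c in clicks.items()]
    return sum(losses) / len(losses)


def video_loss(frame_logits, video_labels, k=2):
    """Binary cross-entropy on top-k pooled per-class logits.

    video_labels: set of class indices present somewhere in the video.
    Top-k mean pooling is an assumed aggregation, common in weakly
    supervised localization, not a detail stated in the abstract.
    """
    num_classes = len(frame_logits[0])
    total = 0.0
    for c in range(num_classes):
        top = sorted((row[c] for row in frame_logits), reverse=True)[:k]
        p = 1.0 / (1.0 + math.exp(-sum(top) / len(top)))
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to keep log() finite
        y = 1.0 if c in video_labels else 0.0
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / num_classes


def combined_loss(frame_logits, clicks, video_labels, weight=1.0):
    """Video-level term plus a weighted click-level term (assumed form)."""
    return video_loss(frame_logits, video_labels) + weight * click_loss(frame_logits, clicks)
```

In this sketch the click term is the only one that ties a class to a specific frame, which is the kind of temporal information that video-level labels alone cannot provide. A call such as `combined_loss(logits, {2: 1}, {1})` would score a video whose frame 2 was clicked as class 1.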
Updated: 2024-12-19