Injecting Text Clues for Improving Anomalous Event Detection From Weakly Labeled Videos
IEEE Transactions on Image Processing (IF 10.8). Pub Date: 2024-10-15. DOI: 10.1109/tip.2024.3477351. Tianshan Liu, Kin-Man Lam, Bing-Kun Bao
Video anomaly detection (VAD) aims to localize the snippets containing anomalous events in long, unconstrained videos. The weakly supervised (WS) setting, in which only video-level labels are available during training, has attracted considerable attention owing to its satisfactory trade-off between detection performance and annotation cost. However, lacking snippet-level dense labels, existing WS-VAD methods remain prone to detection errors caused by false alarms and incomplete localization. To address this dilemma, in this paper we propose to inject text clues of anomaly-event categories to improve WS-VAD, via a dedicated dual-branch framework. To suppress the responses of confusing normal contexts, we first present a text-guided anomaly discovering (TAG) branch based on a hierarchical matching scheme, which uses label-text queries to search for discriminative anomalous snippets in a global-to-local fashion. To encourage complete anomaly-instance localization, an anomaly-conditioned text completion (ATC) branch is further designed to perform an auxiliary generative task, which intrinsically forces the model to gather sufficient event semantics from all relevant anomalous snippets in order to fully reconstruct a masked description sentence. Furthermore, to encourage cross-branch knowledge sharing, a mutual learning strategy is introduced by imposing a consistency constraint on the anomaly scores of the two branches. Extensive experimental results on two public benchmarks validate that the proposed method achieves superior performance over competing methods.
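The two core ideas in the abstract can be illustrated with a minimal sketch: the TAG branch scores each video snippet by its similarity to a label-text query embedding, and the mutual learning strategy couples the two branches with a consistency (mean-squared-error) constraint on their anomaly scores. This is a hypothetical simplification for intuition only; the function names, plain cosine similarity, and MSE form are assumptions and not the paper's exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def tag_scores(snippet_feats, label_query):
    """TAG-style scoring (sketch): each snippet's anomaly score is its
    similarity to the anomaly-category label-text query embedding."""
    return [cosine(f, label_query) for f in snippet_feats]

def consistency_loss(scores_a, scores_b):
    """Mutual-learning consistency constraint (sketch): MSE between the
    anomaly scores produced by the two branches."""
    n = len(scores_a)
    return sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)) / n
```

Under this sketch, a snippet whose feature aligns with the label-text query scores near 1, normal-context snippets score lower, and driving `consistency_loss` toward zero forces the two branches to agree on which snippets are anomalous.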
Updated: 2024-10-15