International Journal of Computer Vision (IF 11.6). Pub Date: 2024-10-17. DOI: 10.1007/s11263-024-02258-6. Zhiwen Shao, Hancheng Zhu, Yong Zhou, Xiang Xiang, Bing Liu, Rui Yao, Lizhuang Ma
Facial action unit (AU) detection remains a challenging task due to the subtlety, dynamics, and diversity of AUs. Recently, the prevailing techniques of self-attention and causal inference have been introduced to AU detection. However, most existing methods either learn self-attention guided solely by AU detection, or apply a common intervention pattern to all AUs during causal intervention. The former often captures irrelevant information over a global range, while the latter ignores the specific causal characteristics of each AU. In this paper, we propose a novel AU detection framework called \(\textrm{AC}^{2}\)D, which adaptively constrains the self-attention weight distribution and causally deconfounds the sample confounder. Specifically, we explore the mechanism of self-attention weight distribution: the self-attention weight distribution of each AU is treated as a spatial distribution and is adaptively learned under the constraint of location-predefined attention and the guidance of AU detection. Moreover, we propose a causal intervention module for each AU, in which both the bias caused by training samples and the interference from irrelevant AUs are suppressed. Extensive experiments show that our method achieves competitive performance compared with state-of-the-art AU detection approaches on challenging benchmarks, including BP4D, DISFA, GFT, and BP4D+ in constrained scenarios, and Aff-Wild2 in unconstrained scenarios.
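The idea of treating each AU's self-attention weights as a spatial distribution constrained by a location-predefined prior can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the Gaussian prior centered at an assumed AU landmark location, the `sigma` value, and the KL-divergence penalty are all illustrative assumptions about how such a constraint could be formulated.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a flat array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def gaussian_prior(h, w, center, sigma=2.0):
    """Location-predefined attention prior: a normalized Gaussian
    centered at the (assumed) landmark location of one AU."""
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def attention_constraint_loss(logits, prior, eps=1e-8):
    """KL(prior || attention): penalizes attention mass far from the prior,
    while still letting AU-detection gradients reshape the distribution."""
    att = softmax(logits.ravel()).reshape(logits.shape)
    return float(np.sum(prior * (np.log(prior + eps) - np.log(att + eps))))

h, w = 8, 8
prior = gaussian_prior(h, w, center=(3, 4))  # hypothetical AU location
rng = np.random.default_rng(0)

# Unconstrained (random) attention logits incur a large penalty ...
loss_random = attention_constraint_loss(rng.normal(size=(h, w)), prior)
# ... while logits already matching the prior incur a near-zero penalty.
loss_aligned = attention_constraint_loss(np.log(prior + 1e-8), prior)
```

In a full model this penalty would be added to the AU detection loss, so the attention map stays anchored near the AU's facial region but can still adapt to each sample.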
Title: Facial Action Unit Detection by Adaptively Constraining Self-Attention and Causally Deconfounding Sample
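The "causally deconfounding the sample confounder" idea can be sketched with a toy backdoor adjustment. This is a generic causal-inference illustration, not the paper's module: the confounder strata, the probability tables, and the binary feature are all invented numbers chosen to show how stratified averaging removes a training-sample bias.

```python
import numpy as np

# Toy backdoor adjustment. A discrete confounder z (e.g., strata of the
# training samples) influences both the observed feature x and the AU label.
# A naive estimate P(AU | x) mixes strata by the biased observational
# P(z | x); the intervention P(AU | do(x)) instead averages over the
# confounder's marginal P(z), suppressing the sample-induced bias.
p_z = np.array([0.7, 0.3])                  # marginal over confounder strata
p_au_given_x_z = np.array([[0.9, 0.2],      # rows: feature x = 1 / x = 0
                           [0.4, 0.1]])     # cols: confounder stratum z
p_z_given_x = np.array([[0.95, 0.05],       # biased stratum mix per feature
                        [0.10, 0.90]])

naive = (p_au_given_x_z * p_z_given_x).sum(axis=1)  # confounded P(AU | x)
do_x = (p_au_given_x_z * p_z).sum(axis=1)           # adjusted P(AU | do(x))
```

The gap between `naive` and `do_x` is exactly the bias contributed by the skewed stratum mix in the observed samples; a per-AU intervention module plays the role of this adjustment, with each AU keeping its own adjustment rather than sharing one pattern across all AUs.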