Transformer-Augmented Network With Online Label Correction for Facial Expression Recognition
IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2023-06-13, DOI: 10.1109/taffc.2023.3285231
Fuyan Ma¹, Bin Sun¹, Shutao Li²

Facial expression recognition (FER) in the wild is extremely challenging due to occlusions, varied head poses under unconstrained conditions, and incorrect annotations (label noise). In this article, we aim to improve the performance of in-the-wild FER with Transformers and online label correction. Unlike purely CNN-based methods, we propose a Transformer-augmented network (TAN) that dynamically captures the relationships within each facial patch and across facial patches. Specifically, the TAN uses a backbone convolutional neural network to translate a number of facial patch images into a set of visual feature sequences. An intra-patch Transformer is then used to capture the most discriminative features within each visual feature sequence. For this intra-patch Transformer, we propose a position-disentangled attention mechanism that better incorporates positional information into the feature sequences. Furthermore, we propose an inter-patch Transformer to model the dependencies across these feature sequences. More importantly, we present an online label correction (OLC) framework that corrects suspicious hard labels and accumulates soft labels based on the model's predictions, strengthening the model's robustness to label noise. We validate our method on several widely used datasets (RAF-DB, FERPlus, AffectNet), on realistic occlusion and pose variation datasets, and on synthetic noisy datasets. Extensive experiments on these benchmarks demonstrate that the proposed method performs favorably against state-of-the-art methods. The source code will be made publicly available.
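The OLC idea in the abstract (correct suspicious hard labels, accumulate soft labels from the model's predictions) can be illustrated with a minimal, framework-free sketch. The abstract does not give the actual update rules, so everything here is an assumption: soft labels are accumulated as an exponential moving average of softmax outputs, a hard label is treated as suspicious and flipped only when the model predicts a different class with probability at least `threshold`, and the training loss mixes both targets with weight `alpha`. `OnlineLabelCorrector`, `momentum`, `threshold`, and `alpha` are hypothetical names, not the authors' code.

```python
import math
from dataclasses import dataclass, field
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


@dataclass
class OnlineLabelCorrector:
    """Hypothetical per-sample label store; NOT the paper's exact OLC rules."""
    num_classes: int
    momentum: float = 0.9    # assumed EMA factor for soft-label accumulation
    threshold: float = 0.9   # assumed confidence required to flip a hard label
    hard: List[int] = field(default_factory=list)
    soft: List[List[float]] = field(default_factory=list)

    def register(self, labels: List[int]) -> None:
        """Start from the annotated hard labels and matching one-hot soft labels."""
        self.hard = list(labels)
        self.soft = [[1.0 if c == y else 0.0 for c in range(self.num_classes)]
                     for y in labels]

    def update(self, index: int, logits: List[float]) -> int:
        """Fold one prediction into sample `index`; return its (possibly corrected) hard label."""
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        # Soft label: exponential moving average of the predicted distribution.
        self.soft[index] = [self.momentum * s + (1.0 - self.momentum) * p
                            for s, p in zip(self.soft[index], probs)]
        pred = max(range(self.num_classes), key=lambda c: probs[c])
        # The hard label is "suspicious" only if the model confidently disagrees.
        if pred != self.hard[index] and probs[pred] >= self.threshold:
            self.hard[index] = pred
        return self.hard[index]

    def loss(self, index: int, logits: List[float], alpha: float = 0.5) -> float:
        """Assumed objective: mix of cross-entropy on the hard and the soft label."""
        lp = log_softmax(logits)
        hard_ce = -lp[self.hard[index]]
        soft_ce = -sum(s * l for s, l in zip(self.soft[index], lp))
        return (1.0 - alpha) * hard_ce + alpha * soft_ce


if __name__ == "__main__":
    olc = OnlineLabelCorrector(num_classes=3)
    olc.register([0, 1])                   # annotated labels for two samples
    print(olc.update(0, [0.0, 5.0, 0.0]))  # prints 1: confident disagreement flips 0 -> 1
    print(olc.update(1, [0.0, 1.0, 0.5]))  # prints 1: model agrees, label unchanged
    print(olc.loss(0, [0.0, 5.0, 0.0]))
```

Gating the flip on a confidence threshold is one simple way to keep early, unreliable predictions from overwriting annotations, while the moving-average soft label smooths over epoch-to-epoch fluctuations; the paper may use a different correction criterion or loss weighting.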

Updated: 2023-06-13