Adversarial Domain Generalized Transformer for Cross-Corpus Speech Emotion Recognition
IEEE Transactions on Affective Computing ( IF 9.6 ) Pub Date : 2023-06-29 , DOI: 10.1109/taffc.2023.3290795
Yuan Gao 1 , Longbiao Wang 1 , Jiaxing Liu 1 , Jianwu Dang 1 , Shogo Okada 2

Speech emotion recognition (SER) promotes the development of intelligent devices, which enable natural and friendly human-computer interactions. However, the recognition performance of existing approaches is significantly reduced on unseen datasets, and the lack of sufficient training data limits the generalizability of deep learning models. In this article, we analyze the impact of the domain generalization method on cross-corpus SER and propose an adversarial domain generalized transformer (ADoGT), which is aimed at learning a shared feature distribution for the source and target domains. Specifically, we investigate the effect of domain adversarial learning by eliminating nonaffective information. We also combine the center loss with the softmax function as joint supervision to learn discriminative features. Moreover, we introduce unsupervised transfer learning to extract additional features, and incorporate a gated fusion model to learn the complementary information of the features learned by the supervised feature extractor and pretrained model. The proposed transformer based domain generalization method is evaluated using four emotional datasets. We also provide an ablation study of different domain adversarial model structures and feature fusion models. The results of comparative experiments demonstrate the effectiveness of the proposed ADoGT.
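The joint supervision mentioned in the abstract, center loss combined with softmax, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formula, so the function names, the weighting factor `lam`, the 0.5 scaling, and the averaging over the batch are all assumptions that follow the common formulation of center loss, in which each feature vector is pulled toward a per-class center while softmax cross-entropy keeps the classes separable.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer class labels."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()


def joint_loss(logits, features, labels, centers, lam=0.01):
    """Cross-entropy plus a weighted center loss.

    `lam` is a hypothetical weighting hyperparameter; the abstract does not
    state how the two terms are balanced.
    """
    ce = softmax_cross_entropy(logits, labels)
    return ce + lam * center_loss(features, labels, centers)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, feat_dim, batch = 4, 8, 16  # illustrative sizes only
    logits = rng.normal(size=(batch, num_classes))
    features = rng.normal(size=(batch, feat_dim))
    labels = rng.integers(0, num_classes, size=batch)
    centers = np.zeros((num_classes, feat_dim))  # centers would be learned in practice
    print(joint_loss(logits, features, labels, centers))
```

In a trained model the class centers would be updated alongside the network weights. Here they are fixed arrays so that the loss computation itself stays easy to follow.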


