Analyzing Continuous-Time and Sentence-Level Annotations for Speech Emotion Recognition



IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2024-03-01, DOI: 10.1109/taffc.2024.3372380
Luz Martinez-Lucas, Wei-Cheng Lin, Carlos Busso


The emotional content of several databases is annotated with continuous-time (CT) annotations, providing traces with frame-by-frame scores describing the instantaneous value of an emotional attribute. However, having a single score describing the global emotion of a short segment is more convenient for several emotion recognition formulations. A common approach is to derive sentence-level (SL) labels from CT annotations by aggregating the values of the emotional traces across time and annotators. How similar are these aggregated SL labels to labels originally collected at the sentence level? The release of the MSP-Podcast (SL annotations) and MSP-Conversation (CT annotations) corpora provides the resources to explore the validity of deriving SL labels from CT annotations. There are 2,884 speech segments that belong to both corpora. Using this set, this study (1) compares both types of annotations using statistical metrics, (2) evaluates their inter-evaluator agreements, and (3) explores the effect of these SL labels on speech emotion recognition (SER) tasks. The analysis reveals benefits of using SL labels derived from CT annotations in the estimation of valence. This analysis also provides insights into how the two types of labels differ and how that could affect a model.
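
The aggregation step described in the abstract (collapsing CT traces across time and annotators into one SL score) can be illustrated with a minimal sketch. The abstract does not say which aggregation function the authors use, so the plain mean over time followed by a mean over annotators is an assumption here, as are the function name `aggregate_sl_label`, the `(time, score)` trace layout, and the toy values.

```python
from statistics import fmean


def aggregate_sl_label(traces, start, end):
    """Collapse continuous-time traces into one sentence-level score.

    traces: one list per annotator of (time_in_seconds, score) pairs.
    start, end: segment boundaries in seconds (start inclusive, end exclusive).
    Assumption: mean over time per annotator, then mean over annotators.
    """
    per_annotator = []
    for trace in traces:
        # Keep only the frames that fall inside the speech segment.
        in_segment = [score for t, score in trace if start <= t < end]
        if in_segment:  # skip annotators with no frames in the segment
            per_annotator.append(fmean(in_segment))
    if not per_annotator:
        raise ValueError("no annotator has frames inside the segment")
    return fmean(per_annotator)


# Toy traces from two hypothetical annotators, sampled every 0.5 s.
traces = [
    [(0.0, 0.2), (0.5, 0.4), (1.0, 0.6), (1.5, 0.8)],
    [(0.0, 0.0), (0.5, 0.2), (1.0, 0.4), (1.5, 0.6)],
]
label = aggregate_sl_label(traces, start=0.5, end=1.5)
print(label)  # approximately 0.4
```

Averaging within each annotator first keeps an annotator with more frames in the segment from dominating the result. Other choices, such as a median or a time-shifted window to compensate for annotator reaction lag, would fit the same interface.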


Updated: 2024-03-01
