An Open-Source Benchmark of Deep Learning Models for Audio-Visual Apparent and Self-Reported Personality Recognition
IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2024-02-08, DOI: 10.1109/taffc.2024.3363710
Rongfan Liao, Siyang Song, Hatice Gunes

Personality determines various human daily and working behaviours. Recently, a large number of automatic personality computing approaches have been developed to predict either the apparent or self-reported personality of a subject based on non-verbal audio-visual behaviours. However, most of them rely on complex, dataset-specific pre-processing steps and model training tricks. In the absence of a standardized benchmark with consistent experimental settings, it is not only impossible to fairly compare the real performances of these personality computing models, but it is also difficult to reproduce them. This paper presents the first reproducible audio-visual benchmark that provides a fair and consistent evaluation of eight existing personality computing models (covering audio, visual, and audio-visual modalities) and seven standard deep learning models on both self-reported and apparent personality recognition tasks. Building upon this set of benchmarked models, we also investigate the impact of two previously-used long-term modelling strategies, which summarise short-term/frame-level predictions into video-level ones, on personality computing results. We comprehensively evaluate all benchmarked models on two publicly available datasets, the ChaLearn First Impression and UDIVA self-reported personality datasets, and conclude that: (i) apparent personality traits, inferred from facial behaviours by most benchmarked deep learning models, are more reliably predicted than self-reported ones; (ii) visual models frequently achieve superior performance to audio models on personality recognition; (iii) non-verbal behaviours contribute differently to the prediction of different personality traits; and (iv) our reproduced personality computing models generally achieve worse performances than their originally reported results.
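To illustrate what summarising short-term/frame-level predictions into a video-level prediction can look like, the following minimal Python sketch averages hypothetical per-frame Big-Five trait estimates over time. It is not taken from the benchmark code; the array shapes, function name, and the use of plain averaging are assumptions made purely for illustration of the simplest long-term summarisation strategy.

```python
import numpy as np

# Hypothetical frame-level predictions for one video clip: each row is a
# frame-level estimate of the Big-Five traits (openness, conscientiousness,
# extraversion, agreeableness, neuroticism), drawn at random for illustration.
rng = np.random.default_rng(0)
frame_predictions = rng.uniform(0.0, 1.0, size=(120, 5))  # 120 frames x 5 traits

def summarise_by_averaging(frame_preds: np.ndarray) -> np.ndarray:
    """Collapse frame-level trait predictions into a single video-level
    prediction by taking the mean over time."""
    return frame_preds.mean(axis=0)

video_prediction = summarise_by_averaging(frame_predictions)
print(video_prediction)  # one score per trait for the whole clip
```

More elaborate long-term strategies replace this mean with a model of the temporal dynamics of the frame-level predictions, but the input/output shapes stay the same.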

Updated: 2024-02-08