Are automated video interviews smart enough? Behavioral modes, reliability, validity, and bias of machine learning cognitive ability assessments. Journal of Applied Psychology


Journal of Applied Psychology (IF 9.4), Pub Date: 2024-09-26, DOI: 10.1037/apl0001236
Louis Hickman, Louis Tay, Sang Eun Woo

Automated video interviews (AVIs) that use machine learning (ML) algorithms to assess interviewees are increasingly popular. Extending prior AVI research focusing on noncognitive constructs, the present study critically evaluates the possibility of assessing cognitive ability with AVIs. By developing and examining AVI ML models trained to predict measures of three cognitive ability constructs (i.e., general mental ability, verbal ability, and intellect [as observed at zero acquaintance]), this research contributes to the literature in several ways. First, it advances our understanding of how cognitive abilities relate to interviewee behavior. Specifically, we found that verbal behaviors best predicted interviewee cognitive abilities, while neither paraverbal nor nonverbal behaviors provided incremental validity, suggesting that only verbal behaviors should be used to assess cognitive abilities. Second, across two samples of mock video interviews, we extensively evaluated the psychometric properties of the verbal behavior AVI ML model scores, including their reliability (internal consistency across interview questions and test-retest), validity (relationships with other variables and content), and fairness and bias (measurement and predictive). Overall, the general mental ability, verbal ability, and intellect AVI models captured similar behavioral manifestations of cognitive ability. Validity evidence results were mixed: For example, AVIs trained on observer-rated intellect exhibited superior convergent and criterion relationships (compared to the observer ratings they were trained to model) but had limited discriminant validity evidence. Our findings illustrate the importance of examining psychometric properties beyond convergence with the test that ML algorithms are trained to model. We provide recommendations for enhancing discriminant validity evidence in future AVIs. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
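The incremental validity claim above can be illustrated with a small sketch. Incremental validity is commonly quantified as the gain in explained variance, ΔR² = R²(base + added predictors) − R²(base predictors). The example below is not the authors' procedure, which this abstract does not describe; it uses purely synthetic data, and the names `verbal`, `paraverbal`, `ability`, `r_squared`, and `incremental_r2` are hypothetical. It assumes a plain in-sample ordinary least squares fit, in which the criterion depends only on the "verbal" features, so the "paraverbal" features should add almost nothing.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_r2(X_base, X_added, y):
    """Return (R^2 base, R^2 full, delta R^2) for nested OLS models."""
    base = r_squared(X_base, y)
    full = r_squared(np.column_stack([X_base, X_added]), y)
    return base, full, full - base


# Synthetic data only (an assumption for illustration, not study data):
# the criterion is driven by the "verbal" features plus noise.
rng = np.random.default_rng(0)
n = 500
verbal = rng.normal(size=(n, 3))       # hypothetical verbal-behavior features
paraverbal = rng.normal(size=(n, 2))   # hypothetical paraverbal features (pure noise here)
ability = verbal @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=1.0, size=n)

base, full, delta = incremental_r2(verbal, paraverbal, ability)
print(f"R2 verbal only: {base:.3f}, R2 with paraverbal: {full:.3f}, delta R2: {delta:.3f}")
```

In-sample ΔR² for nested models can never be negative, so a value near zero is the pattern that corresponds to "no incremental validity"; an evaluation on held-out interviews would be a stricter check than this sketch.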


Updated: 2024-09-26
