Automated speech recognition bias in personnel selection: The case of automatically scored job interviews.
Journal of Applied Psychology (IF 9.4), Pub Date: 2024-10-31, DOI: 10.1037/apl0001247
Louis Hickman, Markus Langer, Rachel M. Saef, Louis Tay

Organizations, researchers, and software increasingly use automatic speech recognition (ASR) to transcribe speech to text. However, ASR can be less accurate for (i.e., biased against) certain demographic subgroups. This is concerning, given that the machine-learning (ML) models used to automatically score video interviews take ASR transcriptions of interviewee responses as inputs. To address these concerns, we investigate the extent of ASR bias and its effects in automatically scored interviews. Specifically, we compare the accuracy of ASR transcription for English as a second language (ESL) versus non-ESL interviewees, people of color (and Black interviewees separately) versus White interviewees, and male versus female interviewees. Then, we test whether ASR bias causes bias in ML model scores, both in terms of differential convergent correlations (i.e., subgroup differences in the correlations between observed and ML scores) and differential means (i.e., shifts in subgroup mean differences from observed to ML scores). To do so, we apply one human transcription method and four ASR transcription methods to two samples of mock video interviews (Ns = 1,014 and 414), and then train and test models on these different transcripts to score multiple constructs. We observed significant bias in the commercial ASR services across nearly all comparisons, with the magnitude of bias differing across services. However, the transcription bias did not translate into meaningful measurement bias in the ML interview scores, whether in terms of differential convergent correlations or differential means. We discuss what these results mean for the nature of bias, fairness, and validity of ML models for scoring verbal open-ended responses. (PsycInfo Database Record (c) 2024 APA, all rights reserved.)
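The abstract describes two quantitative steps: quantifying ASR transcription accuracy by subgroup, typically via word error rate (WER) against a human reference transcript, and testing whether observed-to-ML score correlations differ between subgroups. Below is a minimal Python sketch of both analyses. The function names, data layout, and the specific choices of WER, a Welch t-test, and a Fisher r-to-z comparison are illustrative assumptions, not the paper's actual code or analysis pipeline.

```python
import numpy as np
from scipy import stats


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over word sequences.
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,          # deletion
                          d[i, j - 1] + 1,          # insertion
                          d[i - 1, j - 1] + cost)   # substitution
    return d[len(ref), len(hyp)] / max(len(ref), 1)


def transcription_bias(human, asr, is_focal):
    """Compare mean WER for a focal subgroup (e.g., ESL interviewees)
    against the reference subgroup, with a Welch t-test."""
    wers = np.array([wer(h, a) for h, a in zip(human, asr)])
    focal, ref = wers[is_focal], wers[~is_focal]
    t, p = stats.ttest_ind(focal, ref, equal_var=False)
    return focal.mean(), ref.mean(), t, p


def differential_convergent_r(observed, ml, is_focal):
    """Fisher r-to-z test for a subgroup difference in the correlation
    between observed (human-rated) and ML interview scores."""
    r1 = np.corrcoef(observed[is_focal], ml[is_focal])[0, 1]
    r2 = np.corrcoef(observed[~is_focal], ml[~is_focal])[0, 1]
    n1, n2 = int(is_focal.sum()), int((~is_focal).sum())
    z = (np.arctanh(r1) - np.arctanh(r2)) / np.sqrt(
        1 / (n1 - 3) + 1 / (n2 - 3))
    return r1, r2, z, 2 * stats.norm.sf(abs(z))
```

With boolean masks such as is_focal = (group == "ESL"), the differential-means question could be checked analogously by computing a standardized subgroup mean difference (e.g., Cohen's d) on the observed scores and again on the ML scores, and inspecting whether the gap shifts between the two.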
