Nature Human Behaviour (IF 21.4), Pub Date: 2024-10-28, DOI: 10.1038/s41562-024-02024-1. Authors: Michelle Vaccaro, Abdullah Almaatouq, Thomas Malone
Inspired by the increasing use of artificial intelligence (AI) to augment humans, researchers have studied human–AI systems involving different tasks, systems and populations. Despite such a large body of work, we lack a broad conceptual understanding of when combinations of humans and AI are better than either alone. Here we addressed this question by conducting a preregistered systematic review and meta-analysis of 106 experimental studies reporting 370 effect sizes. We searched an interdisciplinary set of databases (the Association for Computing Machinery Digital Library, the Web of Science and the Association for Information Systems eLibrary) for studies published between 1 January 2020 and 30 June 2023. Each study was required to include an original human-participants experiment that evaluated the performance of humans alone, AI alone and human–AI combinations. First, we found that, on average, human–AI combinations performed significantly worse than the best of humans or AI alone (Hedges’ g = −0.23; 95% confidence interval, −0.39 to −0.07). Second, we found performance losses in tasks that involved making decisions and significantly greater gains in tasks that involved creating content. Finally, when humans outperformed AI alone, we found performance gains in the combination, but when AI outperformed humans alone, we found losses. Limitations of the evidence assessed here include possible publication bias and variations in the study designs analysed. Overall, these findings highlight the heterogeneity of the effects of human–AI collaboration and point to promising avenues for improving human–AI systems.
Title: When combinations of humans and AI are useful: A systematic review and meta-analysis
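The pooled result in the abstract is expressed as Hedges' g, a standardized mean difference with a small-sample bias correction. As a minimal sketch of how one such effect size and its 95% confidence interval can be computed for a single study, the following uses the standard textbook formulas for two independent groups. The function names, the sign convention (combination minus best baseline, so negative values mean the combination did worse) and all input numbers are assumptions for illustration; the abstract does not describe the authors' exact computation, and these are not data from any reviewed study.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (g, var_g) for two independent groups A and B.

    g is Cohen's d multiplied by the small-sample correction factor J.
    """
    df = n_a + n_b - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    d = (mean_a - mean_b) / pooled_sd
    # Common approximation of the bias-correction factor.
    j = 1 - 3 / (4 * df - 1)
    # Large-sample approximation of the variance of d.
    var_d = (n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b))
    return j * d, j**2 * var_d


def ci95(g, var_g):
    """Normal-approximation 95% confidence interval for g."""
    half_width = 1.96 * math.sqrt(var_g)
    return g - half_width, g + half_width


# Hypothetical single study: human-AI combination (A) vs. the better of
# human alone or AI alone (B). Values are made up for illustration.
g, var_g = hedges_g(mean_a=0.70, sd_a=0.15, n_a=50,
                    mean_b=0.74, sd_b=0.15, n_b=50)
low, high = ci95(g, var_g)
print(f"g = {g:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

A meta-analysis then pools many such per-study values, weighting each by its precision. The reported interval of −0.39 to −0.07 is centred on the pooled estimate of −0.23, as expected for a symmetric interval of this kind.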