Representation of intensivists’ race/ethnicity, sex, and age by artificial intelligence: a cross-sectional study of two text-to-image models
Critical Care (IF 8.8) Pub Date: 2024-11-11, DOI: 10.1186/s13054-024-05134-4
Mia Gisselbaek, Mélanie Suppan, Laurens Minsart, Ekin Köselerli, Sheila Nainan Myatra, Idit Matot, Odmara L. Barreto Chang, Sarah Saxena, Joana Berger-Estilita

Integrating artificial intelligence (AI) into intensive care practices can enhance patient care by providing real-time predictions and aiding clinical decisions. However, biases in AI models can undermine diversity, equity, and inclusion (DEI) efforts, particularly in visual representations of healthcare professionals. This work aims to examine the demographic representation produced by two AI text-to-image models, Midjourney and ChatGPT DALL-E 2, and to assess their accuracy in depicting the demographic characteristics of intensivists. This cross-sectional study, conducted from May to July 2024, used demographic data from the USA workforce report (2022) and intensive care trainees (2021) to compare real-world intensivist demographics with images generated by two AI models, Midjourney v6.0 and ChatGPT 4.0 DALL-E 2. A total of 1,400 images were generated across ICU subspecialties, and the outcomes were comparisons of sex, race/ethnicity, and age representation in the AI-generated images against the actual workforce demographics. Both AI models demonstrated noticeable biases when compared with the actual U.S. intensive care workforce data, notably overrepresenting White and young doctors. ChatGPT DALL-E 2 produced fewer female (17.3% vs 32.2%, p < 0.0001), more White (61% vs 55.1%, p = 0.002), and younger (53.3% vs 23.9%, p < 0.001) individuals. Midjourney, by contrast, depicted more female (47.6% vs 32.2%, p < 0.001), more White (60.9% vs 55.1%, p = 0.003), and younger intensivists (49.3% vs 23.9%, p < 0.001). Substantial differences between specialties were observed within both models. Finally, when compared with each other, the two models differed significantly in their portrayal of intensivists. The significant biases in AI-generated images of intensivists produced by ChatGPT DALL-E 2 and Midjourney reflect broader cultural issues and may perpetuate stereotypes of healthcare workers in society.
This study highlights the need for an approach that ensures fairness, accountability, transparency, and ethics in AI applications for healthcare.

Updated: 2024-11-11