Physician and Artificial Intelligence Chatbot Responses to Cancer Questions From Social Media
JAMA Oncology (IF 22.5). Pub Date: 2024-05-16. DOI: 10.1001/jamaoncol.2024.0836
David Chen 1,2; Rod Parsa 1,3; Andrew Hope 1,4; Breffni Hannon 5,6; Ernie Mak 5,7; Lawson Eng 8,9; Fei-Fei Liu 1,4; Nazanin Fallah-Rad 8; Ann M. Heesters 10,11,12; Srinivas Raman 1,4

Importance: Artificial intelligence (AI) chatbots offer the opportunity to draft template responses to patient questions. However, the ability of chatbots to generate responses based on domain-specific knowledge of cancer remains to be tested.

Objective: To evaluate the competency of AI chatbots (GPT-3.5 [chatbot 1], GPT-4 [chatbot 2], and Claude AI [chatbot 3]) in generating high-quality, empathetic, and readable responses to patient questions about cancer.

Design, Setting, and Participants: This equivalence study compared AI chatbot responses and responses by 6 verified oncologists to 200 patient questions about cancer from a public online forum. Data were collected on May 31, 2023.

Exposures: A random sample of 200 patient questions related to cancer from a public online forum (Reddit r/AskDocs), spanning January 1, 2018, to May 31, 2023, was posed to the 3 AI chatbots.

Main Outcomes and Measures: The primary outcomes were pilot ratings of quality, empathy, and readability on a Likert scale from 1 (very poor) to 5 (very good). Two teams of attending oncology specialists evaluated each response in triplicate based on the pilot measures of quality, empathy, and readability. The secondary outcome was readability assessed using the Flesch-Kincaid Grade Level.

Results: Responses to 200 questions generated by chatbot 3, the best-performing AI chatbot, were rated consistently higher than physician responses in overall measures of quality (mean, 3.56 [95% CI, 3.48-3.63] vs 3.00 [95% CI, 2.91-3.09]; P < .001), empathy (mean, 3.62 [95% CI, 3.53-3.70] vs 2.43 [95% CI, 2.32-2.53]; P < .001), and readability (mean, 3.79 [95% CI, 3.72-3.87] vs 3.07 [95% CI, 3.00-3.15]; P < .001). The mean Flesch-Kincaid Grade Level of physician responses (mean, 10.11 [95% CI, 9.21-11.03]) was not significantly different from that of chatbot 3 responses (mean, 10.31 [95% CI, 9.89-10.72]; P > .99) but was lower than that of chatbot 1 (mean, 12.33 [95% CI, 11.84-12.83]; P < .001) and chatbot 2 (mean, 11.32 [95% CI, 11.05-11.79]; P = .01).

Conclusions and Relevance: The findings of this study suggest that chatbots can generate quality, empathetic, and readable responses to patient questions comparable to physician responses sourced from an online forum. Further research is required to assess the scope, process integration, and patient and physician outcomes of chatbot-facilitated interactions.
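For readers unfamiliar with the secondary outcome measure: the Flesch-Kincaid Grade Level maps a text to a US school grade via the standard formula FKGL = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59, so a score near 10 (as reported above) corresponds to roughly 10th-grade reading difficulty. The paper does not state which implementation was used; the sketch below applies the standard formula with a crude vowel-group syllable heuristic (production tools such as the textstat Python package use more careful syllable counting).

```python
import re


def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels as syllables.
    # Dictionary- or rule-based counters are more accurate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    # Hypothetical sample response, for illustration only.
    sample = ("Chemotherapy can cause fatigue. Ask your oncology team "
              "about managing side effects and supportive care options.")
    print(round(flesch_kincaid_grade(sample), 2))
```

Note that the grade-level estimate is sensitive to the syllable counter and sentence splitter chosen, which is one reason published FKGL scores for the same text can differ slightly across tools.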
