


Is ChatGPT-4 a Reliable Tool in Autoimmune Hepatitis?
The American Journal of Gastroenterology (IF 8.0), Pub Date: 2024-10-31, DOI: 10.14309/ajg.0000000000003179
Francesca Colapietro, Daniele Piovani, Nicola Pugliese, Alessio Aghemo, Vincenzo Ronca, Ana Lleo

INTRODUCTION: Artificial intelligence-based chatbots offer a potential avenue for delivering personalized counseling to patients with autoimmune hepatitis. We assessed the accuracy, completeness, comprehensiveness, and safety of Chat Generative Pretrained Transformer-4 responses to 12 inquiries selected from a pool of 40 questions posed by 4 patients with autoimmune hepatitis.

METHODS: Questions were categorized into 3 areas: diagnosis (1-3), quality of life (4-8), and medical treatment (9-12). Eleven key opinion leaders evaluated the responses using Likert scales with 6 points for accuracy, 5 points for safety, and 3 points for completeness and comprehensiveness.

RESULTS: Median scores for accuracy, completeness, comprehensiveness, and safety were 5 (4-6), 2 (2-2), and 3 (2-3), respectively; no domain received superior ratings. The postdiagnosis follow-up question was the most difficult: the response scored low on accuracy and completeness but was rated safe and comprehensive. Agreement among key opinion leaders (Fleiss Kappa statistics) was slight for accuracy (0.05) and poor for the remaining features (-0.05, -0.06, and -0.02, respectively).

DISCUSSION: Chatbots show good comprehensibility but lack reliability. Further studies are needed to integrate Chat Generative Pretrained Transformer within clinical practice.


Updated: 2024-10-31
