Large language models: a new frontier in paediatric cataract patient education
British Journal of Ophthalmology (IF 3.7) Pub Date: 2024-10-01, DOI: 10.1136/bjo-2024-325252
Qais Dihan 1,2, Muhammad Z Chauhan 2, Taher K Eleiwa 3, Andrew D Brown 4, Amr K Hassan 5, Mohamed M Khodeiry 6, Reem H Elsheikh 2, Isdin Oke 7, Bharti R Nihalani 7, Deborah K VanderVeen 7, Ahmed B Sallam 2, Abdelrahman M Elhusseiny 7,8

Background/aims This was a cross-sectional comparative study. We evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4 and Google Bard) to generate novel patient education materials (PEMs) and to improve the readability of existing PEMs on paediatric cataract.

Methods We compared LLMs’ responses to three prompts. Prompt A requested they write a handout on paediatric cataract that was ‘easily understandable by an average American.’ Prompt B modified prompt A and requested the handout be written at a ‘sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.’ Prompt C rewrote existing PEMs on paediatric cataract ‘to a sixth-grade reading level using the SMOG readability formula’. Responses were compared on quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool; ≥70%: understandable, ≥70%: actionable), accuracy (Likert misinformation scale; 1 (no misinformation) to 5 (high misinformation)) and readability (SMOG and Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).

Results All LLM-generated responses were of high quality (median DISCERN ≥4), understandable (≥70%) and accurate (Likert=1). None of the LLM-generated responses met the actionability threshold (<70%). ChatGPT-3.5 and ChatGPT-4 prompt B responses were more readable than prompt A responses (p<0.001). ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59±0.5 and 4.31±0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote existing PEMs to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).

Conclusion LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable and accurate PEMs and in improving the readability of existing materials on paediatric cataract. Data are available upon reasonable request.
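The two readability metrics named in the Methods, SMOG and FKGL, are simple closed-form formulas over sentence, word and syllable counts. The sketch below is only an illustration of those formulas, not the validated software the authors used for scoring; in particular, the vowel-group syllable counter is a rough heuristic, and real readability tools count syllables more carefully.

```python
# Illustrative sketch of the SMOG and FKGL readability formulas.
# Assumption: syllables are approximated by vowel groups; validated
# readability tools use more accurate syllable counting.
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # discount a common silent 'e'
    return max(n, 1)


def smog(text: str) -> float:
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * (polysyllables * 30 / sentences) ** 0.5 + 3.1291


def fkgl(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / sentences
            + 11.8 * syllables / max(len(words), 1)
            - 15.59)


sample = "Cataracts cloud the lens of the eye. Doctors can remove them with surgery."
print(f"SMOG: {smog(sample):.2f}, FKGL: {fkgl(sample):.2f}")
```

Both formulas map text to a US school grade level, which is why the study's target of a sixth-grade reading level translates directly into SMOG and FKGL scores below 7.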

Updated: 2024-09-20