"Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines?
Clinical Orthopaedics and Related Research ( IF 4.2 ) Pub Date : 2024-09-06 , DOI: 10.1097/corr.0000000000003234
Tanios Dagher, Emma P Dwyer, Hayden P Baker, Senthooran Kalidoss, Jason A Strelzow

BACKGROUND: Artificial intelligence (AI) is engineered to emulate tasks that have historically required human interaction and intellect, including learning, pattern recognition, decision-making, and problem-solving. Although AI models like ChatGPT-4 have demonstrated satisfactory performance on medical licensing exams, suggesting a potential for supporting medical diagnostics and decision-making, no study of which we are aware has evaluated the ability of these tools to make treatment recommendations when given clinical vignettes and representative medical imaging of common orthopaedic conditions. As AI continues to advance, a thorough understanding of its strengths and limitations is necessary to inform safe and helpful integration into medical practice.

QUESTIONS/PURPOSES: (1) What is the concordance between ChatGPT-4-generated treatment recommendations for common orthopaedic conditions and both the American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines (CPGs) and an orthopaedic attending physician's treatment plan? (2) In what specific areas do the ChatGPT-4-generated treatment recommendations diverge from the AAOS CPGs?

METHODS: Ten common orthopaedic conditions with associated AAOS CPGs were identified: carpal tunnel syndrome, distal radius fracture, glenohumeral joint osteoarthritis, rotator cuff injury, clavicle fracture, hip fracture, hip osteoarthritis, knee osteoarthritis, ACL injury, and acute Achilles rupture. For each condition, the medical records of 10 deidentified patients managed at our facility were used to construct clinical vignettes, each with an isolated, single diagnosis of adequate clarity. The vignettes also encompassed a range of diagnostic severity to more thoroughly evaluate adherence to the treatment guidelines outlined by the AAOS. These clinical vignettes were presented alongside representative radiographic imaging, and the model was prompted to provide a single treatment plan recommendation. Each treatment plan was compared with established AAOS CPGs and with the treatment plan documented by the attending orthopaedic surgeon treating the specific patient. Vignettes where ChatGPT-4 recommendations diverged from CPGs were reviewed to identify patterns of error, which were then summarized.

RESULTS: ChatGPT-4 provided treatment recommendations in accordance with the AAOS CPGs in 90% (90 of 100) of clinical vignettes. Concordance between ChatGPT-generated plans and the plan recommended by the treating orthopaedic attending physician was 78% (78 of 100). One hundred percent (30 of 30) of ChatGPT-4 recommendations for fracture vignettes and hip and knee arthritis vignettes matched CPG recommendations, whereas the model struggled most with recommendations for carpal tunnel syndrome (3 of 10 instances demonstrated discordance). ChatGPT-4 recommendations diverged from AAOS CPGs for three carpal tunnel syndrome vignettes; two each of the ACL injury, rotator cuff injury, and glenohumeral joint osteoarthritis vignettes; and one acute Achilles rupture vignette. In these situations, ChatGPT-4 most often struggled to correctly interpret injury severity and progression, incorporate patient factors (such as lifestyle or comorbidities) into decision-making, and recognize a contraindication to surgery.

CONCLUSION: ChatGPT-4 can generate accurate treatment plans aligned with CPGs but can also make mistakes when required to integrate multiple patient factors into decision-making and to understand disease severity and progression. Physicians must critically assess the full clinical picture when using AI tools to support their decision-making.

CLINICAL RELEVANCE: ChatGPT-4 may be used as an on-demand diagnostic companion, but patient-centered decision-making should remain in the hands of the physician.
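The concordance rates reported above are simple proportions of agreeing vignettes over the total evaluated. A minimal sketch of that tally, using illustrative per-vignette agreement flags rather than the study's raw data (the function name and the flag lists are assumptions for illustration only):

```python
def concordance_rate(matches):
    """Fraction of vignettes where the model's recommendation
    matched the reference (CPG or attending physician's plan)."""
    return sum(matches) / len(matches)

# Illustrative flags (True = concordant vignette), not the study's raw data:
# 90 of 100 vignettes matched AAOS CPGs; 78 of 100 matched the attending's plan.
cpg_matches = [True] * 90 + [False] * 10
attending_matches = [True] * 78 + [False] * 22

print(concordance_rate(cpg_matches))        # 0.9
print(concordance_rate(attending_matches))  # 0.78
```

The same per-condition tally (e.g., 7 of 10 concordant carpal tunnel vignettes) drops out of the identical calculation applied to each condition's 10 vignettes.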

Updated: 2024-09-06