Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication
JAMA Surgery (IF 15.7), Pub Date: 2024-06-05, DOI: 10.1001/jamasurg.2024.1621
Philip Chung 1, Christine T Fong 2, Andrew M Walters 2, Nima Aghaeepour 1, Meliha Yetisgen 3, 4, Vikas N O'Reilly-Shah 2

Importance: General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient’s electronic health record notes.

Objective: To examine predictive performance on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration.

Design, Setting, and Participants: This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health records data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and explanation. The setting included a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients who had a surgery or procedure with anesthesia and at least 1 clinician-written note filed in the electronic health record before surgery were included in the study. Data were analyzed from November to December 2023.

Exposures: Compared original notes, note summaries, few-shot prompting, and chain-of-thought prompting strategies.

Main Outcomes and Measures: F1 score for binary and categorical outcomes. Mean absolute error for numerical duration outcomes.

Results: Study results were measured on task-specific datasets, each with 1000 cases with the exception of unplanned admission, which had 949 cases, and hospital mortality, which had 576 cases. The best results for each task included an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. Performance on duration prediction tasks was universally poor across all prompt strategies for which the large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction.

Conclusions and Relevance: Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for the predictions may make them useful tools in clinical workflows and may be complementary to traditional risk prediction models.
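For readers less familiar with the two outcome measures named in the abstract, the following is a minimal Python sketch of how an F1 score, a mean absolute error, and a 95% CI might be computed. It is an illustration only, not the authors' evaluation code. The abstract does not say how F1 was averaged across categories or how the confidence intervals were obtained, so the macro averaging and the percentile bootstrap used here are assumptions, as are all function names and the toy data.

```python
import random

def f1_binary(y_true, y_pred, positive=1):
    """F1 for one class: 2*TP / (2*TP + FP + FN); 0.0 if the class never appears."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging; not stated in the abstract)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_binary(y_true, y_pred, positive=c) for c in labels) / len(labels)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(metric, y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method): resample cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

# Toy data invented for illustration; these are NOT values from the study.
icu_actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = admitted to ICU
icu_predicted = [1, 0, 0, 1, 0, 1, 1, 0]
pacu_actual_min    = [30, 60, 90]           # PACU phase 1 duration, minutes
pacu_predicted_min = [45, 50, 120]

print("F1:", f1_binary(icu_actual, icu_predicted))
print("F1 95% CI:", bootstrap_ci(f1_binary, icu_actual, icu_predicted))
print("MAE (min):", mean_absolute_error(pacu_actual_min, pacu_predicted_min))
```

F1 balances precision and recall, so it is less flattering than accuracy on rare outcomes such as hospital mortality, while mean absolute error is expressed in the same units as the outcome, which is why the abstract reports it in minutes and days.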

Updated: 2024-06-05