The Treasure Trove Hidden in Plain Sight: The Utility of GPT-4 in Chest Radiograph Evaluation.
Radiology (IF 12.1) Pub Date: 2024-11-01, DOI: 10.1148/radiol.233441
Soroosh Tayebi Arasteh, Robert Siepmann, Marc Huppertz, Mahshad Lotfinia, Behrus Puladi, Christiane Kuhl, Daniel Truhn, Sven Nebelung

Background: Limited statistical knowledge can slow critical engagement with and adoption of artificial intelligence (AI) tools for radiologists. Large language models (LLMs) such as OpenAI's GPT-4, and notably its Advanced Data Analysis (ADA) extension, may improve the adoption of AI in radiology.

Purpose: To validate GPT-4 ADA outputs when autonomously conducting analyses of varying complexity on a multisource clinical dataset.

Materials and Methods: In this retrospective study, unique itemized radiologic reports of bedside chest radiographs, associated demographic data, and laboratory markers of inflammation from patients in intensive care from January 2009 to December 2019 were evaluated. GPT-4 ADA, accessed between December 2023 and January 2024, was tasked with autonomously analyzing this dataset by plotting radiography usage rates, providing descriptive statistics measures, quantifying factors of pulmonary opacities, and setting up machine learning (ML) models to predict their presence. Three scientists with 6-10 years of ML experience validated the outputs by verifying the methodology, assessing coding quality, re-executing the provided code, and comparing ML models head-to-head with their human-developed counterparts (based on the area under the receiver operating characteristic curve [AUC], accuracy, sensitivity, and specificity). Statistical significance was evaluated using bootstrapping.

Results: A total of 43 788 radiograph reports, with their laboratory values, from University Hospital RWTH Aachen were evaluated from 43 788 patients (mean age, 66 years ± 15 [SD]; 26 804 male). While GPT-4 ADA provided largely appropriate visualizations, descriptive statistical measures, quantitative statistical associations based on logistic regression, and gradient boosting machines for the predictive task (AUC, 0.75), some statistical errors and inaccuracies were encountered. ML strategies were valid and based on consistent coding routines, resulting in valid outputs on par with human specialist-developed reference models (AUC, 0.80 [95% CI: 0.80, 0.81] vs 0.80 [95% CI: 0.80, 0.81]; P = .51) (accuracy, 79% [6910 of 8758 patients] vs 78% [6875 of 8758 patients], respectively; P = .27).

Conclusion: LLMs may facilitate data analysis in radiology, from basic statistics to advanced ML-based predictive modeling.

© RSNA, 2024. Supplemental material is available for this article.
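The abstract states that statistical significance was evaluated using bootstrapping but does not describe the resampling scheme. As an illustration only, the sketch below shows one common way to compare two models' AUCs on a shared test set: a paired percentile bootstrap. The function names `auc` and `paired_bootstrap_auc`, the number of resamples, and the two-sided p-value rule are all assumptions for this example and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC from the Mann-Whitney U statistic; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_bootstrap_auc(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Hypothetical helper: bootstrap the AUC difference (model A minus model B).

    labels must be 0/1 integers. The same resampled cases are scored by both
    models, so the comparison is paired.
    Returns (mean difference, 95% percentile CI, two-sided p-value).
    """
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if not 0 < sum(y) < n:
            continue  # AUC is undefined for a single-class resample; redraw
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    ci = (diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1])
    # Assumed p-value rule: twice the smaller tail fraction around zero
    frac_le = sum(d <= 0 for d in diffs) / n_boot
    frac_ge = sum(d >= 0 for d in diffs) / n_boot
    p_value = min(1.0, 2 * min(frac_le, frac_ge))
    return sum(diffs) / n_boot, ci, p_value
```

Called with the true test labels and the two models' predicted probabilities, a confidence interval for the difference that spans zero together with a large p-value would be consistent with the "on par" finding reported above. Resampling cases once and scoring both models on the same draw keeps the comparison paired, which is usually preferable when both models are evaluated on identical patients.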

Updated: 2024-11-01