Evaluating the positive predictive value of code-based identification of cirrhosis and its complications utilizing GPT-4, Hepatology



Hepatology (IF 12.9), Pub Date: 2024-10-08, DOI: 10.1097/hep.0000000000001115
Aryana Far, Asal Bastani, Albert Lee, Oksana Gologorskaya, Chiung-Yu Huang, Mark J. Pletcher, Jennifer C. Lai, Jin Ge

Background: Diagnosis code classification is a common method for cohort identification in cirrhosis research, but it is often inaccurate and augmented by labor-intensive chart review. Natural language processing (NLP) using large language models (LLMs) is a potentially more accurate method. To assess LLMs’ potential for cirrhosis cohort identification, we compared code-based versus LLM-based classification with chart review as a “gold standard.”

Methods: We extracted and conducted a limited chart review of 3,788 discharge summaries of cirrhosis admissions. We engineered zero-shot prompts using Generative Pre-trained Transformer (GPT)-4 to determine whether cirrhosis and its complications were active hospitalization problems. We calculated positive predictive values (PPVs) of LLM-based classification versus limited chart review, and PPVs of code-based versus LLM-based classification as a “silver standard” in all 3,788 summaries.

Results: Versus gold standard chart review, code-based classification achieved PPVs of 82.2% for identifying cirrhosis, 41.7% hepatic encephalopathy, 72.8% ascites, 59.8% gastrointestinal bleeding, and 48.8% spontaneous bacterial peritonitis. Compared to chart review, GPT-4 achieved 87.8-98.8% accuracies for identifying cirrhosis and its complications. Using LLM as a silver standard, code-based classification achieved PPVs of 79.8% for identifying cirrhosis, 53.9% hepatic encephalopathy, 55.3% ascites, 67.6% gastrointestinal bleeding, and 65.5% spontaneous bacterial peritonitis.

Conclusions: LLM-based classification was highly accurate versus manual chart review in identifying cirrhosis and its complications; this allowed us to assess the performance of code-based classification at scale using LLMs as a silver standard. These results suggest LLMs could augment or replace code-based cohort classification and raise questions regarding the necessity of chart review.
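
For readers less familiar with the two metrics reported in the abstract, the following is a minimal sketch of how a PPV and an accuracy figure can be computed from paired labels. The function names and the toy label lists are illustrative assumptions; they are not the study's code or data.

```python
from typing import Sequence


def positive_predictive_value(predicted: Sequence[bool],
                              reference: Sequence[bool]) -> float:
    """PPV = true positives / (true positives + false positives)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    true_pos = sum(1 for p, r in zip(predicted, reference) if p and r)
    false_pos = sum(1 for p, r in zip(predicted, reference) if p and not r)
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("PPV is undefined when nothing is flagged positive")
    return true_pos / flagged


def accuracy(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Accuracy = fraction of records where the two labels agree."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("accuracy is undefined for an empty sample")
    agree = sum(1 for p, r in zip(predicted, reference) if p == r)
    return agree / len(predicted)


# Hypothetical toy labels: one entry per discharge summary.
# code_flags: did the diagnosis codes flag the condition?
# chart_review: did the reference standard confirm it?
code_flags = [True, True, True, True, False]
chart_review = [True, False, True, True, False]

print(positive_predictive_value(code_flags, chart_review))  # 0.75
print(accuracy(code_flags, chart_review))  # 0.8
```

In this toy sample, four summaries are flagged by codes and three of those are confirmed, so the PPV is 3/4. Swapping the reference labels from chart review to LLM output would give the "silver standard" variant of the same calculation.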


Updated: 2024-10-08
