npj Digital Medicine (IF 12.4), Pub Date: 2024-08-24, DOI: 10.1038/s41746-024-01219-0
Daniel Reichenpfader, Henning Müller, Kerstin Denecke
Radiological imaging is a globally prevalent diagnostic method, yet the free text contained in radiology reports is not frequently used for secondary purposes. Natural Language Processing can provide structured data retrieved from these reports. This paper provides a summary of the current state of research on Large Language Model (LLM) based approaches for information extraction (IE) from radiology reports. We conduct a scoping review that follows the PRISMA-ScR guideline. Queries of five databases were conducted on August 1st 2023. Among the 34 studies that met inclusion criteria, only pre-transformer and encoder-based models are described. External validation shows a general performance decrease, although LLMs might improve generalizability of IE approaches. Reports related to CT and MRI examinations, as well as thoracic reports, prevail. Most common challenges reported are missing validation on external data and augmentation of the described methods. Different reporting granularities affect the comparability and transparency of approaches.
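Title (translated from the Chinese): A scoping review of large language model based approaches for information extraction from radiology reports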