npj Digital Medicine (IF 12.4), Pub Date: 2024-08-24, DOI: 10.1038/s41746-024-01219-0
Daniel Reichenpfader 1,2, Henning Müller 3,4, Kerstin Denecke 1
Radiological imaging is a globally prevalent diagnostic method, yet the free text contained in radiology reports is not frequently used for secondary purposes. Natural Language Processing can provide structured data retrieved from these reports. This paper provides a summary of the current state of research on Large Language Model (LLM) based approaches for information extraction (IE) from radiology reports. We conduct a scoping review that follows the PRISMA-ScR guideline. Queries of five databases were conducted on August 1st 2023. Among the 34 studies that met inclusion criteria, only pre-transformer and encoder-based models are described. External validation shows a general performance decrease, although LLMs might improve generalizability of IE approaches. Reports related to CT and MRI examinations, as well as thoracic reports, prevail. Most common challenges reported are missing validation on external data and augmentation of the described methods. Different reporting granularities affect the comparability and transparency of approaches.
Title: A scoping review of large language model based approaches for information extraction from radiology reports