Enhancing the vision–language foundation model with key semantic knowledge-emphasized report refinement
Medical Image Analysis (IF 10.7) Pub Date: 2024-08-13, DOI: 10.1016/j.media.2024.103299
Weijian Huang, Cheng Li, Hao Yang, Jiarun Liu, Yong Liang, Hairong Zheng, Shanshan Wang

Recently, vision–language representation learning has made remarkable advancements in building medical foundation models, holding immense potential for transforming the landscape of clinical research and medical care. The underlying hypothesis is that the rich knowledge embedded in radiology reports can effectively assist and guide the learning process, reducing the need for additional labels. However, these reports tend to be complex and sometimes even contain redundant descriptions, making it difficult for representation learning to capture the key semantic information. This paper develops a novel iterative vision–language representation learning framework by proposing a key semantic knowledge-emphasized report refinement method. In particular, raw radiology reports are refined to highlight the key information according to a constructed clinical dictionary and two model-optimized knowledge-enhancement metrics. The iterative framework is designed to learn progressively, starting from a general understanding of the patient’s condition based on the raw reports and gradually refining and extracting the critical information essential to fine-grained analysis tasks. The effectiveness of the proposed framework is validated on various downstream medical image analysis tasks, including disease classification, region-of-interest segmentation, and phrase grounding. Our framework surpasses seven state-of-the-art methods in both fine-tuning and zero-shot settings, demonstrating its encouraging potential for different clinical applications.
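To illustrate the dictionary-driven refinement step described above, the following is a minimal, hypothetical sketch of filtering a raw radiology report against a toy clinical dictionary. The dictionary entries and the sentence-level filtering rule are assumptions for illustration only; the paper's actual method additionally relies on two model-optimized knowledge-enhancement metrics and an iterative training loop, which are not reproduced here.

```python
import re

# Hypothetical clinical dictionary; the paper constructs its own.
CLINICAL_DICTIONARY = {
    "effusion", "pneumothorax", "consolidation",
    "cardiomegaly", "atelectasis",
}

def refine_report(report: str) -> str:
    """Keep only sentences mentioning at least one dictionary term."""
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    key_sentences = [
        s for s in sentences
        if any(term in s.lower() for term in CLINICAL_DICTIONARY)
    ]
    # Fall back to the raw report if nothing matches, so the image
    # is never paired with an empty text description.
    return " ".join(key_sentences) if key_sentences else report

if __name__ == "__main__":
    raw = ("The patient is status post sternotomy. Heart size is enlarged, "
           "consistent with cardiomegaly. There is a small left pleural "
           "effusion. No acute osseous abnormality.")
    print(refine_report(raw))
```

In the iterative setup sketched by the abstract, such a refined report would replace or complement the raw report in later training rounds, emphasizing key findings for fine-grained tasks.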

Updated: 2024-08-13