Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning
IEEE Transactions on Medical Imaging (IF 8.9). Pub Date: 2024-03-05. DOI: 10.1109/tmi.2024.3372638
Aohan Liu, Yuchen Guo, Jun-hai Yong, Feng Xu

The automatic generation of accurate radiology reports is of great clinical importance and has drawn growing research interest. The task remains challenging, however, because of the imbalance between normal and abnormal descriptions and the multi-sentence, multi-topic nature of radiology reports. These properties make it difficult to generate accurate descriptions for medical images, especially of the important abnormal findings. Previous methods for tackling these problems rely heavily on extra manual annotations, which are expensive to acquire. We propose a multi-grained report generation framework incorporating sentence-level image-language contrastive learning, which requires no extra labeling yet effectively learns knowledge from image-report pairs. We first introduce contrastive learning as an auxiliary task for image feature learning. Unlike previous contrastive methods, we exploit the multi-topic nature of imaging reports and perform fine-grained contrastive learning: we extract sentence topics and contents, then contrast sentence contents against refined image contents guided by the sentence topics. This forces the model to learn distinct abnormal image features for each specific topic. During generation, we use two decoders to first generate coarse sentence topics and then the fine-grained text of each sentence. We directly supervise the intermediate topics with the sentence topics learned by our contrastive objective, which strengthens the generation constraint and enables independent fine-tuning of the decoders with reinforcement learning, further boosting model performance. Experiments on two large-scale datasets, MIMIC-CXR and IU-Xray, demonstrate that our approach outperforms existing state-of-the-art methods on both language generation metrics and clinical accuracy.
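The sentence-level contrastive objective described above can be sketched as a symmetric InfoNCE loss over matched (sentence content, topic-refined image content) embedding pairs, with the other pairs in the batch acting as negatives. The function name, array shapes, and temperature below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sentence_image_info_nce(sent_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE over matched (sentence, image-content) pairs.

    sent_emb, img_emb: (N, D) arrays; row i of each forms a positive pair,
    while all other rows in the batch serve as in-batch negatives.
    (Hypothetical sketch: shapes and temperature are assumptions.)
    """
    # L2-normalize so the dot product is a cosine similarity
    s = sent_emb / np.linalg.norm(sent_emb, axis=1, keepdims=True)
    v = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = s @ v.T / temperature      # (N, N) similarity matrix
    idx = np.arange(len(s))             # positives sit on the diagonal

    def xent(l):
        # numerically stable cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the sentence-to-image and image-to-sentence directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Aligned pairs yield a lower loss than mismatched ones, which is what drives the image encoder to produce topic-specific features.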

Updated: 2024-03-05