Rethinking masked image modelling for medical image representation
Medical Image Analysis ( IF 10.7 ) Pub Date : 2024-08-17 , DOI: 10.1016/j.media.2024.103304
Yutong Xie 1 , Lin Gu 2 , Tatsuya Harada 2 , Jianpeng Zhang 3 , Yong Xia 4 , Qi Wu 1
Masked Image Modelling (MIM), a form of self-supervised learning, has achieved significant success in computer vision by improving image representations using unannotated data. Traditional MIM methods typically mask patches sampled at random across the image. However, random masking may not be well suited to medical imaging, whose characteristics differ from those of natural images. In medical imaging, particularly in pathology, disease-related features are often extremely sparse and localized, while the remaining regions appear normal and undifferentiated. In addition, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach that is, to our knowledge, the first to use radiological reports to guide the masking and restore the informative areas of images, encouraging the network to learn stronger semantic representations from medical images. We introduce two mutually comprehensive masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues mapped to MeSH words (e.g., cardiac, edema, vascular, pulmonary) and to guide mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images under the masks produced by the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks, covering multi-label/class image classification, pneumothorax segmentation, and medical image–report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance.
Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image–report pre-training counterparts. Codes are available at .
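
The abstract does not say how report text is mapped to image regions, so the following is only a minimal sketch of the general idea of text-guided masking, not the authors' implementation. It assumes that patch embeddings and text embeddings (for MeSH keywords or report sentences) already live in a shared space, and that cosine similarity is a reasonable relevance score. The names `cosine`, `report_guided_mask`, and `mask_ratio` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def report_guided_mask(patch_embs, text_embs, mask_ratio=0.5):
    """Return one boolean per patch (True = masked).

    Assumption: a patch is "informative" when it is similar to at least one
    text embedding, so each patch is scored by its best-matching text vector
    and the highest-scoring patches are masked.
    """
    scores = [max(cosine(p, t) for t in text_embs) for p in patch_embs]
    num_masked = round(mask_ratio * len(scores))
    # Rank patch indices from most to least text-relevant.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    masked = set(ranked[:num_masked])
    return [i in masked for i in range(len(scores))]


# Toy example: four 2-D patch embeddings and one keyword embedding.
patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
keywords = [[0.0, 1.0]]
mask = report_guided_mask(patches, keywords, mask_ratio=0.5)
```

In this toy setup the two patches most aligned with the keyword vector are masked, whereas random masking would pick patches regardless of the report. A reconstruction objective would then be applied to the masked patches.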

Updated: 2024-08-17