A vision–language foundation model for the generation of realistic chest X-ray images

Nature Biomedical Engineering (IF 26.8), Pub Date: 2024-08-26, DOI: 10.1038/s41551-024-01246-y
Christian Bluethgen (1, 2, 3), Pierre Chambon (1, 2), Jean-Benoit Delbrouck (1, 2), Rogier van der Sluijs (1, 2), Małgorzata Połacin (2, 3), Juan Manuel Zambrano Chaves (1, 4), Tanishq Mathew Abraham (5, 6), Shivanshu Purohit (6), Curtis P Langlotz (1, 2, 4), Akshay S Chaudhari (1, 2, 4)

The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision–language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision–language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.

Updated: 2024-08-26
