A vision–language foundation model for the generation of realistic chest X-ray images
Nature Biomedical Engineering (IF 26.8), Pub Date: 2024-08-26, DOI: 10.1038/s41551-024-01246-y
Christian Bluethgen 1, 2, 3 , Pierre Chambon 1, 2 , Jean-Benoit Delbrouck 1, 2 , Rogier van der Sluijs 1, 2 , Małgorzata Połacin 2, 3 , Juan Manuel Zambrano Chaves 1, 4 , Tanishq Mathew Abraham 5, 6 , Shivanshu Purohit 6 , Curtis P Langlotz 1, 2, 4 , Akshay S Chaudhari 1, 2, 4

The paucity of high-quality medical imaging datasets could be mitigated by machine learning models that generate compositionally diverse images that faithfully represent medical concepts and pathologies. However, large vision–language models are trained on natural images, and the diversity distribution of the generated images substantially differs from that of medical images. Moreover, medical language involves specific and semantically rich vocabulary. Here we describe a domain-adaptation strategy for large vision–language models that overcomes distributional shifts. Specifically, by leveraging publicly available datasets of chest X-ray images and the corresponding radiology reports, we adapted a latent diffusion model pre-trained on pairs of natural images and text descriptors to generate diverse and visually plausible synthetic chest X-ray images (as confirmed by board-certified radiologists) whose appearance can be controlled with free-form medical text prompts. The domain-adaptation strategy for the text-conditioned synthesis of medical images can be used to augment training datasets and is a viable alternative to the sharing of real medical images for model training and fine-tuning.




Updated: 2024-08-26