A Comprehensive Study of GPT-4V's Multimodal Capabilities in Medical Imaging
medRxiv - Radiology and Imaging, Pub Date: 2023-11-04, DOI: 10.1101/2023.11.03.23298067
Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou

This paper presents a comprehensive evaluation of GPT-4V's capabilities across diverse medical imaging tasks, including Radiology Report Generation, Medical Visual Question Answering (VQA), and Visual Grounding. While prior efforts have explored GPT-4V's performance in medical imaging, to the best of our knowledge, our study represents the first quantitative evaluation on publicly available benchmarks. Our findings highlight GPT-4V's potential in generating descriptive reports for chest X-ray images, particularly when guided by well-structured prompts. However, its performance on the MIMIC-CXR benchmark reveals room for improvement on certain evaluation metrics, such as CIDEr. In the domain of Medical VQA, GPT-4V demonstrates proficiency in distinguishing between question types but falls short of prevailing benchmarks in terms of accuracy. Furthermore, our analysis reveals the limitations of conventional evaluation metrics like the BLEU score, and we advocate for the development of more semantically robust assessment methods. In the field of Visual Grounding, GPT-4V exhibits preliminary promise in recognizing bounding boxes, but its precision is lacking, especially in identifying specific medical organs and signs. Our evaluation underscores the significant potential of GPT-4V in the medical imaging domain, while also emphasizing the need for targeted refinements to fully unlock its capabilities.

Updated: 2023-11-05