A Multimodal Generative AI Copilot for Human Pathology
Nature (IF 50.5), Pub Date: 2024-06-12, DOI: 10.1038/s41586-024-07618-3
Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Melissa Zhao, Aaron K. Chow, Kenji Ikemura, Ahrong Kim, Dimitra Pouli, Ankush Patel, Amr Soliman, Chengkuan Chen, Tong Ding, Judy J. Wang, Georg Gerber, Ivy Liang, Long Phi Le, Anil V. Parwani, Luca L. Weishaupt, Faisal Mahmood

The field of computational pathology[1,2] has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders[3,4]. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general-purpose, multimodal AI assistants and copilots[5] tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We build PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model and fine-tuning the whole system on over 456,000 diverse visual-language instructions consisting of 999,202 question-answer turns. We compare PathChat against several multimodal vision-language AI assistants and GPT-4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4[7]. PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases of diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that, overall, PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive and general vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision-making.




Updated: 2024-06-13