Generative artificial intelligence in mental health care: potential benefits and current challenges
John Torous 1,2, Charlotte Blease 1,2
World Psychiatry (IF 60.5). Pub Date: 2024-01-12. DOI: 10.1002/wps.21148
The potential of artificial intelligence (AI) in health care is being intensively discussed, given the easy accessibility of programs such as ChatGPT. While it is usually acknowledged that this technology will never replace clinicians, we should be aware of imminent changes in AI support for: a) routine office work such as billing, b) clinical documentation, c) medical education, and d) routine monitoring of symptoms. These changes will likely happen rapidly. In summer 2023, the largest electronic medical records provider in the US, Epic Systems, announced that it is partnering with OpenAI to integrate ChatGPT technology1. The profound impact that these changes will have on the context and delivery of mental health care warrants attention, but a more fundamental question is often overlooked: how AI may change the nature of mental health care itself by improving prevention, diagnosis and treatment.
Research on non-clinical samples suggests that AI may augment text-based support programs, but assessments have focused on perceived empathy rather than clinical outcomes. While the former is an important development, it is only the first step towards progressing from feasibility to acceptability and from efficacy to effectiveness. A century of accessible self-help books, nearly 60 years of mental health chatbots (ELIZA was created in the mid-1960s), nearly 30 years of home Internet with access to free online cognitive behavioral therapy and chatrooms, over a decade of smartphone-based mental health apps and text message support programs, and the recent expansion of video-based telehealth, together highlight that access to resources is not a panacea for prevention. The true target for AI preventive programs should not be to replicate previous work but rather to develop new models able to provide personalized, environmentally and culturally responsive, and scalable support that works effectively for users across all countries and regions.
Computer-based diagnosis programs have existed for decades and have not transformed care. Many studies to date suggest that new AI models can diagnose mental health conditions in the context of standardized exam questions or simple case examples2. This is important research, and there is evidence of improvement with new models, but the approach belies the clinical reality of how diagnosis is made or utilized in clinical care. The future of diagnosis in the 21st century can be more inclusive, draw from diverse sources of information, and be outcomes-driven. The true target for AI programs will be to integrate information from clinical exam, patient self-report, digital phenotyping, genetics, neuroimaging, and clinical judgement into novel diagnostic categories that may better reflect the underlying nature of mental illness and offer practical value in guiding effective treatments and cures.
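To make this multimodal ambition concrete, the following is a minimal sketch, with entirely synthetic data and invented modality names, of what fusing several data sources and deriving data-driven groupings (rather than predicting a single binary label) might look like in code. It illustrates the pipeline shape only, not a validated diagnostic method.

```python
# A minimal, synthetic sketch (modalities and features are invented) of the
# direction described in the text: fuse several data sources and let the data
# suggest candidate categories, rather than predicting one binary label.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
n = 300
modalities = {
    "self_report":       rng.normal(size=(n, 5)),   # e.g., symptom scales
    "digital_phenotype": rng.normal(size=(n, 8)),   # e.g., sleep, mobility
    "genetics":          rng.normal(size=(n, 10)),  # e.g., polygenic scores
    "neuroimaging":      rng.normal(size=(n, 6)),   # e.g., regional summaries
}

# Scale each modality separately, then concatenate into one feature matrix.
X = np.hstack([StandardScaler().fit_transform(m) for m in modalities.values()])

# Data-driven grouping as a stand-in for "novel diagnostic categories"; with
# purely random inputs the clusters are meaningless, but the workflow is the point.
labels = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(X)
print("patients per candidate category:", np.bincount(labels))
```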
Currently, there is a lack of evidence about how AI programs can guide mental health treatment. Impressive studies show that AI can help select psychiatric medications3, but these studies often rely on complete and labelled data sets, which is not the clinical reality, and lack prospective validation. A recent study in oncology points to an emerging challenge: when ChatGPT 3.5 was asked to provide cancer treatment recommendations, the chatbot often mixed incorrect recommendations with correct ones, making errors difficult to detect even for experts4. The true target for AI programs will be to realize the potential of personalized psychiatry and to guide treatment in ways that improve outcomes for patients.
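The gap between complete, labelled research datasets and routine-care records can be illustrated with a small sketch. The code below is entirely synthetic and not drawn from the cited study; it trains a simple classifier on complete data and then scores it again after clinically realistic missingness is introduced and naively imputed.

```python
# A minimal, synthetic sketch of why performance estimated on complete, fully
# labelled data can overstate real-world utility: the same model is scored
# again after missingness is introduced and mean-imputed. All names are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 2000, 12                       # hypothetical patients and features
X = rng.normal(size=(n, p))           # stand-ins for labs, scales, history
y = (X[:, :3].sum(axis=1) + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("AUC, complete data:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Simulate routine-care records in which ~40% of values were never collected.
X_missing = X_te.copy()
mask = rng.random(X_missing.shape) < 0.4
X_missing[mask] = np.nan
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_missing)
print("AUC, 40% missing + mean imputation:",
      roc_auc_score(y_te, model.predict_proba(X_imputed)[:, 1]))
```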
For AI to support prevention, diagnosis and treatment, there are clear next steps. Drawing on a well-established framework for technology evaluation in mental health, these include advances in equity, privacy, evidence, clinical engagement, and interoperability5.
Since current AI models are trained largely on non-psychiatric data sources, today all major AI chatbots clearly state that their products must not be used for clinical purposes. Even with proper training, risks of AI bias must be carefully explored, given numerous recent examples of clear harm in other medical fields6. A quick glance at images generated by an AI program asked to draw “schizophrenia”7 shows the extent to which extreme stigma and harmful bias have shaped what current AI models conceptualize as mental illness.
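One concrete form of "carefully exploring" bias is to report performance by subgroup rather than in aggregate. The sketch below uses invented data and a hypothetical subgroup indicator to show how a single overall accuracy can hide unequal false negative rates.

```python
# A minimal, synthetic sketch (all data and group labels are invented) of a
# basic bias audit: compare a model's false negative rate across subgroups
# instead of reporting only one aggregate metric.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 4000
group = rng.integers(0, 2, size=n)            # hypothetical subgroup indicator
X = rng.normal(size=(n, 6))
# Labels are noisier in group 1, mimicking poorer-quality training data.
noise = np.where(group == 1, 1.5, 0.5)
y = (X[:, 0] + X[:, 1] + rng.normal(scale=noise) > 0).astype(int)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=1)
pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)

for g in (0, 1):
    sel = g_te == g
    fnr = np.mean(pred[sel][y_te[sel] == 1] == 0)   # false negative rate
    print(f"group {g}: false negative rate = {fnr:.2f}")
```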
A second area of focus is privacy, with current AI chatbots unable to protect personal health information. Large language models are trained on data scraped from the Internet, which may encompass sensitive personal health information. The European Union is exploring whether OpenAI's ChatGPT complies with the General Data Protection Regulation's requirement that informed consent be obtained, or a strong public health justification exist, before sensitive information is processed. In the US, privacy risks arise when clinicians input sensitive patient data into chatbots. This problem prompted the American Psychiatric Association to release an advisory in summer 2023 noting that clinicians should not enter any patient information into any AI chatbot8. Before chatbots can be integrated into health care, authorities will need to determine whether they meet privacy regulations.
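As a purely illustrative sketch (not an APA-endorsed or sufficient safeguard), the code below shows the kind of identifier screening one might apply to free text before it ever reaches a third-party chatbot API. The patterns are assumptions, cover only a fraction of protected health information, and do not replace the advisory's guidance simply not to enter patient data.

```python
# A minimal, illustrative sketch of screening free text for obvious
# identifiers before any third-party API call; pattern coverage is far from
# complete and would miss many PHI elements.
import re

PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "mrn":   re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "date":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def redact(note: str) -> str:
    """Replace matched identifiers with a labelled placeholder."""
    for label, pattern in PATTERNS.items():
        note = pattern.sub(f"[{label.upper()} REDACTED]", note)
    return note

print(redact("Pt MRN: 00123456, DOB 4/12/1987, call 617-555-0199 re: sertraline."))
```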
A third focus is the next generation of evidence, as current studies suggesting that chatbots can perform binary classification of diagnosis (e.g., presence of any depression or none) offer limited practical clinical value. The potential to offer differential diagnosis based on multimodal data sources (e.g., medical records, genetic results, neuroimaging data) remains appealing but as yet untested. Evidence of the true potential for supporting care remains elusive, and the harm caused to the eating disorder community by the public release (and rapid repudiation within one week) of the Tessa chatbot highlights that more robust evidence is necessary than that currently collected9. As with other medical devices, clinical claims should be supported by high-quality randomized controlled trials that employ digital placebo groups (e.g., a non-therapeutic chatbot).
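The evidence standard described here can be made concrete with a small sketch: a two-arm comparison of symptom change between a hypothetical therapeutic chatbot and a non-therapeutic "digital placebo" chatbot, using invented effect sizes and sample sizes purely for illustration.

```python
# A minimal, synthetic sketch of a digital-placebo comparison: symptom change
# in a therapeutic chatbot arm versus a non-therapeutic chatbot arm of a
# randomized trial. All numbers are invented for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_arm = 120
# Hypothetical change on a depression scale (negative = improvement).
active  = rng.normal(loc=-4.0, scale=5.0, size=n_per_arm)   # therapeutic chatbot
placebo = rng.normal(loc=-2.5, scale=5.0, size=n_per_arm)   # non-therapeutic chatbot

t, p = stats.ttest_ind(active, placebo)
pooled_sd = np.sqrt((active.var(ddof=1) + placebo.var(ddof=1)) / 2)
cohens_d = (active.mean() - placebo.mean()) / pooled_sd
print(f"mean change: active {active.mean():.1f}, placebo {placebo.mean():.1f}")
print(f"t = {t:.2f}, p = {p:.3f}, Cohen's d = {cohens_d:.2f}")
```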
Fourth, a focus on engagement is critical. We already know that engagement with mental health apps has been minimal, and can learn from those experiences. We are aware that engagement is not only a patient challenge, as clinician uptake of this technology is also a widely cited barrier and will require careful attention to implementation frameworks. These consistently highlight that, while innovation is important, there must be a concomitant focus on the recipients (i.e., education and training for both patients and clinicians) as well as on the context of care (e.g., regulation, reimbursement, clinical workflow). The principles of the non-adoption, abandonment, scale-up, spread and sustainability (NASSS) framework remain relevant in AI and offer tangible targets for avoiding failure.
Fifth and related, AI models need to be well integrated into the health care system. The era of standalone or self-help programs is rapidly ending, with the realization that such tools often fragment care, cannot scale, and are rarely sustainable. This requires, in addition to data interoperability, careful design of how AI interacts with all aspects of the health care system. There is a need for collaboration not only with clinicians but also with patients, family members, administrators, regulators, and of course AI developers.
While generative AI technologies continue to evolve, the clinical community today has the opportunity to evolve as well. Clinicians do not need to become experts in generative AI, but a new focus on education about current capabilities, risks and benefits can be a tangible first step towards more informed decision-making around what role these technologies can and should play in care.