Detecting hallucinations in large language models using semantic entropy
Nature (IF 50.5), Pub Date: 2024-06-19, DOI: 10.1038/s41586-024-07421-0
Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, Yarin Gal

Large language model (LLM) systems, such as ChatGPT [1] or Gemini [2], can show impressive reasoning and question-answering capabilities but often ‘hallucinate’ false outputs and unsubstantiated answers [3,4]. Answering unreliably or without the necessary information prevents adoption in diverse fields, with problems including fabrication of legal precedents [5] or untrue facts in news articles [6] and even posing a risk to human life in medical domains such as radiology [7]. Encouraging truthfulness through supervision or reinforcement has been only partially successful [8]. Researchers need a general method for detecting hallucinations in LLMs that works even with new and unseen questions to which humans might not know the answer. Here we develop new methods grounded in statistics, proposing entropy-based uncertainty estimators for LLMs to detect a subset of hallucinations—confabulations—which are arbitrary and incorrect generations. Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words. Our method works across datasets and tasks without a priori knowledge of the task, requires no task-specific data and robustly generalizes to new tasks not seen before. By detecting when a prompt is likely to produce a confabulation, our method helps users understand when they must take extra care with LLMs and opens up new possibilities for using LLMs that are otherwise prevented by their unreliability.
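For intuition, the following Python sketch illustrates the core idea of computing entropy over clusters of meaning rather than over exact word sequences. It is a simplified illustration, not the authors' reference implementation: the answers are assumed to have been sampled from an LLM at non-zero temperature, and the helper `bidirectionally_entails` is a hypothetical stand-in for checking mutual entailment between two answers (in the paper this role is played by a natural-language-inference model).

```python
import math


def semantic_clusters(answers, bidirectionally_entails):
    """Group answers into meaning clusters: two answers share a cluster
    if each entails the other (same meaning, different wording)."""
    clusters = []  # each cluster is a list of answers with the same meaning
    for ans in answers:
        for cluster in clusters:
            if bidirectionally_entails(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(answers, bidirectionally_entails):
    """Entropy over meaning clusters. High entropy suggests the sampled
    answers disagree in meaning (a likely confabulation); low entropy means
    they agree, however they happen to be phrased."""
    clusters = semantic_clusters(answers, bidirectionally_entails)
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)


# Toy usage with a crude stand-in for entailment; a real pipeline would use
# an NLI model and answers sampled from the LLM being evaluated.
answers = ["Paris", "It is Paris.", "Lyon", "The capital is Paris."]
same_meaning = lambda a, b: ("Paris" in a) == ("Paris" in b)
print(semantic_entropy(answers, same_meaning))  # moderate value: 3 of 4 answers agree
```

The design choice here is the discrete variant of the estimator: cluster probabilities come from sample counts alone, which needs only generated text; a probability-weighted variant would instead sum each answer's sequence likelihood within its cluster.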




Updated: 2024-06-20