Explaining AI through mechanistic interpretability
European Journal for Philosophy of Science (IF 1.5), Pub Date: 2024-10-11, DOI: 10.1007/s13194-024-00614-4
Lena Kästner, Barnaby Crook

Recent work in explainable artificial intelligence (XAI) attempts to render opaque AI systems understandable through a divide-and-conquer strategy. However, this strategy fails to illuminate how trained AI systems work as a whole. Precisely this kind of functional understanding is needed, though, to satisfy important societal desiderata such as safety. To remedy this situation, we argue, AI researchers should seek mechanistic interpretability, viz., applying coordinated discovery strategies familiar from the life sciences to uncover the functional organisation of complex AI systems. Additionally, theorists should account for the unique costs and benefits of such strategies in their portrayals of XAI research.
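To make the contrast concrete, below is a minimal, hypothetical sketch (not from the paper) of a mechanistic-interpretability-style experiment: a lesion study on a toy network, ablating individual hidden units and measuring the behavioural effect, in the spirit of the intervention-based discovery strategies the authors borrow from the life sciences. The network, weights, and task are illustrative assumptions.

```python
# A minimal, hypothetical illustration (not from the paper): a lesion-style
# ablation experiment on a toy network, analogous to the intervention
# strategies mechanistic interpretability borrows from the life sciences.
import numpy as np

# Toy "trained" network: 2 inputs -> 4 hidden ReLU units -> 1 sigmoid output.
# Weights are hand-set so that hidden unit 0 computes the AND feature the
# output relies on; units 1-3 contribute only negligible noise.
W1 = np.array([[ 4.0,  0.1, -0.2,  0.3],
               [ 4.0, -0.1,  0.2, -0.3]])
b1 = np.array([-6.0, 0.0, 0.0, 0.0])
W2 = np.array([[5.0], [0.1], [-0.1], [0.1]])
b2 = np.array([-2.0])

def forward(x, ablate=None):
    """Run the network, optionally zeroing (ablating) one hidden unit."""
    h = np.maximum(0.0, x @ W1 + b1)             # ReLU hidden layer
    if ablate is not None:
        h[:, ablate] = 0.0                       # the "lesion"
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

# Behavioural test: the four Boolean inputs, with AND as the target.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])

def accuracy(ablate=None):
    return np.mean((forward(X, ablate)[:, 0] > 0.5) == y)

print(f"intact network accuracy: {accuracy():.2f}")   # 1.00
for unit in range(W1.shape[1]):
    print(f"ablate unit {unit}: accuracy {accuracy(unit):.2f}")
# Only lesioning unit 0 degrades the behaviour (to 0.75), localising the
# functionally relevant component of the mechanism.
```

Ablating unit 0 impairs the AND behaviour while ablating the other units leaves it intact: evidence about the system's functional organisation of the kind that purely local, divide-and-conquer explanations of single predictions do not provide.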



Updated: 2024-10-11