Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance
arXiv - CS - Artificial Intelligence. Pub Date: 2020-09-02, DOI: arxiv-2009.00802
Andrew J. Lohn

Test, Evaluation, Verification, and Validation (TEVV) for Artificial Intelligence (AI) is a challenge that threatens to limit the economic and societal rewards that AI researchers have devoted themselves to producing. A central task of TEVV for AI is estimating brittleness, where brittleness implies that the system functions well within some bounds and poorly outside of those bounds. This paper argues that neither of those criteria holds with certainty for Deep Neural Networks. First, highly touted AI successes (e.g., image classification and speech recognition) are orders of magnitude more failure-prone than what is typically certified for critical systems, even within design bounds (perfectly in-distribution sampling). Second, performance falls off only gradually as inputs move further Out-Of-Distribution (OOD). Greater emphasis is needed both on designing systems that remain resilient despite failure-prone AI components and on evaluating and improving OOD performance, so that AI can clear the challenging hurdles of TEVV and certification.

Updated: 2020-09-03