Estimating the Brittleness of AI: Safety Integrity Levels and the Need for Testing Out-Of-Distribution Performance
arXiv - CS - Artificial Intelligence. Pub Date: 2020-09-02. DOI: arxiv-2009.00802. Author: Andrew J. Lohn
Test, Evaluation, Verification, and Validation (TEVV) for Artificial
Intelligence (AI) is a challenge that threatens to limit the economic and
societal rewards that AI researchers have devoted themselves to producing. A
central task of TEVV for AI is estimating brittleness, where brittleness
implies that the system functions well within some bounds and poorly outside of
those bounds. This paper argues that neither of those criteria is certain to
hold for Deep Neural Networks. First, highly touted AI successes (e.g., image
classification and speech recognition) are orders of magnitude more
failure-prone than are typically certified in critical systems even within
design bounds (perfectly in-distribution sampling). Second, performance falls
off only gradually as inputs become further Out-Of-Distribution (OOD). Enhanced
emphasis is needed on designing systems that are resilient despite
failure-prone AI components as well as on evaluating and improving OOD
performance in order to get AI to where it can clear the challenging hurdles of
TEVV and certification.
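
To make the "orders of magnitude" comparison concrete, the following is a minimal sketch, not the paper's own method. It assumes the low-demand-mode bands for average probability of failure on demand (PFD) that are commonly tabulated for IEC 61508 Safety Integrity Levels, and it treats a classifier's per-input error rate as if it were a PFD, which is a simplification. The 3% error rate and the function names are hypothetical and chosen only for illustration.

```python
import math

# Assumed low-demand-mode PFD bands per Safety Integrity Level (SIL),
# as commonly tabulated for IEC 61508: (lower inclusive, upper exclusive).
SIL_PFD_BANDS = {
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}


def sil_for_failure_rate(pfd):
    """Return the SIL whose band contains pfd.

    Rates below the SIL 4 band still satisfy SIL 4; rates at or above
    the SIL 1 upper bound satisfy no level, so None is returned.
    """
    if pfd < SIL_PFD_BANDS[4][0]:
        return 4
    for level, (low, high) in SIL_PFD_BANDS.items():
        if low <= pfd < high:
            return level
    return None


def orders_of_magnitude_gap(observed, target):
    """Base-10 log of how many times larger observed is than target."""
    return math.log10(observed / target)


if __name__ == "__main__":
    # Hypothetical in-distribution error rate, for illustration only.
    hypothetical_error_rate = 0.03
    sil4_upper_bound = SIL_PFD_BANDS[4][1]
    print("SIL band:", sil_for_failure_rate(hypothetical_error_rate))
    print("Gap to SIL 4 (orders of magnitude):",
          round(orders_of_magnitude_gap(hypothetical_error_rate,
                                        sil4_upper_bound), 2))
```

Under these assumptions, a component that errs on a few percent of inputs would sit in the least demanding band, more than two orders of magnitude away from the most demanding one, which is the kind of gap the abstract describes.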
Updated: 2020-09-03