International Journal of Computer Vision (IF 11.6). Pub Date: 2024-10-04. DOI: 10.1007/s11263-024-02222-4. Authors: Hongjun Wang, Sagar Vaze, Kai Han
Detecting test-time distribution shift has emerged as a key capability for safely deployed machine learning models, and the question has been tackled under various guises in recent years. In this paper, we aim to provide a consolidated view of the two largest sub-fields within the community: out-of-distribution (OOD) detection and open-set recognition (OSR). In particular, we aim to provide rigorous empirical analysis of different methods across settings, together with actionable takeaways for practitioners and researchers. Concretely, we make the following contributions: (i) We perform rigorous cross-evaluation between state-of-the-art methods in the OOD detection and OSR settings and identify a strong correlation between method performance in the two settings; (ii) We propose a new, large-scale benchmark setting which we suggest better disentangles the problems tackled by OOD detection and OSR, and re-evaluate state-of-the-art OOD detection and OSR methods in this setting; (iii) Surprisingly, we find that the best-performing method on standard benchmarks (Outlier Exposure) struggles when tested at scale, while scoring rules that are sensitive to the deep feature magnitude consistently show promise; and (iv) We conduct empirical analysis to explain these phenomena and highlight directions for future research. Code: https://github.com/Visual-AI/Dissect-OOD-OSR
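To make the notion of a scoring rule concrete, the sketch below compares three commonly used post-hoc scores computed from a classifier's logits: maximum softmax probability, maximum logit, and the energy (log-sum-exp) score. It is an illustration only and is not taken from the authors' repository. The function names, the toy logits, and the pairwise AUROC helper are all assumptions made for this example. It relies on one standard fact: softmax is unchanged when a constant is added to every logit, so the softmax-based score discards part of the magnitude information that the other two scores keep.

```python
import math

def max_softmax_prob(logits):
    """Maximum softmax probability (higher = more likely a known class)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # shift by the max for stability
    return max(exps) / sum(exps)

def max_logit(logits):
    """Maximum unnormalized logit; keeps the raw output magnitude."""
    return max(logits)

def energy_score(logits, temperature=1.0):
    """Energy-style score: temperature * logsumexp(logits / temperature)."""
    m = max(logits)
    total = sum(math.exp((z - m) / temperature) for z in logits)
    return m + temperature * math.log(total)

def auroc(known_scores, unknown_scores):
    """Fraction of (known, unknown) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            wins += 1.0 if k > u else (0.5 if k == u else 0.0)
    return wins / (len(known_scores) * len(unknown_scores))

# Hypothetical toy logits: the "unknown" samples are the "known" ones with
# every logit lowered by 3, so their softmax outputs are identical.
known = [[6.0, 1.0, 0.5], [5.0, 0.0, 1.0]]
unknown = [[z - 3.0 for z in row] for row in known]

for name, rule in [("MSP", max_softmax_prob),
                   ("MaxLogit", max_logit),
                   ("Energy", energy_score)]:
    score = auroc([rule(z) for z in known], [rule(z) for z in unknown])
    print(f"{name}: AUROC = {score:.2f}")
```

On this constructed data the softmax-based score cannot separate the two groups, while the magnitude-sensitive scores can. Real benchmarks are far less clean, so this shows only the mechanism and not the paper's results.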
Title: Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks