Machine Learning for Actionable Warning Identification: A Comprehensive Survey, ACM Computing Surveys



ACM Computing Surveys (IF 23.8), Pub Date: 2024-09-19, DOI: 10.1145/3696352
Xiuting Ge, Chunrong Fang, Xuanye Li, Weisong Sun, Daoyuan Wu, Juan Zhai, Shang-Wei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen

Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. These ML-based AWI approaches, benefiting from ML’s strong ability to learn subtle and previously unseen patterns from historical data, have demonstrated superior performance. However, a comprehensive overview of these approaches is missing, which could hinder researchers and practitioners from understanding the current process and discovering potential for future improvement in the ML-based AWI community. In this paper, we systematically review the state-of-the-art ML-based AWI approaches. First, we employ a meticulous survey methodology and gather 51 primary studies from 2000/01/01 to 2023/09/01. Then, we outline a typical ML-based AWI workflow, including warning dataset preparation, preprocessing, AWI model construction, and evaluation stages. In such a workflow, we categorize ML-based AWI approaches based on the warning output format. Besides, we analyze the key techniques used in each stage, along with their strengths, weaknesses, and distribution. Finally, we provide practical research directions for future ML-based AWI approaches, focusing on aspects like data improvement (e.g., enhancing the warning labeling strategy) and model exploration (e.g., exploring large language models for AWI).
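The four workflow stages named in the abstract (warning dataset preparation, preprocessing, AWI model construction, and evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a method taken from the survey: the toy warning records, the function names, and the choice of a smoothed per-warning-type frequency estimate as the "model" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy history: (warning_type, file_churn, is_actionable).
# These records are invented for illustration; they are not from the survey.
HISTORY = [
    ("NULL_DEREF", 12, True),
    ("NULL_DEREF", 3, True),
    ("UNUSED_VAR", 1, False),
    ("UNUSED_VAR", 7, False),
    ("RESOURCE_LEAK", 9, True),
    ("RESOURCE_LEAK", 2, False),
]


def prepare_dataset(records):
    """Stage 1: split labeled warnings into features and labels."""
    features = [(w_type, churn) for w_type, churn, _ in records]
    labels = [label for _, _, label in records]
    return features, labels


def preprocess(features):
    """Stage 2: keep only the warning type as a single categorical feature."""
    return [w_type for w_type, _ in features]


def train(types, labels):
    """Stage 3: estimate P(actionable | type) with add-one smoothing."""
    counts = defaultdict(lambda: [0, 0])  # type -> [actionable, total]
    for w_type, label in zip(types, labels):
        counts[w_type][0] += int(label)
        counts[w_type][1] += 1
    return {t: (a + 1) / (n + 2) for t, (a, n) in counts.items()}


def predict(model, w_type, threshold=0.5):
    """Flag a warning as actionable; unseen types fall back to a 0.5 prior."""
    return model.get(w_type, 0.5) >= threshold


def evaluate(model, types, labels):
    """Stage 4: precision and recall for the 'actionable' class."""
    preds = [predict(model, t) for t in types]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(not p and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    features, labels = prepare_dataset(HISTORY)
    types = preprocess(features)
    model = train(types, labels)
    # Evaluated on the training data only to keep the sketch short;
    # a real study would use held-out warnings.
    print(evaluate(model, types, labels))
```

A frequency table stands in for the learned model only to keep the example dependency-free; any classifier or ranker could take its place in stage 3 without changing the surrounding stages.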


Updated: 2024-09-19
