Isolating Compiler Bugs by Generating Effective Witness Programs With Large Language Models
IEEE Transactions on Software Engineering (IF 6.5) Pub Date: 2024-05-07, DOI: 10.1109/tse.2024.3397822
Haoxin Tu 1, Zhide Zhou 1, He Jiang 1, Imam Nur Bani Yusuf 2, Yuxian Li 2, Lingxiao Jiang 2
Compiler bugs pose a significant threat to safety-critical applications, and promptly and effectively isolating these bugs is crucial for assuring the quality of compilers. However, the limited availability of debugging information on reported bugs complicates the compiler bug isolation task. Existing compiler bug isolation approaches convert the problem into a test program mutation problem, but they are still limited by ineffective mutation strategies or high human effort requirements. Drawing inspiration from the recent progress of pre-trained Large Language Models (LLMs), such as ChatGPT, in code generation, we propose a new approach named LLM4CBI that utilizes LLMs to generate effective test programs for compiler bug isolation. However, using LLMs directly for test program mutation may not yield the desired results due to the challenges of formulating precise prompts and selecting specialized prompts. To overcome these challenges, three new components are designed in LLM4CBI. First, LLM4CBI utilizes a program complexity-guided prompt production component, which leverages data and control flow analysis to identify the most valuable variables and locations in programs for mutation. Second, LLM4CBI employs a memorized prompt selection component, which adopts reinforcement learning to continuously select specialized prompts for mutating test programs. Third, a test program validation component is proposed to select specialized feedback prompts so that the same mistakes are not repeated during the mutation process. Evaluated against the state-of-the-art approaches (DiWi and RecBi) on 120 real bugs from the two most popular compilers, GCC and LLVM, LLM4CBI demonstrates clear advantages: it can isolate 69.70%/21.74% and 24.44%/8.92% more bugs than DiWi and RecBi, respectively, within Top-1/Top-5 ranked results.
Additionally, we demonstrate that the LLMs component (i.e., GPT-3.5) used in LLM4CBI can be easily replaced by other LLMs while still achieving reasonable results in comparison to related studies.
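The memorized prompt selection component described above uses reinforcement learning to keep choosing the prompts that historically produced useful witness programs. The abstract does not specify the algorithm, so the sketch below is only an illustrative assumption: a minimal epsilon-greedy bandit where the prompt names, reward signal, and exploration policy are hypothetical, not taken from the paper.

```python
import random

class PromptSelector:
    """Bandit-style selector over mutation prompts (illustrative sketch,
    not LLM4CBI's actual implementation)."""

    def __init__(self, prompts, epsilon=0.2):
        self.prompts = list(prompts)
        self.epsilon = epsilon                    # exploration rate
        self.counts = {p: 0 for p in self.prompts}
        self.values = {p: 0.0 for p in self.prompts}  # running mean reward

    def select(self):
        # Explore a random prompt with probability epsilon,
        # otherwise exploit the prompt with the best observed reward.
        if random.random() < self.epsilon:
            return random.choice(self.prompts)
        return max(self.prompts, key=lambda p: self.values[p])

    def update(self, prompt, reward):
        # Incremental mean update: "memorize" which prompts worked.
        self.counts[prompt] += 1
        n = self.counts[prompt]
        self.values[prompt] += (reward - self.values[prompt]) / n


# Hypothetical usage: reward would come from whether the mutated test
# program helped narrow down the buggy compiler pass.
selector = PromptSelector(["rename variable", "wrap loop", "inline call"])
chosen = selector.select()
selector.update(chosen, reward=1.0)
```

In a real pipeline the reward would be derived from the validation component's feedback (e.g., whether the mutated program still triggers the bug and improves the ranking of suspicious files), which is what lets selection improve continuously across mutation rounds.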

Updated: 2024-08-19