Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models, IEEE Transactions on Software Engineering


Chain-of-Thought in Neural Code Generation: From and for Lightweight Language Models
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2024-08-12, DOI: 10.1109/tse.2024.3440503
Guang Yang ₁ , Yu Zhou ₁ , Xiang Chen ₂ , Xiangyu Zhang ₁ , Terry Yue Zhuo ₃ , Taolue Chen ₄


Large Language Models (LLMs) have demonstrated remarkable potential in code generation. The integration of Chain of Thought (CoT) reasoning can further boost their performance. However, current CoT methods often require manual writing or LLMs with over 100 billion parameters to generate, impeding their applicability in resource-constrained scenarios. In this study, we investigate lightweight Language Models ($\ell$LMs), which are defined to have fewer than 10 billion parameters. Empirically, we find that most $\ell$LMs cannot generate high-quality CoTs when prompted by the few-shot method, but can take advantage of high-quality CoTs generated elsewhere to improve their performance in code generation. Based on these findings, we design a novel approach COTTON which can leverage $\ell$LMs to automatically generate CoTs for code generation. We synthesize new datasets and conduct extensive experiments on various benchmarks. The results show that the CoTs generated by COTTON outperform the baselines in terms of automated and human evaluation metrics. In particular, the CoTs generated by COTTON boost various $\ell$LMs to achieve higher performance gains than those generated by LLMs such as ChatGLM (130B), and are competitive with those generated by Gemini and gpt-3.5-turbo. The results also reveal that COTTON not only improves the performance of $\ell$LMs, but also enhances the performance of LLMs. Our study showcases the potential of $\ell$LMs in software engineering applications.
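As a rough illustration of the two-stage idea the abstract describes (one model produces a CoT, another consumes it when generating code), the sketch below wires two text-generation callables together. It is not COTTON itself: the prompt wording, the function names `build_cot_prompt`, `build_code_prompt`, and `generate_with_cot`, and the callable interface are all assumptions made for this example.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to generated text.
TextModel = Callable[[str], str]


def build_cot_prompt(task: str) -> str:
    """Ask a model to outline solution steps (the CoT) without writing code.

    The wording is an assumption for illustration, not the paper's prompt.
    """
    return (
        "Outline, step by step, how to solve the following programming task. "
        "Do not write code.\n\n"
        f"Task:\n{task}\n\nSteps:\n"
    )


def build_code_prompt(task: str, cot: str) -> str:
    """Place the externally generated CoT ahead of the code-generation request."""
    return (
        f"Task:\n{task}\n\n"
        f"Plan:\n{cot.strip()}\n\n"
        "Write code that follows the plan:\n"
    )


def generate_with_cot(task: str, cot_model: TextModel, code_model: TextModel) -> str:
    """Stage 1: cot_model drafts the CoT. Stage 2: code_model writes code guided by it."""
    cot = cot_model(build_cot_prompt(task))
    return code_model(build_code_prompt(task, cot))
```

Under this sketch, `cot_model` and `code_model` can be the same or different models, which mirrors the abstract's observation that a CoT generated elsewhere can help a model that cannot produce a good one itself.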


Updated: 2024-08-12
