More than a framework: Sketching out technical enablers for natural language-based source code generation, Computer Science Review

Computer Science Review (IF 13.3), Pub Date: 2024-05-25, DOI: 10.1016/j.cosrev.2024.100637
Chen Yang, Yan Liu, Changqing Yin

Natural Language-based Source Code Generation (NLSCG) holds the promise of revolutionizing the way software is developed by facilitating a collection of intelligent technical enablers, building on sustained improvements to natural-language-to-source-code pipelines and the continuous adoption of new coding paradigms. In recent years, a wide variety of NLSCG technical solutions have been proposed, and promising experimental results have been reported. Meanwhile, current research and early-stage application projects in this area reflect a large diversity of NLSCG contexts and major technical enablers. This heterogeneity, fragmentation, and vagueness of the NLSCG technical landscape currently frustrates the full realization of the NLSCG research and application vision. Practitioners in this field lack systematic guidelines on how to effectively address the "known unknowns" and how to spot the "unknown unknowns", which ultimately hinders the turning of NLSCG solutions into further research enhancements or production applications. Understanding the context, boundaries, capabilities, and integrations of NLSCG enablers is considered one of the key drivers for more practical application of NLSCG models. In this paper, we analyze in detail the natural-language-to-source-code pipelines and the evolution of source code generation tasks, considering both the problem context and the technological aspects. A forward-looking reference framework for NLSCG is proposed to help match source code generation tasks with suitable intelligent models. We review the present-day NLSCG technical landscape, as well as the core technical enablers along the source code generation pipelines. Experiments are conducted to validate the role of representative models across different technical enablers on typical datasets, and we finally highlight the contribution of each enabler to code generation capabilities.

Updated: 2024-05-25