Towards Retrieval-Based Neural Code Summarization: A Meta-Learning Approach
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2023-01-19, DOI: 10.1109/tse.2023.3238161
Ziyi Zhou, Huiqun Yu, Guisheng Fan, Zijie Huang, Kang Yang

Code summarization aims to generate natural-language summaries of code automatically, and has attracted considerable research interest recently. Recent approaches commonly adopt neural machine translation techniques, training a Seq2Seq model on a large corpus and assuming it can generalize to various new code snippets. In practice, however, code varies widely across domains, businesses, and programming styles, so capturing such a variety of patterns in a single model is challenging. In this paper, we propose MLCS, a new framework for code summarization based on meta-learning and code retrieval, to tackle this issue. In this framework, summarizing each target code snippet is formalized as a few-shot learning task, in which similar examples serve as the training data and the target snippet itself is the test example. We retrieve examples similar to the target code in a rank-and-filter manner. Given a neural code summarizer, we optimize it into a meta-learner via Model-Agnostic Meta-Learning (MAML). During inference, the meta-learner first adapts to the retrieved examples, yielding a model exclusive to the target code, and then generates its summary. Extensive experiments on real-world datasets show that: (1) with MLCS, a standard Seq2Seq model outperforms previous state-of-the-art approaches, including both neural models and retrieval-based neural models; (2) MLCS flexibly adapts to existing neural code summarizers without modifying their architecture, and significantly improves their performance, with relative gains of up to 112.7% on BLEU-4, 23.2% on ROUGE-L, and 31.5% on METEOR; and (3) compared to existing retrieval-based neural approaches, MLCS better leverages multiple similar examples and generalizes better to different retrievers, unseen retrieval corpora, and low-frequency words.
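To make the retrieve-adapt-generate pipeline concrete, below is a minimal PyTorch sketch of MLCS-style inference. It assumes a Seq2Seq summarizer object exposing hypothetical loss() and generate() methods, and substitutes a toy token-overlap retriever for the paper's rank-and-filter retrieval; the helper names and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import copy
import torch

def retrieve_similar(target_code, corpus, k=5):
    """Toy rank-and-filter retriever: rank (code, summary) pairs by token
    overlap with the target, then filter down to the top k. The paper's
    actual retriever is not specified here; this is illustration only."""
    target_tokens = set(target_code.split())
    ranked = sorted(
        corpus,
        key=lambda pair: len(target_tokens & set(pair[0].split())),
        reverse=True,
    )
    return ranked[:k]

def summarize_with_adaptation(meta_model, target_code, corpus,
                              k=5, inner_steps=3, inner_lr=1e-4):
    """Adapt the meta-learned summarizer to examples similar to
    target_code, then generate a summary with the adapted model."""
    # 1. Retrieve a per-target support set of similar examples.
    support_set = retrieve_similar(target_code, corpus, k=k)

    # 2. Fast adaptation: clone the meta-learner so each target code
    #    gets its own exclusive model, then take a few gradient steps
    #    on the retrieved examples (the MAML inner loop; no
    #    second-order gradients are needed at inference time).
    adapted = copy.deepcopy(meta_model)
    optimizer = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    adapted.train()
    for _ in range(inner_steps):
        for code, summary in support_set:
            loss = adapted.loss(code, summary)  # hypothetical: seq2seq cross-entropy
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # 3. Generate the summary for the target code with its model.
    adapted.eval()
    with torch.no_grad():
        return adapted.generate(target_code)  # hypothetical decode method
```

Because adaptation happens per target at inference time, the base summarizer's architecture is untouched, which is what lets MLCS wrap existing neural code summarizers without modification.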

Updated: 2024-08-28