Fusing Code Searchers
IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 5-20-2024, DOI: 10.1109/tse.2024.3403042
Shangwen Wang 1 , Mingyang Geng 1 , Bo Lin 1 , Zhensu Sun 2 , Ming Wen 3 , Yepang Liu 4 , Li Li 5 , Tegawendé F. Bissyandé 6 , Xiaoguang Mao 1

Code search, which consists of retrieving relevant code snippets from a codebase for a given query, provides developers with useful references during software development. Over the years, many techniques have been proposed to advance the state of the art in this domain, each adopting a different mechanism to compute the relevance score between a query and a code snippet, including information retrieval, supervised learning, and pre-training. Even so, the usefulness of existing techniques remains limited because none of them can effectively handle all the diverse queries and code encountered in practice. To tackle this challenge, we present Dancer, a data-fusion-based code searcher. Our intuition (also the basic hypothesis of this study) is that existing techniques may complement each other because of the intrinsic differences in their working mechanisms. We have validated this hypothesis via an exploratory study. Based on that, we propose to fuse the results generated by different code search techniques so that the advantages of each standalone technique can be fully leveraged. Specifically, we treat each technique as a retrieval system and leverage well-known data fusion approaches to aggregate the results from the different systems. We evaluate six existing code search techniques on two large-scale datasets and apply eight classic data fusion approaches to combine their results. Our experiments show that the best fusion approach outperforms the standalone techniques by 35% to 550% and 65% to 825% in terms of MRR (mean reciprocal rank) on the two datasets, respectively.
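
To make the fusion idea concrete, the following is a minimal sketch of one classic data fusion approach, CombSUM with min-max score normalization, together with an MRR computation. The abstract does not name the eight fusion approaches or the six searchers evaluated, so the choice of CombSUM, the function names, and the toy scores are all assumptions made for illustration and should not be read as Dancer's actual implementation.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one searcher's scores to [0, 1] so different searchers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {snippet: 1.0 for snippet in scores}
    return {snippet: (s - lo) / (hi - lo) for snippet, s in scores.items()}


def comb_sum(runs):
    """Fuse several {snippet_id: score} result sets by summing normalized scores.

    Returns snippet ids ordered from most to least relevant
    (ties are broken by snippet id to keep the output deterministic).
    """
    fused = defaultdict(float)
    for scores in runs:
        for snippet, s in min_max_normalize(scores).items():
            fused[snippet] += s
    return sorted(fused, key=lambda snippet: (-fused[snippet], snippet))


def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the relevant snippet per query (0 if it is not retrieved)."""
    total = 0.0
    for ranking, target in zip(rankings, relevant):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)


# Hypothetical scores from two searchers for a single query.
ir_scores = {"a": 12.0, "b": 9.0, "c": 3.0}   # e.g. an information-retrieval searcher
nn_scores = {"b": 0.9, "c": 0.8, "a": 0.1}    # e.g. a learned searcher
fused_ranking = comb_sum([ir_scores, nn_scores])
```

In this toy example the two searchers disagree on the top result, and summing their normalized scores places snippet `b` first because it scores well under both. Normalizing before summing matters because the searchers produce scores on unrelated scales.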

Updated: 2024-08-19