Method-Level Test-to-Code Traceability Link Construction by Semantic Correlation Learning, IEEE Transactions on Software Engineering



IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2024-08-27, DOI: 10.1109/tse.2024.3449917
Weifeng Sun₁, Zhenting Guo₁, Meng Yan₁, Zhongxin Liu₂, Yan Lei₁, Hongyu Zhang₁


Test-to-code traceability links (TCTLs) establish links between test artifacts and code artifacts. These links enable developers and testers to quickly identify the specific pieces of code tested by particular test cases, thus facilitating more efficient debugging, regression testing, and maintenance activities. Various approaches, based on distinct concepts, have been proposed to establish method-level TCTLs, specifically linking unit tests to their corresponding focal methods. Static methods, such as naming-convention-based methods, use heuristic- and similarity-based strategies. However, such methods face two challenges: (1) Developers, driven by specific scenarios and development requirements, may deviate from naming conventions, leading to TCTL identification failures. (2) Static methods often overlook the rich semantics embedded within tests, leading to erroneous associations between tests and semantically unrelated code fragments. Although dynamic methods achieve promising results, they require the project to be compilable and the tests to be executable, which limits their usability. This limitation is significant for downstream tasks that require massive numbers of test-code pairs, as not all projects can meet these requirements. To address these limitations, we propose a novel static method-level TCTL approach named TestLinker. To address the first challenge of existing static approaches, TestLinker introduces a two-phase TCTL framework that accommodates different project types in a triage manner. To address the second challenge, we employ semantic correlation learning, which learns and establishes the semantic correlations between tests and focal methods based on Pre-trained Code Models (PCMs). TestLinker further establishes mapping rules to accurately link the recommended function name to the concrete production function declaration.
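To make the naming-convention challenge concrete, the following is a minimal sketch of a static name-based linker. It is not TestLinker: the function names, the 0.6 threshold, and the string-similarity fallback (Python's `difflib.SequenceMatcher`, standing in for any similarity heuristic) are all assumptions chosen for illustration. It first tries an exact match after stripping `test` affixes, then falls back to the most similar candidate name, and gives up when nothing is close enough.

```python
import re
from difflib import SequenceMatcher


def strip_test_affixes(test_name: str) -> str:
    """Drop a leading or trailing 'test' token and lowercase the rest."""
    core = re.sub(r"^test_?", "", test_name, flags=re.IGNORECASE)
    core = re.sub(r"_?tests?$", "", core, flags=re.IGNORECASE)
    return core.lower()


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (hypothetical choice of metric)."""
    return SequenceMatcher(None, a, b).ratio()


def link_by_name(test_name, candidates, threshold=0.6):
    """Return (focal_method, strategy), or (None, 'unresolved').

    Step 1: exact match after stripping 'test' affixes (naming convention).
    Step 2: otherwise pick the most similar candidate, if it clears the
            (assumed) threshold.
    """
    core = strip_test_affixes(test_name)
    for cand in candidates:
        if cand.lower() == core:
            return cand, "naming-convention"
    if not candidates:
        return None, "unresolved"
    best = max(candidates, key=lambda c: name_similarity(core, c.lower()))
    if name_similarity(core, best.lower()) >= threshold:
        return best, "similarity"
    return None, "unresolved"


if __name__ == "__main__":
    methods = ["parseDate", "formatDate"]
    print(link_by_name("testParseDate", methods))       # follows the convention
    print(link_by_name("testParsesIsoDates", methods))  # deviates slightly
    print(link_by_name("testXyz", methods))             # no usable name signal
```

A test whose name carries no trace of its focal method, as in the last call, cannot be resolved from names alone, which is the kind of failure the abstract attributes to purely name-based static methods.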
Empirical evaluation on a meticulously labeled dataset reveals that TestLinker significantly outperforms traditional static techniques, showing average F1-score improvements ranging from 73.48% to 202.00%. Moreover, compared to state-of-the-art dynamic methods, TestLinker, which leverages only static information, demonstrates comparable or even better performance, with an average F1-score increase of 37.40%.
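For readers unfamiliar with the metric, the sketch below shows how an F1-score over predicted test-to-code links can be computed and how a percentage improvement over a baseline can be derived. It assumes the reported percentages are relative improvements, which the abstract does not state explicitly, and all link pairs and scores in the example are made up for illustration.

```python
def f1_score(predicted: set, actual: set) -> float:
    """F1 of predicted (test, focal_method) links against ground-truth links."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Improvement of `new` over `baseline`, as a percentage of the baseline."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical links, not taken from the paper's dataset.
    actual = {("testA", "a"), ("testB", "b"), ("testC", "c"), ("testD", "d")}
    predicted = {("testA", "a"), ("testB", "b"), ("testC", "x")}
    print(f"F1 = {f1_score(predicted, actual):.4f}")
    # Hypothetical scores: a baseline F1 of 0.2 against a new F1 of 0.6.
    print(f"improvement = {relative_improvement(0.6, 0.2):.2f}%")
```

Under this reading, an improvement above 100% simply means the new F1-score is more than twice the baseline's.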


Updated: 2024-08-27
