In recent years, the pre-training, prompting, and prediction paradigm, known as prompt-tuning, has achieved significant success in Natural Language Processing (NLP). Issue-commit link recovery (ILR) in Software Traceability (ST) plays an important role in improving the reliability, quality, and security of software systems. Current ILR methods cast ILR as a classification task using pre-trained language models (PLMs) and dedicated neural networks. These methods do not fully utilize the semantic information embedded in PLMs and therefore fail to achieve acceptable performance. To address this limitation, we introduce a novel paradigm: Multi-template Prompt-tuning with adversarial training for issue-commit link recovery (MPlinker). MPlinker redefines the ILR task as a cloze task via template-based prompt-tuning and incorporates adversarial training to enhance model generalization and reduce overfitting. We evaluated MPlinker on six open-source projects using a comprehensive set of performance metrics. The experimental results demonstrate that MPlinker achieves an average F1-score of 96.10%, Precision of 96.49%, Recall of 95.92%, MCC of 94.04%, AUC of 96.05%, and ACC of 98.15%, significantly outperforming existing state-of-the-art methods. Overall, MPlinker improves the performance and generalization of ILR models and introduces innovative concepts and methods for ILR. The replication package for MPlinker is available at https://***/WTU-intelligent-software-development/MPlinker.
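To illustrate what recasting ILR as a cloze task with several templates can look like, here is a minimal sketch. It is not MPlinker's implementation: the templates, the verbalizer words ("related", "unrelated"), and the averaging rule are all hypothetical, and the masked-token probabilities are passed in by hand instead of coming from a real PLM.

```python
# Hypothetical templates; the abstract does not list MPlinker's actual templates.
TEMPLATES = [
    "Issue: {issue} Commit: {commit} They are [MASK].",
    "The commit '{commit}' and the issue '{issue}' are [MASK].",
]

# Hypothetical verbalizer: label words at the [MASK] position mapped to link labels.
VERBALIZER = {"related": 1, "unrelated": 0}


def build_prompts(issue: str, commit: str) -> list[str]:
    """Wrap one issue-commit pair in every cloze template."""
    return [t.format(issue=issue, commit=commit) for t in TEMPLATES]


def predict_link(mask_probs: list[dict[str, float]]) -> int:
    """Average label-word probabilities over templates and pick the best label.

    mask_probs[i] holds the (assumed) PLM probabilities of each label word
    at the [MASK] position of template i.
    """
    if not mask_probs:
        raise ValueError("need probabilities for at least one template")
    totals = {label: 0.0 for label in VERBALIZER.values()}
    for probs in mask_probs:
        for word, label in VERBALIZER.items():
            totals[label] += probs.get(word, 0.0)
    return max(totals, key=lambda label: totals[label] / len(mask_probs))
```

In a real pipeline the dictionaries passed to `predict_link` would be read from a masked language model's output distribution at the `[MASK]` token for each prompt returned by `build_prompts`.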
ISBN: 9798350329964 (print)
Issue-commit links, as a type of software traceability link, play a vital role in various software development and maintenance tasks. However, they are often missing, as developers forget or fail to create tags when making commits. Existing studies have deployed deep learning techniques, including pre-trained models, to improve automatic issue-commit link recovery. Despite their promising performance, we argue that previous approaches have four main problems that hinder them from recovering links in large software projects. To overcome these problems, we propose an efficient and accurate pre-trained framework called EAlink for issue-commit link recovery. EAlink requires far fewer model parameters than existing pre-trained methods, which makes training and recovery efficient. Moreover, we design various techniques to improve the recovery accuracy of EAlink. We construct a large-scale dataset and conduct extensive experiments to demonstrate the power of EAlink. Results show that EAlink outperforms the state-of-the-art methods by a large margin (15.23%-408.65%) on various evaluation metrics. Meanwhile, its training and inference overhead is orders of magnitude lower than that of existing methods. We provide our implementation and data at https://***/KDEGroup/EAlink.
In the field of software traceability (ST), machine learning (ML) has become a common and effective method for automated issue-commit link recovery. The features extracted from issue and commit artifacts are composed ...
Recovering traceability links between issues and commits, known as issue-commit link recovery (ILR), plays a significant role in software maintenance tasks by enhancing developers' observability in practice. Recent advancements in large language models, particularly pre-trained models, have improved the effectiveness of automated ILR. However, these models' large parameter sizes and extended training time pose challenges in large software projects. In addition, existing methods often overlook the association and distinction among artifacts, leading to the generation of erroneous links. To mitigate these problems, this paper proposes a novel link recovery method called MTlink. It utilizes multi-teacher knowledge distillation (MTKD) to compress the model and employs an adaptive multi-task strategy to reduce information loss and improve link accuracy. Experiments are conducted on four open-source projects. The results show that (i) MTlink outperforms state-of-the-art methods; (ii) multi-teacher knowledge distillation maintains accuracy despite the reduction in model size; and (iii) the adaptive multi-task tracing method effectively handles confusion caused by similar artifacts and balances each task. In conclusion, MTlink offers an efficient solution for ILR in software traceability. The code is available at https://***/records/10321150.
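The idea of distilling several teachers into one smaller student can be sketched with the standard temperature-scaled distillation loss. This is an assumption about the general technique, not MTlink's actual objective: the abstract does not give the loss, and the function names, the teacher weighting, and the `temperature` and `alpha` parameters below are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(student_logits, teacher_logits_list, label,
                          temperature=2.0, alpha=0.5, weights=None):
    """Weighted multi-teacher distillation loss for one example (assumed form).

    The soft target is a weighted average of the teachers' softened
    distributions; the loss mixes KL(target || student) with cross-entropy
    on the ground-truth link label.
    """
    n = len(teacher_logits_list)
    if weights is None:
        weights = [1.0 / n] * n  # assumption: equal trust in every teacher
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    target = [sum(w * p[k] for w, p in zip(weights, teacher_probs))
              for k in range(len(student_logits))]
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(target, student_soft) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    # temperature**2 keeps the soft-target gradient scale comparable to the hard loss
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

When the student already matches every teacher, the KL term vanishes and only the cross-entropy part remains; teachers that disagree with the student raise the loss.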