ISBN: (print) 9798350324020
Bug reports, the bug description data generated during the software maintenance cycle, are often written hastily by many different users, which produces many redundant and duplicate bug reports (DBRs). When DBRs are repeatedly assigned to developers, human resources are wasted, especially in large-scale open-source projects. Many researchers have recently studied DBR detection and proposed a series of detection methods, but there is still considerable room for improvement in prediction performance. This paper therefore proposes CTEDB (Combination of Term Extraction and DeBERTaV3), a new DBR detection method based on technical term extraction. The method first extracts technical terms from the text of bug reports using the Word2Vec and TextRank algorithms. It then calculates the semantic similarity of technical terms between different bug reports by combining the Word2Vec and SBERT models. Finally, it completes the DBR detection task by incorporating the DeBERTaV3 model. The experimental results show that CTEDB performs well in detecting DBRs, improving accuracy, F1-score, recall, and precision over the baseline approaches.
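The term-extraction step can be illustrated with a minimal TextRank-style sketch. This is not the CTEDB implementation: the abstract does not specify how Word2Vec and TextRank are combined, so the sketch ranks words on a plain co-occurrence graph and compares the resulting term sets with Jaccard overlap as a stand-in for the Word2Vec/SBERT similarity. The stopword list, window size, and function names (`textrank_terms`, `term_overlap`) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "is", "in", "on", "when", "to", "of", "and", "after", "with"}


def tokenize(text):
    """Lowercase the text and keep word-like tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z][a-z0-9_]+", text.lower()) if t not in STOPWORDS]


def textrank_terms(text, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank words with a PageRank-style iteration over a co-occurrence graph."""
    tokens = tokenize(text)
    neighbors = defaultdict(set)
    # Link every pair of distinct words that appear within the same window.
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != tok:
                neighbors[tok].add(other)
                neighbors[other].add(tok)
    if not neighbors:
        return []
    scores = {t: 1.0 for t in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of the score of each neighbor.
        scores = {
            t: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[t])
            for t in neighbors
        }
    # Sort by score, breaking ties alphabetically for deterministic output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def term_overlap(report_a, report_b):
    """Jaccard overlap of extracted terms; a stand-in for embedding similarity."""
    terms_a, terms_b = set(textrank_terms(report_a)), set(textrank_terms(report_b))
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)
```

In a fuller pipeline, the set overlap would be replaced by a similarity computed over term embeddings, and the score would be passed to a classifier along with the report text.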
Bug reports written in natural language are usually voluminous, ambiguous, and poorly written, which makes duplicate bug report detection challenging. Current automatic duplicate bug report detection techniques have mainly focused on textual information and ignored other useful factors. To improve detection accuracy, this paper proposes a new approach called the LNG (LDA and N-gram) model, which takes advantage of both the topic model LDA and the word-based N-gram model. LNG considers multiple factors that potentially affect detection accuracy, including textual information, semantic correlation, word order, contextual connections, and categorical information. In addition, the N-gram model adopted in LNG is improved by modifying its similarity algorithm. The experiment is conducted on more than 230,000 real bug reports from the Eclipse project. For the evaluation, we propose a new metric, the exact-accuracy (EA) rate, which helps in understanding the performance of duplicate detection. The evaluation results show that the recall rate, precision rate, and EA rate of the proposed method are all higher than those obtained by using LDA and N-gram separately. The recall rate also improves by 2.96%-10.53% compared to the state-of-the-art approach DBTM.
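To see why a word-based N-gram model captures word order, consider a minimal sketch of an n-gram similarity between two reports. The abstract does not describe LNG's modified similarity algorithm, so the Dice coefficient over word n-grams used here is an assumed stand-in, and the function names are hypothetical.

```python
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Return the list of word n-grams (as tuples) in the lowercased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_dice(report_a, report_b, n=2):
    """Dice coefficient over word n-gram multisets (assumed, not LNG's own measure)."""
    grams_a = Counter(word_ngrams(report_a, n))
    grams_b = Counter(word_ngrams(report_b, n))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total
```

With `n=1` the measure ignores word order entirely, while with `n=2` two reports containing the same words in reversed order share no n-grams. A topic model such as LDA would contribute the complementary, order-independent semantic signal.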
Traditional duplicate bug report detection approaches are usually based on the vector space model. However, the experimental results are rarely satisfactory, since this method cannot distinguish semantic correlation among...