Search Results - Inner Mongolia University Library

Amalgamation of Classical and Large Language Models for Duplicate Bug Detection: A Comparative Study

Computers, Materials & Continua, 2025, Vol. 83, No. 4, pp. 435-453

Authors: Sai Venkata Akhil Ammu, Sukhjit Singh Sehra, Sumeet Kaur Sehra, Jaiteg Singh. Affiliations: Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, N2L 3C5, Canada; Applied Computer Science and Information Technology, Conestoga College, Waterloo, N2J 2W2, Canada; Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, 140401, India

Duplicate bug reporting is a critical problem in the software repositories’ mining *** bug reports can lead to redundant efforts, wasted resources, and delayed software ***, their accurate identification is essential for streamlining the bug triage process mining *** researchers have explored classical information retrieval, natural language processing, text and data mining, and machine learning *** emergence of large language models (LLMs) (ChatGPT and Huggingface) has presented a new line of models for semantic textual similarity (STS). Although LLMs have shown remarkable advancements, there remains a need for longitudinal studies to determine whether performance improvements are due to the scale of the models or the unique embeddings they produce compared to classical encoding *** study systematically investigates this issue by comparing classical word embedding techniques against LLM-based embeddings for duplicate bug *** this study, we have proposed an amalgamation of models to detect duplicate bug reports using textual and non-textual information about bug *** empirical evaluation has been performed on the open-source datasets and evaluated based on established metrics using the mean reciprocal rank (MRR), mean average precision (MAP), and recall *** experimental results have shown that combined LLMs can outperform (recall-rate@k 68%–74%) other individual models for duplicate bug *** findings highlight the effectiveness of amalgamating multiple techniques in improving the duplicate bug report detection accuracy.

Keywords: duplicate bug detection; large language models; information retrieval
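The abstract above names three ranking metrics: mean reciprocal rank (MRR), mean average precision (MAP), and recall-rate@k. As a rough illustration of how such metrics are commonly computed for duplicate retrieval, here is a minimal sketch. It assumes each query report comes with a ranked list of candidate report IDs and a set of known duplicates; the function names and data layout are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Return 1/position of the first true duplicate, or 0.0 if none is ranked."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average the precision values measured at each position holding a true duplicate."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


def mean_reciprocal_rank(rankings: Dict[str, List[str]], truth: Dict[str, Set[str]]) -> float:
    """Mean of the reciprocal rank over all query reports."""
    return sum(reciprocal_rank(rankings[q], truth[q]) for q in rankings) / len(rankings)


def mean_average_precision(rankings: Dict[str, List[str]], truth: Dict[str, Set[str]]) -> float:
    """Mean of the average precision over all query reports."""
    return sum(average_precision(rankings[q], truth[q]) for q in rankings) / len(rankings)


def recall_rate_at_k(rankings: Dict[str, List[str]], truth: Dict[str, Set[str]], k: int) -> float:
    """Fraction of query reports with at least one true duplicate in the top k."""
    hits = sum(
        1 for q in rankings
        if any(candidate in truth[q] for candidate in rankings[q][:k])
    )
    return hits / len(rankings)
```

With two hypothetical queries, one whose duplicate is ranked second and one whose duplicate is ranked first, MRR and MAP both come out to 0.75, recall-rate@1 is 0.5, and recall-rate@2 is 1.0.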


Towards Understanding the Impacts of Textual Dissimilarity on Duplicate Bug Report Detection


30th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

Authors: Jahan, Sigma; Rahman, Mohammad Masudur (Dalhousie Univ, Halifax, NS, Canada)

ISBN: (print) 9781665452786

About 40% of software bug reports are duplicates of one another, which pose a major overhead during software maintenance. Traditional techniques often focus on detecting duplicate bug reports that are textually similar. However, in bug tracking systems, many duplicate bug reports might not be textually similar, for which the traditional techniques might fall short. In this paper, we conduct a large-scale empirical study to better understand the impacts of textual dissimilarity on the detection of duplicate bug reports. First, we collect a total of 92,854 bug reports from three open-source systems and construct two datasets containing textually similar and textually dissimilar duplicate bug reports. Then we determine the performance of three existing techniques in detecting duplicate bug reports and show that their performance is significantly poor for textually dissimilar duplicate reports. Second, we analyze the two groups of bug reports using a combination of descriptive analysis, word embedding visualization, and manual analysis. We found that textually dissimilar duplicate bug reports often miss important components (e.g., expected behaviors and steps to reproduce), which could lead to their textual differences and poor performance by the existing techniques. Finally, we apply domain-specific embedding to duplicate bug report detection problems, which shows mixed results. All these findings above warrant further investigation and more effective solutions for detecting textually dissimilar duplicate bug reports.

Keywords: software bug; duplicate bug detection; textual dissimilarity; word embedding; t-SNE


Towards Accurate Duplicate Bug Retrieval using Deep Learning Techniques


33rd IEEE International Conference on Software Maintenance and Evolution (ICSME)

Authors: Deshmukh, Jayati; Annervaz, K. M.; Podder, Sanjay; Sengupta, Shubhashis; Dubash, Neville (Accenture Technol Labs, San Jose, CA 95113, USA)

ISBN: (print) 9781538609927

Duplicate bug detection is the problem of identifying whether a newly reported bug is a duplicate of an existing bug in the system and retrieving the original or similar bugs from the past. This is required to avoid costly rediscovery and redundant work. In typical software projects, the number of duplicate bugs reported may run into the order of thousands, making it expensive in terms of cost and time for manual intervention. This makes the problem of duplicate or similar bug detection an important one in the Software Engineering domain. However, an automated solution for the same is not quite accurate yet in practice, in spite of many reported approaches using various machine learning techniques. In this work, we propose a retrieval and classification model using Siamese Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM) for accurate detection and retrieval of duplicate and similar bugs. We report an accuracy close to 90% and recall rate close to 80%, which makes possible the practical use of such a system. We describe our model in detail along with related discussions from the Deep Learning domain. By presenting the detailed experimental results, we illustrate the effectiveness of the model in practical systems, including for repositories for which supervised training data is not available.

Keywords: Information Retrieval; Duplicate Bug Detection; Deep Learning; Natural Language Processing; Word Embeddings; Siamese Networks; Convolutional Neural Networks; Long Short Term Memory


Detecting Duplicate Bug Report Using Character N-Gram-Based Features


17th Asia Pacific Software Engineering Conference (APSEC)

Authors: Sureka, Ashish; Jalote, Pankaj (Indraprastha Inst Informat Technol, New Delhi, India)

ISBN: (print) 9780769542669

We present an approach to identify duplicate bug reports expressed in free-form text. Duplicate reports need to be identified to avoid a situation where duplicate reports get assigned to multiple developers. Also, duplicate reports can contain complementary information which can be useful for bug fixing. Automatic identification of duplicate reports (from thousands of existing reports in a bug repository) can increase the productivity of a Triager by reducing the amount of time a Triager spends in searching for duplicate bug reports of any incoming report. The proposed method uses a character N-gram-based model for the task of duplicate bug report detection. Previous approaches are word-based, whereas this study investigates the usefulness of low-level features based on characters, which have certain inherent advantages (such as natural-language independence, robustness towards noisy data and effective handling of domain-specific term variations) over word-based features for the problem of duplicate bug report detection. The proposed solution is evaluated on a publicly available dataset consisting of more than 200 thousand bug reports from the open-source Eclipse project. The dataset includes ground truth (a pre-annotated dataset having bug reports tagged as duplicate by the Triager). Empirical results and evaluation metrics quantifying retrieval performance indicate that the approach is effective.

Keywords: Bug Report Analysis; Duplicate Bug Detection; Text Classification; Software Engineering Task Automation; Software Testing and Maintenance
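The abstract above contrasts character-level N-gram features with word-based ones but does not spell out the feature set. The sketch below is therefore only a generic illustration of the idea: it compares two report texts by the Jaccard overlap of their character trigrams. The function names, the normalization step, and the choice of Jaccard similarity are assumptions made for illustration and are not the authors' method.

```python
from typing import Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(report_a: str, report_b: str, n: int = 3) -> float:
    """Jaccard overlap of the two reports' character n-gram sets, in [0, 1]."""
    grams_a = char_ngrams(report_a, n)
    grams_b = char_ngrams(report_b, n)
    if not grams_a and not grams_b:
        return 0.0  # both texts are shorter than n characters
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Because the comparison works on character windows rather than whole words, variants such as "NullPointerException" and "NullPointer exception" still share most of their trigrams, which is the kind of robustness to term variation the abstract describes.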

