ISBN: (print) 9798350308266; 9798350308259
The ability to search for and retrieve relevant code snippets from huge codebases is a critical task in software engineering. Traditional approaches to source code retrieval rely on keyword-based queries, which can produce inaccurate and incomplete results. As a result, researchers are investigating deep learning approaches to enhance code search. In this study, a Joint Bi-LSTM-GNN (JBLG) model is proposed for source code search; it combines the features of a Bi-LSTM and a GNN. This study also investigates the use of Large Language Models (LLMs), notably ChatGPT, for source code retrieval in comparison with the proposed JBLG model. Our results reveal that the JBLG model surpasses standard retrieval approaches and other deep learning models. The JBLG model is tested on the CodeSearchNet dataset, which comprises query-code pairs from open-source projects in a variety of programming languages. Our joint model obtains a high mean reciprocal rank (MRR), a considerable improvement over the best-performing baseline model. Overall, our findings show that ChatGPT does not perform well on this task, possibly because it is a language model developed for natural language processing tasks rather than for code retrieval. To further improve the model's effectiveness, future work includes examining attention mechanisms and more sophisticated deep learning techniques. The approach might also be extended to more programming languages and to other software engineering tasks such as code summarization and code completion.
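MRR, the metric reported above, averages 1/rank of the first correct snippet over all queries. The following is a minimal Python sketch that assumes each query has exactly one correct snippet, as in a paired query-code setup. The snippet IDs and rankings are made up for illustration and do not come from the paper.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Hashable) -> float:
    """Return 1/rank of the correct snippet, or 0.0 if it was not retrieved."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         gold: Sequence[Hashable]) -> float:
    """Average the reciprocal rank over all queries (one correct snippet each)."""
    if len(rankings) != len(gold):
        raise ValueError("expected exactly one gold snippet per query")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, g) for r, g in zip(rankings, gold))
    return total / len(rankings)


# Hypothetical ranked results for three queries; snippet IDs are made up.
rankings = [["s3", "s1", "s7"],   # correct snippet at rank 2 -> 1/2
            ["s2", "s9", "s4"],   # correct snippet at rank 1 -> 1
            ["s5", "s6", "s8"]]   # correct snippet not retrieved -> 0
gold = ["s1", "s2", "s0"]
print(mean_reciprocal_rank(rankings, gold))  # (0.5 + 1 + 0) / 3 = 0.5
```

A query whose correct snippet is missing from the ranked list contributes 0. A retrieval cutoff (for example, the top 10 results) therefore lowers MRR for queries whose answers fall below it.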
Retrieving relevant source code from large repositories is a significant, ongoing challenge in software engineering, primarily because of the vast and ever-growing amount of available code. Existing deep learning methods, although effective to some extent, are limited in capturing the complex structural information embedded in source code, which hinders their retrieval accuracy. This study addresses the issue by introducing the Joint Bi-directional LSTM and Graph Neural Network (JBLG) model for source code retrieval. The central aim is to combine the strengths of Bi-directional Long Short-Term Memory (LSTM) networks and Graph Neural Networks (GNNs) to improve the model's capacity to capture and interpret the structural characteristics intrinsic to source code. The JBLG model fuses a Bi-directional LSTM, which captures sequential dependencies within code, with a GNN, which models the graph structure of the code. By leveraging this hybrid architecture, the model aims to provide a comprehensive and effective solution for source code retrieval tasks. To assess its efficacy, extensive experiments compare the JBLG model against well-established baselines, including LSTM, GNN, and ChatGPT, on two diverse datasets, CodeSearchNet and CosBench. These evaluations span multiple programming languages. The experimental results indicate that the JBLG model consistently outperforms its counterparts, including Bi-LSTM, GNN, ChatGPT, and DGMS, across various evaluation metrics. The JBLG model showcases an exceptional ability to handle and extract the intricate structural information inh...
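The idea of fusing a sequence view with a graph view of code can be sketched in a few lines. This is a toy illustration and not the JBLG implementation, whose layer details are not given here. The simplifications are as follows:

- The Bi-LSTM output is assumed to be given as a fixed placeholder vector.
- The GNN is reduced to weight-free mean-neighbour message passing over a hypothetical three-node code graph.
- Fusion is plain concatenation.

All node names and values are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def message_passing_step(features: Dict[str, Vector],
                         edges: List[Tuple[str, str]]) -> Dict[str, Vector]:
    """One weight-free GNN layer: each node becomes the mean of itself and
    its neighbours (edges are treated as undirected)."""
    neighbours: Dict[str, List[str]] = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return {node: mean([features[node]] + [features[n] for n in neighbours[node]])
            for node in features}


def fuse(sequence_vec: Vector, node_features: Dict[str, Vector],
         edges: List[Tuple[str, str]], rounds: int = 2) -> Vector:
    """Concatenate a sequence embedding (assumed to come from a Bi-LSTM)
    with a mean-pooled graph embedding after `rounds` of message passing."""
    for _ in range(rounds):
        node_features = message_passing_step(node_features, edges)
    graph_vec = mean(list(node_features.values()))
    return sequence_vec + graph_vec  # list concatenation = late fusion


# Hypothetical three-node code graph: a function node linked to its name
# and its return statement, with one-hot node features.
graph_nodes = {"func": [1.0, 0.0], "name": [0.0, 1.0], "return": [0.0, 0.0]}
graph_edges = [("func", "name"), ("func", "return")]
bilstm_output = [0.2, 0.4]  # placeholder for a Bi-LSTM sequence embedding
print(fuse(bilstm_output, graph_nodes, graph_edges))
```

A real model would replace the placeholder vector with a trained Bi-LSTM, add learned weights and nonlinearities to each message-passing layer, and might fuse the two views with something richer than concatenation.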
A common practice among programmers is to reuse existing code, found by issuing natural language queries to search engines. The main aim of code retrieval is to find the most relevant snippet in a corpus of code snippets. However, code retrieval frameworks for low-resource languages are lacking. The most relevant code snippet can be retrieved efficiently only by closing the semantic gap between the code snippets in the repository and the user's query (a natural language description). The primary objective of this research is to contribute a code search framework that can be extended to low-resource languages. The secondary objective is to provide a code retrieval mechanism that returns results semantically relevant to the user's query, helping programmers locate source code they want to reuse when developing new applications. The proposed approach is implemented as a web platform for searching source code. Because code retrieval is a sophisticated task, the approach incorporates a semantic search mechanism: a semantic model that generates meanings or synonyms of words, integrating ontologies and Natural Language Processing. System performance and classification accuracy are measured using precision, recall, and F1-score, and the approach is compared with state-of-the-art baseline models. The retrieved results are ranked, showing that our approach significantly outperforms robust code matching. Our evaluation shows that semantic matching improves source code retrieval. This study marks a substantial advance in integrating programming expertise with code retrieval techniques. Moreover, our system lets users know when and how it can be used for successful semantic search.
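Synonym-based query expansion and its evaluation with precision, recall, and F1 can be illustrated with a small sketch. This is not the authors' ontology-based model. The sketch makes these assumptions:

- A hand-written synonym dictionary (the hypothetical `SYNONYMS`) stands in for the ontology and NLP components.
- Snippets are matched on short natural language descriptions by counting shared terms.
- The corpus and relevance judgements are invented.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table standing in for the ontology/NLP component.
SYNONYMS: Dict[str, Set[str]] = {
    "sort": {"order", "arrange"},
    "list": {"array", "sequence"},
}


def expand_query(query: str) -> Set[str]:
    """Lower-case the query terms and add their synonyms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def search(query: str, corpus: Dict[str, str]) -> List[str]:
    """Rank snippet IDs by how many expanded query terms their descriptions
    contain; snippets with no overlap are not returned."""
    terms = expand_query(query)
    scored = []
    for snippet_id, description in corpus.items():
        overlap = len(terms & set(description.lower().split()))
        if overlap:
            scored.append((-overlap, snippet_id))  # most overlap first, then by ID
    return [snippet_id for _, snippet_id in sorted(scored)]


def precision_recall_f1(retrieved: List[str],
                        relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one query."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical snippet descriptions and relevance judgements.
corpus = {
    "c1": "order an array ascending",
    "c2": "append items to a sequence",
    "c3": "open a file for reading",
    "c4": "sort a list of numbers",
}
results = search("sort list", corpus)  # c1 is found only via synonyms
print(results, precision_recall_f1(results, {"c1", "c4"}))
```

Without expansion, the query "sort list" would match only `c4`. Expansion recovers `c1` (higher recall) but also pulls in `c2` (lower precision). F1 summarises this trade-off in a single number.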
ISBN: (print) 9781538653142
Source code retrieval is a text retrieval task that software developers perform regularly. Existing source code retrieval approaches are based on regular expressions and assume that the developer querying the code base is thoroughly familiar with the source code. Keyword- and regular-expression-based queries are hard to remember; instead, developers should be able to query the code base in sentential form. However, the performance of text search depends heavily on query quality: a search succeeds when the textual query is of high quality. Predicting query quality before a query is executed on a source code retrieval system saves developers time and effort by notifying them when a query is unlikely to perform well. This paper assesses how well prominent classification algorithms, namely Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosted Tree (GBT), and Decision Tree (DT), predict query quality on a data set created from the documentation of the source code files. Experimental results on a benchmark data set of open-source projects demonstrate that the Gradient Boosted Tree performs better than the other classifiers.
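Query quality prediction can be framed as binary classification over features computed before the query runs. The sketch below uses logistic regression, one of the four algorithms compared, because it fits in a few lines without external libraries. It is not the paper's best-performing model (the Gradient Boosted Tree), and its two features are hypothetical:

- Vocabulary coverage: the share of query terms found in the documentation vocabulary.
- Normalised query length: the term count divided by 10.

The vocabulary and labelled queries are invented.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]


def query_features(query: str, vocabulary: Set[str]) -> Vector:
    """Hypothetical pre-retrieval features of a sentential query:
    [share of terms found in the documentation vocabulary, term count / 10]."""
    terms = query.lower().split()
    if not terms:
        return [0.0, 0.0]
    coverage = sum(term in vocabulary for term in terms) / len(terms)
    return [coverage, len(terms) / 10]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X: List[Vector], y: List[int],
                              lr: float = 0.5,
                              epochs: int = 2000) -> Tuple[Vector, float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + error * xj for g, xj in zip(grad_w, xi)]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_good(w: Vector, b: float, x: Vector) -> bool:
    """True if the query is predicted to perform well (probability >= 0.5)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Hypothetical documentation vocabulary and labelled queries (1 = good query).
vocab = {"parse", "json", "file", "read", "config", "http", "request", "send"}
labelled = [("read json config file", 1), ("send http request", 1),
            ("parse json", 1), ("make it faster somehow", 0),
            ("fix the weird bug", 0), ("why does this thing crash", 0)]
X = [query_features(q, vocab) for q, _ in labelled]
y = [label for _, label in labelled]
w, b = train_logistic_regression(X, y)
print([predict_good(w, b, x) for x in X])
```

Swapping in SVM, DT, or GBT would change only the classifier; the feature extraction step stays the same. That is what makes a side-by-side comparison of classifiers on one data set straightforward.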
Code retrieval aims to find, in a large corpus of source code repositories, the code snippet that best matches a natural language query. Recent work mainly uses natural language processing techniques to process both query texts (i.e., human natural language) and code snippets (i.e., machine programming language), but neglects the deep structured features of query texts and source code, both of which contain rich semantic information. In this article, we propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks for semantic code retrieval. To this end, we first represent both natural language query texts and programming language code snippets as unified graph-structured data, and then use the proposed graph matching and searching model to retrieve the best-matching code snippet. In particular, DGMS not only captures more structural information for individual query texts or code snippets, but also learns the fine-grained similarity between them through cross-attention-based semantic matching operations. We evaluate DGMS on two public code retrieval datasets with two representative programming languages (i.e., Java and Python). Experimental results demonstrate that DGMS outperforms state-of-the-art baseline models by a large margin on both datasets. Moreover, our extensive ablation studies systematically investigate the impact of each component of DGMS.
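Cross-attention-based matching between two graphs can be sketched as follows. This is a generic illustration, not the DGMS formulation, whose exact equations are not given in the abstract. The sketch makes these assumptions:

- Node embeddings are hard-coded as hypothetical stand-ins for GNN outputs.
- Attention is unparameterised scaled dot-product attention, with each graph's nodes serving as both keys and values.
- The matching score is the average cosine similarity between each node and its attended summary of the other graph, taken in both directions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def cross_attend(queries: List[Vector], others: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention: each node in `queries` receives an
    attention-weighted sum of the nodes in `others` (used as keys and values)."""
    scale = math.sqrt(len(others[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in others])
        attended.append([sum(w * k[i] for w, k in zip(weights, others))
                         for i in range(len(q))])
    return attended


def match_score(text_nodes: List[Vector], code_nodes: List[Vector]) -> float:
    """Mean cosine similarity between each node and its attended summary of
    the other graph, computed in both directions and averaged."""
    t2c = cross_attend(text_nodes, code_nodes)
    c2t = cross_attend(code_nodes, text_nodes)
    forward = sum(cosine(t, a) for t, a in zip(text_nodes, t2c)) / len(text_nodes)
    backward = sum(cosine(c, a) for c, a in zip(code_nodes, c2t)) / len(code_nodes)
    return (forward + backward) / 2


# Hypothetical node embeddings (stand-ins for GNN outputs).
query_graph = [[1.0, 0.0], [0.0, 1.0]]
related_code = [[0.9, 0.1], [0.1, 0.9]]
unrelated_code = [[-1.0, 0.0], [0.0, -1.0]]
print(match_score(query_graph, related_code) > match_score(query_graph, unrelated_code))  # True
```

In a trained model, the attention would use learned query, key, and value projections, and the per-node comparisons would feed a learned aggregation rather than a plain average of cosines.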