ISBN: (Print) 9781665437868
Source code summarization aims at generating concise and clear natural language descriptions for programming languages. Well-written code summaries help programmers participate in the software development and maintenance process. To learn the semantic representations of source code, recent efforts focus on incorporating the syntax structure of code into neural networks such as the Transformer. Such Transformer-based approaches can capture long-range dependencies better than other neural networks, including Recurrent Neural Networks (RNNs); however, most of them do not consider the structural relative correlations between tokens, e.g., relative positions in Abstract Syntax Trees (ASTs), which are beneficial for learning code semantics. To model this structural dependency, we propose a StruCtural RelatIve Position guided Transformer, named SCRIPT. SCRIPT first obtains the structural relative positions between tokens by parsing the ASTs of source code, and then passes them into two types of Transformer encoders. One Transformer directly adjusts the input according to the structural relative distance; the other encodes the structural relative positions while computing the self-attention scores. Finally, we stack these two types of Transformer encoders to learn representations of source code. Experimental results show that the proposed SCRIPT outperforms the state-of-the-art methods by at least 1.6%, 1.4% and 2.8% with respect to BLEU, ROUGE-L and METEOR on benchmark datasets, respectively. We further show how the proposed SCRIPT captures the structural relative dependencies.
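The "structural relative position" the abstract describes can be illustrated with a toy computation: the distance between two tokens measured as the path length between their nodes in the AST, rather than their linear offset in the text. The sketch below uses Python's own `ast` module purely as an illustration; SCRIPT's actual encoders and position encodings are not reproduced here.

```python
import ast

def node_paths(tree):
    """Map each AST node to its root-to-node path (list of child indices)."""
    paths = {}
    def walk(node, path):
        paths[node] = path
        for i, child in enumerate(ast.iter_child_nodes(node)):
            walk(child, path + [i])
    walk(tree, [])
    return paths

def structural_distance(tree, a, b):
    """Tree distance between two nodes: steps up to the lowest common
    ancestor plus steps down - a simple stand-in for the structural
    relative positions the abstract describes."""
    paths = node_paths(tree)
    pa, pb = paths[a], paths[b]
    # Length of the common prefix = depth of the lowest common ancestor.
    lca = 0
    while lca < min(len(pa), len(pb)) and pa[lca] == pb[lca]:
        lca += 1
    return (len(pa) - lca) + (len(pb) - lca)

tree = ast.parse("x = f(y) + 1")
names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
x_node, f_node = names[0], names[1]  # Name nodes for `x` and `f`
d = structural_distance(tree, x_node, f_node)
```

Tokens that are adjacent in the source can be far apart in the tree (and vice versa), which is exactly the signal a sequence-only position encoding misses.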
ISBN: (Print) 9781728160344
Recently, deep learning techniques have been developed for source code summarization. Most existing studies have simply adopted natural language processing techniques, because source code summarization can be considered a machine translation task from source code into descriptions. However, source code and its description are very different, not only in the languages of writing but also in the purpose of writing. There is a large semantic gap between source code in programming languages and its descriptions in natural languages. To address this semantic gap, we propose a two-phase model that consists of a keyword predictor and a description generator. The keyword predictor captures the natural language keywords semantically associated with the source code, and the generator produces a description by referring to the keywords provided by the predictor. Using such keywords as scaffolding, we can effectively reduce the semantic gap and generate more accurate descriptions of source code. To evaluate the proposed method, we use datasets collected from GitHub and StackOverflow and perform various experiments with them. Our method shows outstanding performance compared with baselines that include state-of-the-art methods, indicating that keyword prediction is very helpful for generating accurate descriptions.
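The two-phase structure (keywords first, description second) can be sketched with a deliberately crude stand-in: rank identifier sub-tokens by frequency as "predicted keywords", then stitch them into a description. Both functions below are hypothetical toys for illustration; the paper's predictor and generator are learned neural models.

```python
import re
from collections import Counter

def predict_keywords(code, k=3):
    """Toy keyword predictor: split identifiers on underscores and
    camelCase, then rank sub-tokens by frequency (a stand-in for the
    learned keyword predictor in the abstract)."""
    idents = re.findall(r"[A-Za-z_]\w*", code)
    subtokens = []
    for ident in idents:
        parts = re.split(r"_|(?<=[a-z])(?=[A-Z])", ident)
        subtokens += [p.lower() for p in parts if p]
    return [w for w, _ in Counter(subtokens).most_common(k)]

def generate_description(keywords):
    """Toy generator: stitches the predicted keywords into a template."""
    return "Method related to: " + ", ".join(keywords)

code = "def sortUserList(user_list): return sorted(user_list)"
kws = predict_keywords(code)
print(generate_description(kws))
```

The point of the scaffolding is that the second phase conditions on natural language tokens that already bridge the gap to the target description.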
(Source) code summarization aims to automatically generate succinct natural language summaries for given code snippets. Such summaries play a significant role in helping developers understand and maintain code. Inspired by neural machine translation, deep learning-based code summarization techniques widely adopt an encoder-decoder framework, where the encoder transforms given code snippets into context vectors and the decoder decodes context vectors into summaries. Recently, large-scale pre-trained models for source code (e.g., CodeBERT and UniXcoder) are equipped with encoders capable of producing general context vectors and have achieved substantial improvements on the code summarization task. However, because they are usually trained mainly on code-focused tasks, they can capture general code features but still fall short in capturing the specific features that need to be summarized. In a nutshell, they fail to learn the alignment between code snippets and summaries (code-summary alignment for short). In this paper, we propose a novel approach to improve code summarization based on summary-focused tasks. Specifically, we exploit a multi-task learning paradigm to train the encoder on three summary-focused tasks to enhance its ability to learn code-summary alignment: unidirectional language modeling (ULM), masked language modeling (MLM), and action word prediction (AWP). Unlike pre-trained models that mainly predict masked tokens in code snippets, we design ULM and MLM to predict masked words in summaries. Intuitively, predicting words based on given code snippets helps learn the code-summary alignment. In addition, existing work shows that AWP affects the prediction of the entire summary. Therefore, we further introduce the domain-specific task AWP to enhance the encoder's ability to learn the alignment between action words and code snippets. We evaluate the effectiveness of our approach, called Esale, by conducting extensive experiments on
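The summary-focused MLM objective differs from ordinary code MLM only in what gets masked: words of the summary, to be predicted from the code. A minimal sketch of the data preparation (the model itself is omitted, and the `[MASK]` token and rates are illustrative assumptions):

```python
import random

def mask_summary(summary_tokens, mask_rate=0.15, seed=0):
    """MLM-style masking over *summary* words (not code tokens),
    mirroring the summary-focused MLM objective: masked slots become
    prediction targets conditioned on the code. Labels are None where
    no word was masked."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in summary_tokens:
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

summary = "sorts the user list in ascending order".split()
masked, labels = mask_summary(summary, mask_rate=0.3)
# AWP target: the action word, conventionally the leading verb.
action_word = summary[0]
```

AWP then reduces to predicting that single leading verb, which prior work found to strongly constrain the rest of the generated summary.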
Source code summarization is the task of creating readable summaries that describe the functionality of software. It is a critical component of documentation generation, for example Javadocs formed from short paragraphs attached to each method in a Java program. At present, most source code summarization is manual, in that the paragraphs are written by human experts. However, new automated technologies are becoming feasible. These automated techniques have been shown to be effective in select situations, though a key weakness is that they do not explain the source code's context. That is, they can describe the behavior of a Java method, but not why the method exists or what role it plays in the software. In this paper, we propose a source code summarization technique that writes English descriptions of Java methods by analyzing how those methods are invoked. We then performed two user studies to evaluate our approach. First, we compared our generated summaries to summaries written manually by experts. Then, we compared our summaries to summaries written by a state-of-the-art automatic summarization tool. We found that while our approach does not reach the quality of human-written summaries, it does improve over the state-of-the-art summarization tool in several dimensions by a statistically significant margin.
Programs are, in essence, a collection of implemented features. Feature discovery in software engineering is the task of identifying key functionalities that a program implements. Manual feature discovery can be time-consuming and expensive, leading to the development of automatic feature discovery tools. However, these approaches typically only describe features using lists of keywords, which can be difficult for readers who are not already familiar with the source code. An alternative to keyword lists is sentence selection, in which one sentence is chosen from among the sentences in a text document to describe that document. Sentence selection has been widely studied in the context of natural language summarization but is only beginning to be explored as a solution to feature discovery. In this paper, we compare four sentence selection strategies for the purpose of feature discovery. Two are off-the-shelf approaches, while two are adaptations we propose. We present our findings as guidelines and recommendations to designers of feature discovery tools. Copyright (c) 2016 John Wiley & Sons, Ltd.
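The idea of sentence selection can be made concrete with one common off-the-shelf heuristic: pick the sentence whose word distribution best matches the document as a whole (a centroid-style score). This is an illustrative assumption, not one of the four strategies the paper actually compares.

```python
import re
from collections import Counter

def select_sentence(sentences):
    """Centroid-style sentence selection: score each sentence by the
    average document-wide frequency of its words and return the best.
    A simple stand-in for the selection strategies discussed above."""
    doc = Counter(w for s in sentences for w in re.findall(r"\w+", s.lower()))
    def score(s):
        ws = re.findall(r"\w+", s.lower())
        return sum(doc[w] for w in ws) / len(ws) if ws else 0.0
    return max(sentences, key=score)

sentences = [
    "This project is hosted on our website.",
    "The tool parses source files and reports style violations.",
    "The parser reads each source file and checks every style rule.",
]
best = select_sentence(sentences)
```

Sentences that reuse the document's dominant vocabulary win, which tends to favor feature-describing sentences over boilerplate.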
An automatic source code summarization system aims to generate a valuable natural language description for a program, which can facilitate software development and maintenance, code categorization, and retrieval. However, previous sequence-based research did not consider the long-distance dependencies and highly structured characteristics of source code simultaneously. In this article, we present a Transformer-based Graph-Augmented Source Code Summarization model (GA-SCS), which can effectively incorporate the inherent structural and textual features of source code to generate an effective code description. Specifically, we develop a graph-based structure feature extraction scheme leveraging the abstract syntax tree and graph attention networks to mine global syntactic information. Then, to take full advantage of the lexical and syntactic information of code snippets, we extend the original attention to a syntax-informed self-attention mechanism in our encoder. In the training process, we also adopt a reinforcement learning strategy to enhance the readability and informativeness of generated code summaries. We use a Java dataset and a Python dataset to evaluate the performance of different models. Experimental results demonstrate that our GA-SCS model outperforms all competitive methods on BLEU, METEOR, ROUGE, and human evaluations.
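Before any graph attention layer can run, the AST has to be turned into a graph. The sketch below builds the node list and adjacency matrix a GAT-style encoder would attend over, using Python's `ast` module as an illustrative parser; the neural layers described in the abstract are omitted.

```python
import ast

def ast_adjacency(code):
    """Parse code into an AST and return (nodes, adjacency matrix) with
    symmetric parent-child edges - the graph a graph attention network
    would consume in the scheme described above."""
    tree = ast.parse(code)
    nodes = list(ast.walk(tree))
    index = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for node in nodes:
        for child in ast.iter_child_nodes(node):
            i, j = index[node], index[child]
            adj[i][j] = adj[j][i] = 1  # undirected parent-child edge
    return nodes, adj

nodes, adj = ast_adjacency("x = 1")
```

Since the AST is a tree, the adjacency always contains exactly one edge per non-root node; richer variants add data-flow or next-sibling edges on top.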
Source code summarization is the task of writing natural language descriptions of source code. The primary use of these descriptions is in documentation for programmers. Automatic generation of these descriptions is a high-value research target due to the time cost to programmers of writing them themselves. In recent years, a confluence of software engineering and artificial intelligence research has made inroads into automatic source code summarization through applications of neural models of that source code. However, an Achilles' heel of the vast majority of approaches is that they tend to rely solely on the context provided by the source code being summarized. Yet empirical studies in program comprehension are quite clear that the information needed to describe code much more often resides in the context, in the form of the function call graph surrounding that code. In this paper, we present a technique for encoding this call graph context for neural models of code summarization. We implement our approach as a supplement to existing approaches and show statistically significant improvement over them. In a human study with 20 programmers, we show that programmers perceive generated summaries to generally be as accurate, readable, and concise as human-written summaries.
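Extracting the call graph context is straightforward to illustrate for Python code: walk each function definition and record which names it calls. This lightweight static sketch (direct name calls only, no attribute or dynamic calls) stands in for the call-graph extraction; how the paper encodes the graph for the neural model is not reproduced.

```python
import ast
from collections import defaultdict

def call_graph(source):
    """Extract a function-level call graph: caller name -> set of
    directly called names. A toy stand-in for the call-graph context
    the paper supplies to the summarization model."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for fn in ast.walk(tree):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                # Only simple `name(...)` calls; method calls are skipped.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    graph[fn.name].add(node.func.id)
    return dict(graph)

src = """
def load(path): return open(path).read()
def main(): data = load("f.txt"); print(data)
"""
g = call_graph(src)
```

A summarizer with access to `g` knows, for instance, that `load` exists to serve `main`, which is exactly the "why does this method exist" context the abstract argues is missing.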
Code summarization is the process of automatically generating brief and informative summaries of source code to aid software comprehension and maintenance. In this paper, we propose a novel model called READSUM, a REtrieval-augmented ADaptive transformer for source code summarization, that combines abstractive and extractive approaches. Our proposed model generates code summaries in an abstractive manner, taking into account both the structural and sequential information of the input code, while also using an extractive approach that leverages a retrieved summary of similar code to increase the frequency of important keywords. To effectively blend the original code and the retrieved similar code at the embedding-layer stage, we obtain an augmented representation of the original code and the retrieved code through multi-head self-attention. In addition, we develop a self-attention network that adaptively learns the structural and sequential information of the representations in the encoder stage. Furthermore, we design a fusion network to capture the relation between the original code and the retrieved summary at the decoder stage. The fusion network effectively guides summary generation based on the retrieved summary. Finally, READSUM extracts important keywords using an extractive approach and generates high-quality summaries using an abstractive approach that considers both the structural and sequential information of the source code. We demonstrate the superiority of READSUM through various experiments and an ablation study. Additionally, we perform a human evaluation to assess the quality of the generated summaries.
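The retrieval step can be illustrated with a toy retriever: find the most similar snippet in a code-summary corpus by token Jaccard similarity and hand its summary to the generator. This is an assumed stand-in for READSUM's retriever; the fusion network that blends the retrieved summary into decoding is not modeled here.

```python
import re

def tokens(code):
    """Lowercased word-token set of a code snippet."""
    return set(re.findall(r"\w+", code.lower()))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_summary(query_code, corpus):
    """Return the stored summary of the corpus snippet most similar to
    the query by token Jaccard similarity - a toy retrieval step."""
    code, summary = max(
        corpus, key=lambda pair: jaccard(tokens(query_code), tokens(pair[0]))
    )
    return summary

corpus = [
    ("def add(a, b): return a + b", "adds two numbers"),
    ("def read_file(path): return open(path).read()", "reads a file"),
]
hint = retrieve_summary("def plus(x, y): return x + y", corpus)
```

The retrieved summary then acts as the extractive signal: words it contains (here, "adds", "numbers") get a frequency boost during abstractive generation.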
Developers spend much of their time reading and browsing source code, raising new opportunities for summarization methods. Indeed, modern code editors provide code folding, which allows one to selectively hide blocks of code. However, this is impractical to use, as folding decisions must be made manually or based on simple rules. We introduce the autofolding problem: automatically creating a code summary by folding less informative code regions. We present a novel solution by formulating the problem as a sequence of AST folding decisions, leveraging a scoped topic model for code tokens. On an annotated set of popular open source projects, we show that our summarizer outperforms simpler baselines, yielding a 28 percent error reduction. Furthermore, we find through a case study that our summarizer is strongly preferred by experienced developers. More broadly, we hope this work will aid program comprehension by turning code folding into a usable and valuable tool.
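The folding decision itself can be sketched with a crude informativeness score: keep the regions whose tokens are rare across the file and fold the rest. The rarity heuristic below is an illustrative assumption standing in for the paper's scoped topic model.

```python
import re
from collections import Counter

def autofold(regions, budget=1):
    """Fold all but the `budget` most informative regions, scoring a
    region by the rarity of its tokens across the whole file - a crude
    proxy for topic-model-based folding decisions."""
    counts = Counter(t for r in regions for t in re.findall(r"\w+", r))
    def score(region):
        return sum(1.0 / counts[t] for t in re.findall(r"\w+", region))
    keep = sorted(regions, key=score, reverse=True)[:budget]
    return [r if r in keep else "..." for r in regions]

regions = [
    "import os, sys, re",
    "def rank_results(hits): return sorted(hits, key=score, reverse=True)",
    "if __name__ == '__main__': main()",
]
folded = autofold(regions, budget=1)
```

Boilerplate regions (imports, main guards) score low and collapse to `...`, while the distinctive logic survives, mirroring how the summarizer turns folding into a summary.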
ISBN: (Print) 9781450392983
Source code summarization involves creating brief descriptions of source code in natural language. These descriptions are a key component of software documentation such as JavaDocs. Automatic code summarization is a prized target of software engineering research, due to the high value summaries have to programmers and the simultaneously high cost of writing and maintaining documentation by hand. Current work is almost all based on machine models trained on big data: large datasets of examples of code and summaries of that code are used to train, e.g., an encoder-decoder neural model. The output predictions of the model are then evaluated against a set of reference summaries; the input is code not seen by the model, and the prediction is compared to a reference. The means by which a prediction is compared to a reference is essentially word overlap, calculated via a metric such as BLEU or ROUGE. The problem with word overlap is that not all words in a sentence have the same importance, and many words have synonyms. The result is that the calculated similarity may not match the similarity perceived by human readers. In this paper, we conduct an experiment to measure the degree to which various word overlap metrics correlate with human-rated similarity of predicted and reference summaries. We evaluate alternatives based on current work in semantic similarity metrics and propose recommendations for the evaluation of source code summarization.
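The weakness the abstract points out is easy to demonstrate with the simplest overlap metric, clipped unigram precision (the core of BLEU-1): a prediction that replaces words with synonyms loses credit even though a human would rate it as equivalent. A minimal sketch:

```python
from collections import Counter

def unigram_precision(prediction, reference):
    """Clipped unigram precision: the fraction of predicted words that
    also appear in the reference, with per-word counts clipped to the
    reference counts - the building block of BLEU-style word overlap."""
    pred = prediction.lower().split()
    ref = Counter(reference.lower().split())
    matched = sum(min(c, ref[w]) for w, c in Counter(pred).items())
    return matched / len(pred) if pred else 0.0

# Synonyms get no credit, illustrating the mismatch with human judgment:
ref = "returns the sum of two numbers"
p = unigram_precision("returns the total of two values", ref)
```

Here "total" and "values" are semantically fine substitutes for "sum" and "numbers", yet the metric scores only 4 of 6 words, which is precisely why semantic similarity metrics are evaluated as alternatives.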