Code summarization is the process of creating readable natural language from program source code. It has become a popular research topic for software maintenance, code generation, and code recovery. Existing code summarization methods follow an encoder-decoder approach and use various machine learning techniques to generate natural language from source code. Although many of these methods are state of the art, the complex encoding and decoding process that maps code tokens to natural language words is difficult to understand, so these approaches are treated as opaque (black-box) models. This research proposes explainable AI methods that overcome the black-box nature of token mapping in the code summarization process. We first create an abstract syntax tree (AST) from the tokens of the source code. We then align the AST with natural language words using a bilingual statistical probability approach to generate candidate statistical parse trees, and apply the PageRank algorithm to rank these trees. From the best-ranked tree, we generate the comment for the corresponding code snippet. To explain our generation method, we use a Takagi-Sugeno fuzzy approach, layer-wise relevance propagation, and a hidden Markov model. These techniques make our method trustworthy and allow humans to understand how source code tokens are mapped to natural language words.
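As a rough illustration of the ranking step described above (not the authors' implementation), the sketch below builds a similarity graph over hypothetical candidate comments standing in for parse trees and ranks them with a hand-rolled PageRank; the similarity function and the candidates are placeholders.

```python
def pagerank(adj, damping=0.85, iters=50):
    """Plain power-iteration PageRank over a dense adjacency matrix."""
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for j in range(n):
            inflow = sum(
                rank[i] * adj[i][j] / max(sum(adj[i]), 1e-9)
                for i in range(n)
            )
            new.append((1 - damping) / n + damping * inflow)
        rank = new
    return rank

def tree_similarity(a, b):
    """Toy similarity: word overlap between two candidate comments."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

# Hypothetical candidates produced by the statistical alignment step.
candidates = [
    "return the sum of two numbers",
    "compute sum of the inputs",
    "print a greeting message",
]
adj = [[tree_similarity(a, b) if a != b else 0.0 for b in candidates]
       for a in candidates]
scores = pagerank(adj)
best = candidates[max(range(len(scores)), key=scores.__getitem__)]
print(best)  # the highest-ranked candidate becomes the comment
```

Candidates that agree with many other candidates accumulate rank, so the consensus phrasing wins; the paper's trees would carry richer structure than these flat strings.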
Context: Code summarization is the task of generating a concise natural language description of a code snippet. Recent efforts have boosted code summarization performance from various perspectives, e.g., retrieving external information or introducing large Transformer-based models, and have achieved promising performance for individual programming languages. However, when dealing with rapidly expanding cross-language source code datasets, existing approaches suffer from two issues: (1) the difficulty of building a universal code representation for multiple languages; (2) poor performance on low-resource languages. Objective: To cope with these issues, we propose a novel code summarization approach named RaxCS, which performs code summarization across multiple languages and improves accuracy for low-resource languages by leveraging cross-language knowledge. Methods: We exploit pre-trained models with a contrastive learning objective to build a unified code representation across multiple languages. To fully mine external knowledge across programming languages, we design a hybrid retrieval module that searches for functionally equivalent code and its corresponding comment to serve as preliminary information. Finally, we employ a decoder-only Transformer model to fuse the contextual information, which guides the process of generating summaries. Results: Extensive experiments demonstrate that (1) RaxCS outperforms the state of the art on cross-language code summarization (scoring 4.39% higher in BLEU and 8.65% higher in BERTScore), and (2) for low-resource languages, RaxCS boosts code summarization performance by a significant margin (e.g., 6.93% in BLEU for Ruby) with cross-language retrieval. Conclusion: This paper introduces a cross-language code summarization model that utilizes contrastive pre-training and cross-language retrieval, both of which are beneficial for incorporating cross-language knowledge.
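Contrastive pre-training objectives of this kind are typically an InfoNCE-style loss over paired embeddings; the sketch below is a generic version under that assumption, not RaxCS's actual code. The encoders and the batch are placeholders.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, temperature=0.07):
    """Contrastive loss: pull paired code embeddings together and
    push apart all other samples in the batch (in-batch negatives)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # (B, B) similarity matrix
    labels = torch.arange(anchor.size(0))       # diagonal entries are true pairs
    return F.cross_entropy(logits, labels)

# Toy usage: embeddings of the same function in two languages
# (e.g., a Java and a Python version) act as a positive pair.
java_emb = torch.randn(8, 256, requires_grad=True)    # stand-in for encoder(java_code)
python_emb = torch.randn(8, 256, requires_grad=True)  # stand-in for encoder(python_code)
loss = info_nce_loss(java_emb, python_emb)
loss.backward()
```

Training on such cross-language pairs is one plausible way to pull semantically equivalent code from different languages into a shared representation space, which is what the unified representation requires.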
Code summarization aims to convert structured program code into comprehensible natural language descriptions, significantly benefiting software development. Existing approaches mainly employ structure-to-sequence frameworks designed for the Abstract Syntax Tree (AST) form of source code, extensively using architectures such as tree-based LSTMs and graph neural networks. From the modeling process to the encoding architecture, these frameworks cannot effectively learn some of the complex dependencies within code snippets. In this paper, we propose a Structure-aware Dual Graph Neural Network (SDGNN) for code summarization. Specifically, SDGNN employs both a grammatical dependency graph and a semantic dependency graph to capture the complex dependencies of program code. To learn the dual graph effectively, we further devise hierarchical propagation and graphical propagation to generate the code encoding, as well as a graph-alignment-based dual-graph decoder to generate summaries from the encoding. Extensive experiments on three programming language datasets show that our framework outperforms state-of-the-art solutions.
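To make the dual-graph idea concrete, here is a minimal sketch (my illustration, not SDGNN's code) that extracts two edge sets from a Python function's AST: parent-child edges as a stand-in for the grammatical dependency graph, and variable def-use edges as a stand-in for the semantic dependency graph.

```python
import ast

source = """
def area(w, h):
    s = w * h
    return s
"""
tree = ast.parse(source)

# Grammatical edges: parent -> child relations in the AST.
grammatical_edges = [
    (type(parent).__name__, type(child).__name__)
    for parent in ast.walk(tree)
    for child in ast.iter_child_nodes(parent)
]

# Semantic edges: link each variable definition (or parameter)
# to its later uses, a simple def-use approximation.
defs, semantic_edges = {}, []
for node in ast.walk(tree):
    if isinstance(node, ast.arg):
        defs[node.arg] = node
    elif isinstance(node, ast.Name):
        if isinstance(node.ctx, ast.Store):
            defs[node.id] = node
        elif isinstance(node.ctx, ast.Load) and node.id in defs:
            semantic_edges.append((node.id + ":def", node.id + ":use"))

print(len(grammatical_edges), "grammatical edges")
print(semantic_edges)  # def-use links for s, w, and h
```

A dual GNN would then propagate messages over each edge set separately and align the two resulting node encodings, which is roughly what the hierarchical and graphical propagation described above do.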
ISBN (digital): 9783030368029
ISBN (print): 9783030368029; 9783030368012
Code summarization, which provides a high-level description of the function implemented by code, plays a vital role in software maintenance and code retrieval. Traditional approaches focus on retrieving similar code snippets to generate summaries; more recently, researchers have paid increasing attention to deep learning approaches, especially the encoder-decoder framework. Encoder-decoder approaches suffer from two drawbacks: (a) they lack summarization at the functionality level; (b) code snippets are often long (more than ten words), and regular encoders perform poorly on such inputs. In this paper, we propose a novel code representation built with the help of Abstract Syntax Trees, which can describe the functionality of code snippets and shortens the length of inputs. Based on our proposed code representation, we develop a Generative Task, which aims to generate summary sentences for code snippets. Experiments on large-scale real-world industrial Java projects indicate that our approaches are effective and outperform state-of-the-art approaches to code summarization.
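One plausible way to realize such a shortened, functionality-level representation (a sketch under my own assumptions, since the abstract gives no details) is to keep only the function name, parameters, and called API names from the AST, discarding the rest of the body:

```python
import ast

def functional_signature(source):
    """Compress a function into a short, functionality-level token
    sequence: its name, its parameters, and the APIs it calls."""
    tree = ast.parse(source)
    fn = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    tokens = [fn.name] + [a.arg for a in fn.args.args]
    for node in ast.walk(fn):
        if isinstance(node, ast.Call):
            callee = node.func
            if isinstance(callee, ast.Name):
                tokens.append(callee.id)
            elif isinstance(callee, ast.Attribute):
                tokens.append(callee.attr)
    return tokens

code = """
def load_users(path):
    with open(path) as f:
        return json.load(f)
"""
print(functional_signature(code))  # ['load_users', 'path', 'open', 'load']
```

The four-token output is far shorter than the raw token stream, which is the property the paper exploits to keep inputs within what a regular encoder handles well.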
ISBN (print): 9798400717017
Software Engineering (SE) researchers are extensively applying Large Language Models (LLMs) to address challenges in SE tasks such as code clone detection, code summarization, and program comprehension. Despite promising results, LLMs have to be fine-tuned and customized with specific datasets for optimal performance. However, the proprietary nature of SE data and the lack of LLMs trained on non-open-source data remain open problems. While there is existing work on applying Federated Learning (FL) to SE, the integration of FL with LLMs for SE is unexplored. Hence, we propose a FedLLM for code summarization, since developers spend much of their time comprehending code. We set up a federated learning architecture and fine-tune an LLM (Llama2 with 6.7B parameters) using Parameter-Efficient Fine-Tuning (PEFT) for code summarization. We conducted our experiments on an A100 GPU with 40GB of memory. Results show that an FL-trained LLM is as effective as a centrally trained one. We envision that leveraging non-open-source data via FedLLM for SE could be an interesting research direction.
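The abstract does not detail the aggregation scheme; a common choice, sketched below purely as an assumption, is FedAvg over the PEFT (LoRA) adapter weights only, so each client shares its small adapter rather than the full model:

```python
import torch

def fedavg(adapter_states):
    """Average LoRA adapter state dicts from all clients (FedAvg).
    Only the small adapter tensors travel over the network."""
    avg = {}
    for key in adapter_states[0]:
        avg[key] = torch.stack([s[key] for s in adapter_states]).mean(dim=0)
    return avg

# Toy round: three clients hold locally fine-tuned adapter weights
# (shapes are illustrative, not the paper's configuration).
clients = [
    {"lora_A": torch.randn(8, 256), "lora_B": torch.randn(256, 8)}
    for _ in range(3)
]
global_adapter = fedavg(clients)
print({k: tuple(v.shape) for k, v in global_adapter.items()})
```

Aggregating only adapters keeps each round's communication to a few megabytes, which is what makes federating a multi-billion-parameter LLM practical in the first place.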
(Source) code summarization aims to automatically generate summaries/comments for given code snippets in natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. Extractive methods use retrieval techniques to extract a subset of important statements and keywords from the code snippet and generate a summary that preserves the factual details in those statements and keywords. However, such a subset may miss identifier or entity naming, so the naturalness of the generated summary is usually poor. Abstractive methods can generate human-written-like summaries by leveraging encoder-decoder models, but the generated summaries often miss important factual details. To generate human-written-like summaries that preserve factual details, we propose a novel extractive-and-abstractive framework. The extractive module performs extractive code summarization: it takes in the code snippet and predicts the important statements containing key factual details. The abstractive module performs abstractive code summarization: it takes in the code snippet and the important statements in parallel and generates a succinct, human-written-like natural language summary. We evaluate the effectiveness of our technique, called EACS, by conducting extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques on all three widely used metrics: BLEU, METEOR, and ROUGE-L. In addition, a human evaluation demonstrates that the summaries generated by EACS have higher naturalness and informativeness and are more relevant to the given code snippets.
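As an illustration of the extractive half of such a pipeline (a heuristic stand-in, since EACS's statement predictor is a learned model), the sketch below scores body statements by the identifiers they share with the function signature and keeps the top-k:

```python
import re

def extract_key_statements(code, k=2):
    """Heuristic extractive step: rank body statements by how many
    identifiers they share with the function's signature line."""
    lines = [ln.strip() for ln in code.strip().splitlines()]
    header, body = lines[0], lines[1:]
    sig_ids = set(re.findall(r"[A-Za-z_]\w*", header))
    scored = sorted(
        body,
        key=lambda ln: len(sig_ids & set(re.findall(r"[A-Za-z_]\w*", ln))),
        reverse=True,
    )
    return scored[:k]

code = """
def save_report(report, path):
    data = report.to_json()
    log.debug("saving")
    open(path, "w").write(data)
"""
print(extract_key_statements(code))  # drops the logging statement
```

The abstractive module would then consume both the full snippet and these extracted statements, so the generator keeps the factual anchors (names, entities) the heuristic surfaced.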
With the fast development of large software projects, automatic code summarization techniques, which summarize the main functionality of a piece of code in natural language comments, play an essential role in helping developers understand and maintain large software projects. Many research efforts have been devoted to building automatic code summarization approaches. Typical code summarization approaches are based on deep learning models: they cast the task as a sequence-to-sequence problem that takes source code as input and outputs a summary in natural language. Code summarization models impose varying input size limits, from 50 to 10,000 tokens, on the input source code. However, how the input size limit affects the performance of code summarization models remains under-explored. In this article, we first conduct an empirical study to investigate the impact of different input size limits on the quality of generated code comments. To our surprise, experiments on multiple models and datasets reveal that setting a low input size limit, such as 20, does not necessarily reduce the quality of generated comments. Based on this finding, we further propose to use function signatures instead of full source code to summarize the main functionality, inputting only the function signatures into code summarization models. Experiments and statistical results show that inputs with signatures are, on average, more than 2 percentage points better than inputs without signatures, demonstrating the effectiveness of involving function signatures in code summarization. We also invited programmers to complete a questionnaire evaluating the quality of code summaries generated from the two inputs at different truncation levels. The results show that function signatures generate, on average, 9.2% more high-quality comments than full code.
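A minimal version of the signature-only preprocessing described above (my sketch; the paper's exact truncation rules may differ) can be written with Python's ast module:

```python
import ast

def signature_only(source):
    """Replace each function body with a stub so the model sees only
    the signature (name, parameters, and any return annotation)."""
    tree = ast.parse(source)
    sigs = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            params = ", ".join(a.arg for a in node.args.args)
            ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            sigs.append(f"def {node.name}({params}){ret}: ...")
    return "\n".join(sigs)

code = """
def rolling_mean(series, window) -> list:
    out = []
    for i in range(window, len(series) + 1):
        out.append(sum(series[i - window:i]) / window)
    return out
"""
print(signature_only(code))  # def rolling_mean(series, window) -> list: ...
```

This keeps exactly the tokens the study found most informative (the name and parameter list) while cutting the input far below even a 20-token limit.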
Code summarization aims to generate concise natural language descriptions for a piece of code, which can help developers comprehend source code. Analysis of current work shows that extracting the syntactic and semantic features of source code is crucial for generating high-quality summaries. To provide a more comprehensive feature representation of source code from different perspectives, we propose an approach named EnCoSum, which enhances the semantic features of our multi-scale multi-modal code summarization method. This method extends our previously proposed M2TS approach (a multi-scale multi-modal approach based on the Transformer for source code summarization), which uses a multi-scale method to capture the structural information of Abstract Syntax Trees (ASTs) more completely and accurately at multiple local and global levels. In addition, we devise a new cross-modal fusion method to fuse source code and AST features, which can highlight the key features in each modality that help generate summaries. To obtain richer semantic information, we improve M2TS in two ways. First, we add data-flow and control-flow edges to the ASTs, yielding edge-augmented ASTs called Enhanced-ASTs (E-ASTs). Second, we introduce method name sequences extracted from the source code, which carry more knowledge about the critical tokens in the corresponding summaries and can help the model generate higher-quality summaries. We conduct extensive experiments on processed Java and Python datasets and evaluate our approach with the four most commonly used machine translation metrics. The experimental results demonstrate that EnCoSum is effective and outperforms current state-of-the-art methods. Furthermore, we perform ablation experiments on each of the model's key components, and the results show that they all contribute to the performance of EnCoSum.
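Method name sequences of the kind mentioned above are typically obtained by splitting identifiers into subtokens; the sketch below shows one common way to do this (my assumption about the preprocessing, not EnCoSum's released code):

```python
import re

def method_name_sequence(name):
    """Split a method identifier into its subtoken sequence,
    handling both camelCase and snake_case conventions."""
    tokens = []
    for part in re.split(r"_+", name):
        # Break camelCase / PascalCase runs: "parseHTTPResponse"
        # -> ["parse", "HTTP", "Response"]
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", part)
    return [t.lower() for t in tokens if t]

print(method_name_sequence("parseHTTPResponse"))  # ['parse', 'http', 'response']
print(method_name_sequence("get_user_id"))        # ['get', 'user', 'id']
```

Subtokens such as "parse" and "response" frequently reappear verbatim in reference summaries, which is why feeding this sequence alongside the code and E-AST modalities can guide the decoder toward the critical tokens.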
Code summarization refers to automatically generating a concise natural language description of a code snippet. Good code summaries can effectively facilitate program comprehension and software maintenance. In recent years, various learning-based code summarization techniques have achieved impressive performance. Most of these models treat code summarization as an end-to-end task and directly generate the summaries, ignoring the fact that action words are crucial to code summaries. An essential characteristic of code summaries is the concentration of the action word distribution: in the Funcom dataset, for instance, the forty most common action words account for 72% of all samples. To incorporate this valuable prior domain knowledge into code summarization models, we develop a method for assisting code summarization through an additional action word prediction module: an action predictor predicts the primary action in the code summary, which is then used as a prompt to enhance the summary generation model. Our approach can be conveniently integrated into existing models. We evaluate it on two Java datasets and a C/C++ dataset. The results show that our approach efficiently improves the performance of code summarization models. Furthermore, our action word prediction module can enhance the performance of a large pre-trained language model by prompting it with the predicted action words. This work suggests that a precise action word prediction model can significantly improve code summarization through the proposed action word guidance mechanism.
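To illustrate the prior that motivates this design (a toy reconstruction, not the paper's pipeline), the sketch below takes the leading token of each reference summary as its action word, measures how concentrated the distribution is, and shows how a predicted action word could be prepended as a prompt:

```python
from collections import Counter

summaries = [
    "returns the index of the given element",
    "returns a copy of the internal buffer",
    "checks whether the queue is empty",
    "removes the first occurrence of the value",
]

# Action word = leading token of the summary (a common convention).
actions = Counter(s.split()[0] for s in summaries)
top = actions.most_common(2)
coverage = sum(c for _, c in top) / len(summaries)
print(top, f"top-2 coverage: {coverage:.0%}")  # 75% from just two words

# A predicted action word can then steer generation as a prompt prefix.
predicted_action = "returns"
code_input = "int indexOf(Object o) { ... }"
prompt = f"action: {predicted_action} | code: {code_input}"
print(prompt)
```

Because so few action words cover most summaries, even a simple classifier over them can supply a reliable prompt, which is the guidance mechanism the paper exploits.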