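Author affiliations: East China University of Science and Technology, Department of Computer Science and Engineering, Shanghai, People's Republic of China; Shanghai Engineering Research Center of Smart Energy, Shanghai, People's Republic of China; Shanghai Computer Software Technology Development Center, Shanghai Key Laboratory of Computer Software Evaluating and Testing, Shanghai, People's Republic of China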
Publication: Neural Computing & Applications
Year, volume, issue: 2023, Vol. 35, No. 4
Pages: 3373-3393
Subject classification: 08 [Engineering] 0812 [Engineering: Computer Science and Technology (degrees may be conferred in engineering or science)]
Funding: National Natural Science Foundation of China; Shanghai Natural Science Foundation [21ZR1416300]; Capacity Building Project of Local Universities of the Science and Technology Commission of Shanghai Municipality
Keywords: Program comprehension; Code summarization; Class documentation; Deep learning
Abstract: Code summaries are clear, concise natural language descriptions of program entities, and meaningful summaries help developers understand code. Code summarization is the task of generating such a natural language summary from a code snippet. Most research on code summarization focuses on automatically generating summaries for methods or functions. However, in an object-oriented language such as Java, the basic programming unit is the class rather than the method. To fill this gap, this paper investigates how to generate summaries for Java classes using deep learning-based approaches. We propose ClassSum, a novel encoder-decoder model that generates functionality descriptions for Java classes, and build a dataset of 172,639 pairs collected from 3,185 repositories hosted on GitHub. Because the code of a class is much longer and more complex than that of a method, encoding an entire class with a neural network is more challenging than encoding a single method; in addition, the content within a class may be incomplete. To overcome these difficulties, we reduce each class to its key elements, namely the class signature, method signatures, and attribute names. To exploit both the lexical and the structural information of code, our model takes the token sequence and the abstract syntax tree of the reduced class as inputs. We evaluate ClassSum and five baselines designed for method-level code summarization on our dataset. Experimental results show that the summaries generated by ClassSum are more accurate and readable than those generated by the baselines. Our dataset is available at https://***/classsum/ClassSum.
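The abstract describes the class-reduction step only in outline. The following is a hypothetical, deliberately naive Python approximation of that idea, not the ClassSum authors' preprocessing. It strips comments and simple annotations, tracks brace depth to discard everything inside member bodies, and keeps the class signature, method (and constructor) signatures, and attribute names. The names `reduce_class`, `to_tokens`, and `EXAMPLE`, as well as the simple identifier/symbol tokenizer, are assumptions made for illustration. The sketch handles a single top-level class and ignores braces or semicolons inside string literals. A real pipeline would more likely use a proper Java parser, which would also yield the abstract syntax tree that ClassSum takes as its second input (not shown here).

```python
import re


def _strip_noise(src: str) -> str:
    """Drop comments and simple annotations (naive: markers inside string literals are not handled)."""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.DOTALL)  # block and Javadoc comments
    src = re.sub(r"//[^\n]*", " ", src)                    # line comments
    return re.sub(r"@\w+(\s*\([^)]*\))?", " ", src)       # annotations such as @Override


def reduce_class(src: str) -> dict:
    """Hypothetical sketch: keep only the class signature, method signatures and attribute names."""
    header, body = [], []
    depth = 0
    for ch in _strip_noise(src):
        if ch == "{":
            depth += 1
            if depth == 2:
                body.append(";")  # a member body starts: terminate the member's signature here
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            header.append(ch)     # package, imports and the class declaration
        elif depth == 1:
            body.append(ch)       # member declarations; text at depth >= 2 (bodies) is discarded

    # The class signature is the last ';'-separated statement before the class body opens.
    class_sig = " ".join("".join(header).split(";")[-1].split())

    methods, attributes = [], []
    for member in "".join(body).split(";"):
        member = " ".join(member.split())
        if member in ("", "static") or re.search(r"\b(class|interface|enum)\b", member):
            continue  # skip empty pieces, initializer blocks and nested type declarations
        paren, eq = member.find("("), member.find("=")
        if paren != -1 and (eq == -1 or paren < eq):
            methods.append(member)  # constructor or method signature (its body was removed)
        else:
            names = re.findall(r"[A-Za-z_$][\w$]*", member.split("=")[0])
            if names:
                attributes.append(names[-1])  # the declared field name precedes any initializer
    return {"class_signature": class_sig, "method_signatures": methods, "attribute_names": attributes}


def to_tokens(reduced: dict) -> list:
    """Flatten a reduced class into a token sequence: identifiers, integers, or single symbols."""
    text = " ".join([reduced["class_signature"], *reduced["method_signatures"], *reduced["attribute_names"]])
    return re.findall(r"[A-Za-z_$][\w$]*|\d+|\S", text)


EXAMPLE = """
package demo;

import java.util.List;

/** Counts named events. */
public class EventCounter implements Comparable<EventCounter> {
    private final String name;
    private int count = 0;

    public EventCounter(String name) { this.name = name; }

    @Override
    public int compareTo(EventCounter other) {
        return Integer.compare(count, other.count);
    }

    public void increment() { count++; }
}
"""

if __name__ == "__main__":
    reduced = reduce_class(EXAMPLE)
    print(reduced["class_signature"])  # public class EventCounter implements Comparable<EventCounter>
    print(reduced["attribute_names"])  # ['name', 'count']
    print(to_tokens(reduced))
```

Keeping signatures while dropping bodies is what shrinks the input: in this example the three member bodies disappear, while the names that hint at the class's functionality (`EventCounter`, `compareTo`, `increment`, `count`) survive in the token sequence.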