Search Results - Inner Mongolia University Library

Towards Improving the Performance of Comment Generation Models by Using Bytecode Information

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, vol. 51, no. 2, pp. 503-520

Authors: Huang, Yuan; Huang, Jinbo; Chen, Xiangping; Zheng, Zibin. Affiliations: Sun Yat Sen Univ Sch Software Engn Guangzhou Peoples R China; GuangDong Engn Technol Res Ctr Blockchain Zhuhai 510006 Peoples R China; Sun Yat Sen Univ Sch Journalism & Commun Guangzhou 510275 Peoples R China

Code comments play an important role in program understanding, and a large number of automatic comment generation methods have been proposed in recent years. To generate better comments, many studies extract a variety of information (e.g., code tokens, AST traversal sequences, API call sequences) from source code as model input. In this study, we found that the bytecode compiled from the source code can provide useful information for comment generation; hence, we propose using information from the bytecode to assist comment generation. Specifically, we extract the control flow graph (CFG) from the bytecode and propose a serialization method to obtain a CFG sequence that preserves the program structure. Then, we discuss three methods for introducing bytecode information into different models. We collected 390,000 Java methods from the Maven repository and created a dataset of 101,124 samples after deduplication and preprocessing to evaluate our method. The results show that introducing the information extracted from the bytecode can improve the BLEU-4 score of 7 comment generation models.

Keywords: Codes; source coding; Data mining; Flow graphs; Transformers; Training; Software engineering; Neural networks; Java; Data models; Code comment; comment generation; control flow graph; bytecode
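
As a rough illustration of what flattening a control flow graph into a structure-preserving sequence can look like, the sketch below walks a small hand-written CFG depth-first and brackets each branch. It is not the serialization method proposed in the paper, which the abstract does not describe; the example graph, the block names, the bracket and `@` markers, and the `serialize_cfg` function are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical CFG of a method with a single if/else:
# block name -> list of successor block names.
CFG: Dict[str, List[str]] = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}


def serialize_cfg(cfg: Dict[str, List[str]], start: str) -> List[str]:
    """Depth-first pre-order walk that brackets every branch of a fork."""
    seen = set()
    out: List[str] = []

    def visit(node: str) -> None:
        if node in seen:
            # Block was already emitted: record a back-reference instead.
            out.append(f"@{node}")
            return
        seen.add(node)
        out.append(node)
        successors = cfg.get(node, [])
        if len(successors) > 1:
            # Fork: wrap each branch so the branching survives flattening.
            for successor in successors:
                out.append("(")
                visit(successor)
                out.append(")")
        else:
            for successor in successors:
                visit(successor)

    visit(start)
    return out


sequence = serialize_cfg(CFG, "entry")
print(" ".join(sequence))
```

The brackets and back-references are one simple way to keep joins and forks recoverable from a flat token sequence; a real pipeline would first have to build the CFG from compiled bytecode, which this sketch does not attempt.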

Understanding Code Understandability Improvements in Code Reviews

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, vol. 51, no. 1, pp. 14-37

Authors: Oliveira, Delano; Santos, Reydne; de Oliveira, Benedito; Monperrus, Martin; Castor, Fernando; Madeiral, Fernanda. Affiliations: Univ Fed Pernambuco BR-50732970 Recife Brazil; Fed Inst Pernambuco BR-55540000 Palmares Brazil; KTH Royal Inst Technol Stockholm Sweden; Univ Twente NL-7522 NB Enschede Netherlands; Vrije Univ Amsterdam NL-1081 HV Amsterdam Netherlands

Context: Code understandability plays a crucial role in software development, as developers spend between 58% and 70% of their time reading source code. Improving code understandability can lead to enhanced productivity and save maintenance costs. Problem: Experimental studies aim to establish what makes code more or less understandable in a controlled setting, but ignore that what makes code easier to understand in the real world also depends on extraneous elements such as developers' background and project culture and guidelines. Not accounting for the influence of these factors may lead to results that are sound but have little external validity. Goal: We aim to investigate how developers improve code understandability during software development through code review comments. Our assumption is that code reviewers are specialists in code quality within a project. Method and Results: We manually analyzed 2,401 code review comments from Java open-source projects on GitHub and found that over 42% of all comments focus on improving code understandability, demonstrating the significance of this quality attribute in code reviews. We further explored a subset of 385 comments related to code understandability and identified eight categories of code understandability concerns, such as incomplete or inadequate code documentation, bad identifier, and unnecessary code. Among the suggestions to improve code understandability, 83.9% were accepted and integrated into the codebase. Among these, only two (less than 1%) ended up being reverted later. We also identified types of patches that improve code understandability, ranging from simple changes (e.g., removing unused code) to more context-dependent improvements (e.g., replacing method calling chains by existing API). Finally, we investigated the potential coverage of four well-known linters to flag the identified code understandability issues. These linters cover less than 30% of these issues, although some of them could be ea

Keywords: Codes; Reviews; source coding; Software development management; Documentation; Security; Natural languages; Code understandability; code understandability smells; code review

Blending Static and Dynamic Analysis for Web Application Vulnerability Detection: Methodology and Case Study

IEEE ACCESS, 2025, vol. 13, pp. 3139-3153

Authors: Nunes, Paulo; Fonseca, Jose; Vieira, Marco. Affiliations: Univ Coimbra Polytech Guarda CISUC P-3004531 Coimbra Portugal; Univ North Carolina Charlotte Charlotte NC 28223 USA

Static Analysis (SA) and Dynamic Analysis (DA) are complementary techniques for finding web application vulnerabilities. Typically, SA detects more vulnerabilities but reports a higher number of false positives, whereas DA finds fewer but with better precision. In this paper, we blend SA and DA to simultaneously improve detection and decrease false alarms. Our approach starts with SA to identify an initial set of potential vulnerabilities. Then, the target application is executed to obtain specific runtime information. These data are used to automatically configure the DA, improving its ability to confirm whether the vulnerabilities reported by the SA are indeed exploitable. We evaluated the proposed approach using 49 WordPress plugins with more than 450 SQLi vulnerabilities. Our approach was able to confirm, either as a vulnerability or as a false alarm, 76.7% of the results reported by the SA, greatly reducing the usual need for manual work, which is a substantial improvement for security practitioners.

Keywords: Runtime; Codes; Security; Vectors; Static analysis; source coding; Protection; Input variables; Uniform resource locators; Structured Query Language; dynamic analysis; vulnerability detection; execution traces; SQLi; blend analysis

Path-Based Clustering Approaches for a Hybrid Slepian-Wolf Compression and Coded Caching System

IEEE ACCESS, 2025, vol. 13, pp. 48935-48949

Authors: Rosen, Benjamin; Abu-Mahfouz, Adnan M.; Cheng, Ling. Affiliations: Univ Witwatersrand Sch Elect & Informat Engn ZA-2000 Johannesburg South Africa; CSIR ZA-0001 Pretoria South Africa

Combining caching with source coding, a hybrid content delivery system further facilitates the shift towards Information-Centric Networks. This is a promising technology heralded as the next phase in network design. However, finding the optimal balance between the source coding gains and the computational complexity is itself an NP-hard problem. By modelling the problem using a path-based approach, this paper outlines iterative algorithms that can be tuned to provide control over this trade-off. In addition, a necessary condition of optimality is derived. This condition can be applied repeatedly to improve the performance of the results from the iterative algorithms. The Ant Colony Optimisation family of meta-heuristic algorithms is adapted to solve this problem, providing a benchmark that outperforms the Genetic Algorithm presented in prior work. The iterative algorithms have a larger time complexity than other solutions, but still converge in polynomial time. When combined with the optimality condition, they outperform all of the algorithms proposed to date for this problem. More specifically, this approach produces results that are found to fall in the 99.97th percentile on average.

Keywords: Iterative methods; Complexity theory; Metaheuristics; Entropy; Clustering algorithms; source coding; Genetic algorithms; Servers; Scalability; NP-hard problem; Ant colony optimization; clustering algorithms; content distribution networks; heuristic algorithms; Information-Centric Networking; information entropy; information theory; iterative algorithms; mutual information; source coding

Recovering Traceability Links Between Code and Documentation: A Retrospective

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, vol. 51, no. 3, pp. 825-832

Authors: Antoniol, Giulio; Canfora, Gerardo; Casazza, Gerardo; De Lucia, Andrea; Merlo, Ettore. Affiliations: Polytech Montreal Dept Comp & Software Engn Montreal PQ H3C 3A7 Canada; Univ Sannio I-82100 Benevento Italy; DENSO Int Europe BV World Trade Ctr NL-1077 XX Amsterdam Netherlands; Univ Salerno Dept Comp Sci I-84084 Fisciano SA Italy

Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. In our seminal 2002 paper, we proposed a method based on information retrieval (IR) to recover traceability links between source code and free text documents. A premise of our work was that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. The paper paved the way for the adoption of IR in software engineering, opening a new perspective. Reflecting on the past twenty years, we briefly review the many results that have been achieved; however, the emergence of new technologies, such as AI, poses unprecedented challenges.

Keywords: Software; source coding; Software engineering; Codes; Unified modeling language; Maintenance; Data mining; Probabilistic logic; Computer bugs; Vectors; Redocumentation; information retrieval; object orientation; program comprehension; traceability
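
To make the idea of IR-based traceability recovery concrete, here is a minimal vector-space sketch: identifiers are split into words, documents and code are weighted with TF-IDF, and candidate links are ranked by cosine similarity. This is a generic illustration, not a reproduction of the authors' method; the file names, the sample texts, and the helper functions `tokens`, `tfidf`, `cosine`, and `rank_links` are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Split camelCase identifiers and prose into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [word.lower() for word in re.findall(r"[A-Za-z]+", spaced)]


def tfidf(corpus: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Weight each term by its count times a smoothed inverse document frequency."""
    bags = {name: Counter(tokens(text)) for name, text in corpus.items()}
    n = len(bags)
    df = Counter(term for bag in bags.values() for term in bag)
    return {
        name: {term: count * (math.log(n / df[term]) + 1.0)
               for term, count in bag.items()}
        for name, bag in bags.items()
    }


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def rank_links(source: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank documents by similarity to one source file, best match first."""
    corpus = dict(docs)
    corpus["<source>"] = source
    vectors = tfidf(corpus)
    query = vectors.pop("<source>")
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: one class and two manual pages.
docs = {
    "manual/login.txt": "The user enters a password to log in to the account.",
    "manual/report.txt": "The monthly report is printed as a table.",
}
source = "class AccountLogin { boolean checkUserPassword(String userName, String password) }"

ranked = rank_links(source, docs)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The example only works because the identifiers `AccountLogin`, `userName`, and `password` share vocabulary with the manual page, which is exactly the premise about meaningful names stated in the abstract.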

Refactoring Android Source Code Smells From Android Applications

IEEE ACCESS, 2025, vol. 13, pp. 14122-14150

Authors: Fawad, Muhammad; Rasool, Ghulam; Riaz, Muhammad Bilal. Affiliations: Riphah Int Univ Fac Comp Sch Comp & Innovat Lahore Campus Lahore 54000 Pakistan; COMSATS Univ Islamabad Dept Comp Sci Lahore Campus Lahore 54000 Pakistan; VSB Tech Univ Ostrava IT4Innovations Ostrava 70800 Czech Republic; Lebanese Amer Univ Dept Comp Sci & Math Byblos 14012010 Lebanon

As technology advances and new features emerge, the demand for Android applications continues to grow, leading to rapid release schedules. These accelerated development timelines push developers to make rushed changes, often resulting in suboptimal design practices, commonly known as code smells. These issues can degrade application quality, drive up maintenance costs, lead to unexpected behaviors, and complicate evolution and re-engineering efforts. While substantial research has focused on identifying Android-specific and object-oriented code smells, comparatively less attention has been devoted to their systematic refactoring and evaluation. This study introduces a web-based technique, validated through a tool specifically developed to detect 20 Android-specific code smells and automatically refactor 10 of them. Our approach surpasses traditional desktop and plugin solutions by providing easy accessibility and cross-platform compatibility and by eliminating setup requirements. When applied to six open-source and two industrial Android applications and evaluated against the ISO/IEC 25010 quality standard, our tool demonstrated considerable improvements: reducing CPU utilization by 15.39%, lowering memory consumption by 12.85%, and enhancing battery efficiency by up to 5.78%. The tool's accuracy, validated through precision, recall, and F-measure metrics, achieved averages of 91.81% precision, 97.77% recall, and a 94.67% F-measure. This study enhances the Android application development lifecycle by offering developers a feasible solution for optimizing CPU efficiency, reducing memory use, and minimizing battery consumption.

Keywords: Codes; Operating systems; Smart phones; Mobile applications; Software; source coding; Performance evaluation; Memory management; Standards; Software development management; Android applications; code smells; refactoring; software quality
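
For readers unfamiliar with the accuracy metrics quoted in the abstract, the short check below applies the standard F-measure definition (the harmonic mean of precision and recall) to the reported averages. The function name `f_measure` is an assumption for illustration, and how the paper aggregated its per-application scores is not stated in the abstract.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averages reported in the abstract, expressed as fractions.
reported_precision = 0.9181
reported_recall = 0.9777

value = f_measure(reported_precision, reported_recall)
print(f"F-measure of the averaged precision and recall: {value:.4f}")
```

The result is about 0.947, close to but not identical to the reported 94.67%. A small gap like this is what one would expect if the reported figure is an average of per-application F-measures rather than the F-measure of the averaged precision and recall, though the abstract does not say which was used.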

Context Is All You Need: A Hybrid Attention-Based Method for Detecting Code Design Patterns

IEEE ACCESS, 2025, vol. 13, pp. 9689-9707

Authors: Houichime, Tarik; El Amrani, Younes. Affiliations: Mohammed V Univ Rabat Lab Software Project Management ENSIAS Rabat 10112 Morocco

Software reverse engineering plays a crucial role in identifying design patterns and reconstructing software architectures by analyzing system implementations and producing abstract representations across multiple layers. This research introduces a novel feature engineering approach that integrates both behavioral and structural analysis of code, resulting in a feature-rich sequential representation. This transformation enables the effective use of transformers and attention mechanisms to detect design patterns in source code. Our results emphasize the importance of context in distinguishing between various design patterns, demonstrating that the proposed sequence format, with its sensitivity to token order, significantly improves the model's capacity to differentiate between similar patterns. By leveraging the power of attention mechanisms, our approach efficiently discards irrelevant code elements, focusing on the most critical features for accurate pattern detection. Additionally, we show that this sequential code representation can be utilized to augment training data, leading to enhanced model accuracy. Trained on a diverse set of code samples representing all 23 GoF design patterns, sourced from repositories such as GitHub and Bitbucket, our methodology achieved an accuracy of 92%. Evaluation metrics further validate the robustness of the approach. This study underscores the potential of context-driven, feature-engineered representations in advancing design pattern detection and contributes a comprehensive new dataset that supports behavioral code analysis, setting the stage for future research in this area.

Keywords: Codes; Transformers; Vehicle dynamics; source coding; Unified modeling language; Transformer cores; Semantics; Context modeling; Computer architecture; Attention mechanisms; Design patterns detection; transformers; attention mechanism; feature engineering

Multimodal Fusion for Android Malware Detection Based on Large Pre-Trained Models

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, vol. 51, no. 5, pp. 1569-1590

Authors: Li, Xun; Liu, Lei; Liu, Yuzhou; Zhao, Yu; Zhang, Peng; Liu, Huaxiao. Affiliations: Jilin Univ Coll Comp Sci & Technol Changchun 130012 Peoples R China; Northeast Elect Power Univ Sch Comp Sci Jilin 132012 Jilin Peoples R China; Jilin Univ Key Lab Symbol Computat & Knowledge Engn Minist Educ Changchun 130012 Peoples R China; Univ Cincinnati Coll Comp Sci Cincinnati OH 45211 USA

Malware detection is a critical issue in software engineering because malware directly threatens user information security. Existing approaches often focus on a single modality (either source code or binary code) for detection and fail to exploit the complementary information between the two. This limits detection performance, especially in complex and evasive malware scenarios. In this paper, we take Android applications written in Java as our subjects and provide a novel fine-grained multimodal fusion method with large pre-trained models that combines features from source and binary code for malware detection. For the source code modality, we employ the graphical user interface (GUI) as a framework to segment the source code into snippets, and use a pre-trained programming language model to extract feature representations. For the binary code modality, we convert binary code into grayscale images and fine-tune a pre-trained vision model to extract features indirectly. We then implement cross-modal attention and devise a contrastive loss to align features across modalities, supplementing this with a supervised classification loss to refine the multimodal fusion process specifically for malware detection. Our experiments, conducted using the Data-MD and Data-MC benchmarks, demonstrate that our approach achieves a precision of 0.977 and a recall of 0.984 in detecting malware. This underscores the advantages of using large pre-trained models for feature representation and the fusion of information across different modalities for effective malware detection.

Keywords: Feature extraction; Malware; Binary codes; source coding; Vectors; Software; Graphical user interfaces; Training; Java; Gray-scale; Android malware; malware detection; multimodal fusion; pre-trained model; deep learning
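
The binary-to-image step mentioned in the abstract can be pictured with the sketch below, which lays raw bytes out row by row so that each byte becomes one pixel intensity between 0 and 255. The abstract does not give the paper's actual image width, padding rule, or choice of input file, so the fixed `width`, the zero padding, the sample bytes, and the function name `bytes_to_grayscale` are all assumptions.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int) -> List[List[int]]:
    """Arrange bytes into rows of `width` pixels, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # bytes(n) yields n zero bytes, used here as black padding pixels.
    padded = data + bytes(height * width - len(data))
    return [list(padded[row * width:(row + 1) * width]) for row in range(height)]


# Arbitrary sample bytes standing in for the contents of a binary file.
sample = b"dex\n03"
image = bytes_to_grayscale(sample, width=4)
for row in image:
    print(row)
```

A vision model would normally receive such a matrix after resizing it to the input resolution it was pre-trained on; that step is omitted here.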

A Reflection on Change Classification in the Era of Large Language Models

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2025, vol. 51, no. 3, pp. 864-869

Authors: Kim, Sunghun; Shivaji, Shivkumar; Whitehead, Jim. Affiliations: Upstage Corp Hong Kong Peoples R China; Hong Kong Univ Sci & Technol Hong Kong Peoples R China; Univ Calif Santa Cruz Generat AI Partners Menlo Pk CA 94025 USA; Univ Calif Santa Cruz Dept Computat Media Santa Cruz CA 95064 USA

Change classification, today known as Just-in-Time Defect Prediction, is a technique for predicting software bugs at the change level of granularity. Several ideas came together to form change classification: predictions on code changes, using word-level textual features, use of machine learning classifiers, and leveraging open source code repositories. While change classification has led to a robust line of research, it has not yet had significant industrial adoption. A key recommendation is to explore explainability features so developers can better understand why a prediction is being made. We explore how large language models can advance this work by providing prediction explanations and bug fix suggestions.

Keywords: Computer bugs; Codes; Software; Machine learning; source coding; Training; History; Data mining; Training data; Software measurement; Classification-based bug prediction; Just-in-Time defect prediction; AI explainability; LLM bug prediction

Universal Slepian-Wolf Coding for Individual Sequences

IEEE TRANSACTIONS ON INFORMATION THEORY, 2025, vol. 71, no. 1, pp. 783-796

Authors: Merhav, Neri. Affiliations: Tech Israel Inst Technol Viterbi Fac ECE IL-3200003 Haifa Israel

We establish a coding theorem and a matching converse theorem for separate encodings and joint decoding of individual sequences using finite-state machines. The achievable rate region is characterized in terms of the Lempel-Ziv (LZ) complexities, the conditional LZ complexities and the joint LZ complexity of the two source sequences. An important feature that is needed to this end, which may be interesting in its own right, is a certain asymptotic form of a chain rule for LZ complexities, which we establish in this work. The main emphasis in the achievability scheme is on the universal decoder and its properties. We then show that the achievable rate region is universally attainable by a modified version of Draper's universal incremental Slepian-Wolf (SW) coding scheme, provided that there exists a low-rate reliable feedback link.

Keywords: Encoding; Complexity theory; Decoding; Entropy; Vectors; Automata; source coding; Hamming distances; Viterbi algorithm; Urban areas; Slepian-Wolf coding; Lempel-Ziv algorithm; Lempel-Ziv complexity; finite-state machines; universal decoding
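
LZ complexity measures are commonly built on the number of phrases produced by LZ78 incremental parsing, in which each new phrase is the shortest prefix of the remaining input not seen as a phrase before. The sketch below counts those phrases for a single sequence and, as one common convention, for a pair of sequences parsed jointly as a sequence of symbol pairs. It does not reproduce the paper's exact definitions of the conditional or joint quantities; the function name `lz78_phrases` and the sample sequences are assumptions.

```python
from typing import Hashable, List, Sequence, Tuple


def lz78_phrases(seq: Sequence[Hashable]) -> List[Tuple[Hashable, ...]]:
    """LZ78 incremental parsing: cut a phrase as soon as it is new."""
    seen = set()
    phrases: List[Tuple[Hashable, ...]] = []
    current: Tuple[Hashable, ...] = ()
    for symbol in seq:
        current = current + (symbol,)
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ()
    if current:
        # Trailing phrase that repeats an earlier one (input ended mid-phrase).
        phrases.append(current)
    return phrases


x = "1011010100010"
parsed = ["".join(phrase) for phrase in lz78_phrases(x)]
print(parsed, len(parsed))

# Joint parsing (assumed convention): treat aligned symbol pairs as one alphabet.
u, v = "0101", "0011"
joint = lz78_phrases(list(zip(u, v)))
print(len(joint))
```

For a sequence of length n with c phrases, a quantity on the order of (c log c) / n is often used as a compressibility estimate; the phrase count is the part this sketch computes.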
