Code Language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of large models, ranging from millions to trillions of parameters (e.g., GPT-4). However, as models grow in scale, sustainability concerns emerge, as they are extremely resource-intensive, highlighting the need for efficient, environmentally conscious solutions. GreenAI techniques, such as QLoRA (Quantized Low-Rank Adaptation), offer a promising path for dealing with large models’ sustainability as they enable resource-efficient model fine-tuning. Previous research has shown the effectiveness of QLoRA in code-related tasks, particularly those involving natural language inputs and code as the target output (NL-to-Code), such as code generation. However, no studies have explored its application to tasks that are fundamentally similar to NL-to-Code but operate in the opposite direction, such as code summarization. This leaves a gap in understanding how well QLoRA can generalize to Code-to-NL tasks, which are equally important for supporting developers in understanding and maintaining code. To address this gap, we investigate the extent to which QLoRA’s capabilities in NL-to-Code tasks can be leveraged and transferred to code summarization, one representative Code-to-NL task. Our study evaluates two state-of-the-art CLMs (CodeLlama and DeepSeek-Coder) across two programming languages: Python and Java. Each model was tasked with generating a meaningful description for Python and Java code methods. The findings of our research confirm previous patterns that emerged when applying QLoRA to source code generation. Notably, we observe that QLoRA not only allows efficient fine-tuning of CLMs for code summarization but also achieves the best results with minimal parameter adjustment compared to full model fine-tuning, which requires expensive
Trusted Execution Environments (TEEs) isolate a special space within a device’s memory that is not accessible to the normal world (also known as Untrusted Environment), even when the device is compromised. Thus, deve...
Large language models (LLMs) have transformed code generation. However, most existing approaches focus on mainstream languages such as Python and Java, neglecting the Solidity language, the predominant programming lan...
This short report presents the 2025 edition of the Java Unit Testing Competition, in which four test generation tools (EVOFUZZ, EVOSUITE, BBC, and RANDOOP) were benchmarked on a freshly selected set of 55 Java classes...
The article analyzes the main security functions implemented in the JDK in the context of access control mechanisms. Due to more frequent updates introduced in the Java platform, the problem of ensuring backward compa...
Inter-app communication is a mandatory and security-critical functionality of operating systems, such as Android. On the application level, Android implements this facility through Intents, which can also transfer non...
ISBN: (print) 088986392X
Two characteristic run-time communication libraries for HPJava have been developed: an application-level library and a device-level library. Adlib, a high-level communication API, serves as the application-level library and supports collective operations on distributed arrays. The mpjdev API is the underlying device-level communication library for HPJava and performs the actual communication between processes. The paper describes the novel issues in implementing the device-level library on different platforms, and gives comprehensive benchmark results on a parallel platform. All software developed in this project is available for free download from ***.
Context: Developers leverage Java annotations to implement functions such as creating objects and operating databases. However, mastering annotations is challenging, and misused annotations might cause an application to crash. Although state-of-the-art techniques attempt to solve this problem, they do not provide conclusions on Java annotation misuse types, nor do they leverage project-level information, which results in low efficiency in detecting annotation misuses. Objective: To summarize Java annotation misuse types and provide a more efficient method for detecting misused annotations. Method: Firstly, to categorize Java annotation misuses, we conduct an empirical study and curate 321 annotation misuse questions from Stack Overflow. Secondly, to better detect these misuses, we propose a BERT-based method, BERT4Anno, which takes project structure and resource configuration into account, factors often neglected by state-of-the-art methods. In BERT4Anno, a novel Annotation Usage Project Representation (AUPR) technique is designed to leverage the information of the interconnections among source code, configuration and project structure. Moreover, an AUPR-based Named Entity Recognition (ANER) task by fine-tuning BERT is devised to learn annotation usage knowledge. With the knowledge, the fine-tuned model can detect misused annotations. Finally, to evaluate our proposed method, two datasets, mainly curated from GitHub and comprising 404 Java projects/files with annotation misuse instances, are used for the experiments. Results: The Java annotation misuses are categorized into 14 types based on how the curated questions violate the correct annotation usage knowledge. The comparison experiment demonstrates the superior performance of our method over state-of-the-art baselines in terms of precision, recall, and F1 score, while our visualization technique provides insightful interpretations of the mechanism underlying the model's outperformance. Conclusion: By leveraging t
A Software Bill of Materials (SBOM) is becoming an essential tool for effective software dependency management. An SBOM is a list of components used in software, including details such as component names, versions, an...
Lisp programmers have long used macros to extend their language. Indeed, their success has inspired macro notations for a variety of other languages, such as C and Java. There is, however, a paucity of effective pedagogic examples of macro use. This paper presents a short, non-trivial example that implements a construct not already found in mainstream languages. Furthermore, it motivates the need for tail-calls, as opposed to mere tail-recursion, and illustrates how support for tail-call optimization is crucial to support a natural style of macro-based language extension.
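The distinction between tail-calls and tail-recursion drawn in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names (`is_even`, `is_odd`, `trampoline`) are hypothetical, and because Python does not perform tail-call optimization, a trampoline loop stands in for what an optimizing language would do automatically. The two functions call each other in tail position, so neither is tail-recursive on its own, yet the whole computation runs in constant stack space once the tail calls are eliminated.

```python
def is_even(n):
    # The call to is_odd is in tail position, but it targets a
    # *different* function, so this is a tail call, not tail recursion.
    # Returning a zero-argument lambda defers the call instead of
    # growing the stack.
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))


def trampoline(step):
    # Emulates tail-call optimization: keep running deferred calls
    # in a loop until a plain (non-callable) result comes back.
    while callable(step):
        step = step()
    return step


def naive_is_even(n):
    # Same logic with direct calls; each call adds a stack frame.
    return True if n == 0 else naive_is_odd(n - 1)


def naive_is_odd(n):
    return False if n == 0 else naive_is_even(n - 1)


if __name__ == "__main__":
    print(trampoline(is_even(100_000)))
    try:
        naive_is_even(100_000)
    except RecursionError:
        print("direct mutual recursion exhausted the stack")
```

An optimization limited to self-recursive calls would not help `naive_is_even`, since it never calls itself directly; only general tail-call elimination (emulated here by the trampoline) keeps the stack flat.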