This paper introduces Silverchain, a fluent API generator developed by the author of this paper. Silverchain accepts rules on the chaining order of API methods and generates type definitions for the fluent API where t...
详细信息
In the face of massive concurrent user access in the era of big data, how to build high-performance web services has become one of the difficulties to be solved by network applications. This paper utilizes *** archite...
详细信息
When building enterprise applications (EAs) on java frameworks (e.g., Spring), developers often configure application components via metadata (i.e., java annotations and XML files). It is challenging for developers to...
详细信息
This study examines the impact of tokenized java code length on the accuracy and explicitness of ten major LLMs in vulnerability detection. Using chi-square tests and known ground truth, we found inconsistencies acros...
详细信息
Datalog engines for fixpoint evaluation have brought great benefits to static program analysis over the past decades. A Datalog specification of an analysis allows a declarative, easy-to-maintain specification, withou...
详细信息
Datalog engines for fixpoint evaluation have brought great benefits to static program analysis over the past decades. A Datalog specification of an analysis allows a declarative, easy-to-maintain specification, without sacrificing performance, and indeed often achieving significant speedups compared to hand-coded algorithms. However, these benefits come with a certain loss of control. Datalog evaluation is bottom-up, meaning that all inferences (from a set of initial facts) are performed and all their conclusions are outputs of the computation. In practice, virtually every program analysis expressed in Datalog becomes unscalable for some inputs, due to the worst-case blowup of computing all results, even when a partial answer would have been perfectly satisfactory. In this work, we present a simple, uniform, and elegant solution to the problem, with stunning practical effectiveness and application to virtually any Datalog-based analysis. The approach consists of leveraging the choice construct, supported natively in modern Datalog engines like Soufflé. The choice construct allows the definition of functional dependencies in a relation and has been used in the past for expressing worklist algorithms. We show a near-universal construction that allows the choice construct to flexibly limit evaluation of predicates. The technique is applicable to practically any analysis architecture imaginable, since it adaptively prunes evaluation results when a (programmer-controlled) projection of a relation exceeds a desired cardinality. We apply the technique to probably the largest, pre-existing Datalog analysis frameworks in existence: Doop (for java bytecode) and the main client analyses from the Gigahorse framework (for Ethereum smart contracts). Without needing to understand the existing analysis logic and with minimal, local-only changes, the performance of each framework increases dramatically, by over 20x for the hardest inputs, with near-negligible sacrifice in completene
Log4j has become a widely adopted logging library for java programs due to its long history and high reliability. Its widespread use is notable not only because of its maturity but also due to the complexity and depth...
详细信息
Ensuring code correctness remains a challenging problem even as large language models (LLMs) become increasingly capable at code-related tasks. While LLM-based program repair systems can propose bug fixes using only a...
详细信息
Decompilation, the process of converting machine-level code into readable source code, plays a critical role in reverse engineering. Given that the main purpose of decompilation is to facilitate code comprehension in ...
详细信息
Decompilation, the process of converting machine-level code into readable source code, plays a critical role in reverse engineering. Given that the main purpose of decompilation is to facilitate code comprehension in scenarios where the source code is unavailable, the understandability of decompiled code is of great importance. Unfortunately, previous researches have predominantly concentrated on the correctness of decompilation, leaving the understandability of the decompiled code largely unexplored. Do decompiler stakeholders place importance on the understandability of decompiled code? Are there any methodologies that can be used to assess this understandability? These questions, however, remain unanswered so far. Therefore, in this paper, we propose the first empirical study on the understandability of java decompiled code. This study involves a well-designed user survey to reveal the Severity and Universality of understandability issues in java decompilation, as well as a series of experiments for the understandability comparison between a total of 429 sets of source code files from 14 java projects and corresponding decompiled files provided by 3 famous java decompilers. Through an in-depth analysis of the survey results and the experiment results, we obtained the following findings: (1) Understandability of java decompilation is considered as important as its correctness, and decompilation understandability issues are even more commonly encountered than decompilation failures. (2) A notable percentage of code snippets decompiled by java decompilers exhibit significantly lower or higher levels of understandability in comparison to their original source code. (3) Unfortunately, Cognitive Complexity demonstrates relatively acceptable precision while low recall in recognizing these code snippets exhibiting diverse understandability during decompilation. (4) Even worse, perplexity demonstrates lower levels of precision and recall in recognizing such code snippets.
Code language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of la...
详细信息
Code language Models (CLMs) have demonstrated high effectiveness in automating software engineering tasks such as bug fixing, code generation, and code documentation. This progress has been driven by the scaling of large models, ranging from millions to trillions of parameters (e.g., GPT-4). However, as models grow in scale, sustainability concerns emerge, as they are extremely resource-intensive, highlighting the need for efficient, environmentally conscious solutions. GreenAI techniques, such as QLoRA (Quantized Low-Rank Adaptation), offer a promising path for dealing with large models’ sustainability as they enable resource-efficient model fine-tuning. Previous research has shown the effectiveness of QLoRA in code-related tasks, particularly those involving natural language inputs and code as the target output (NL-to-Code), such as code generation. However, no studies have explored its application to tasks that are fundamentally similar to NL-to-Code (natural language to code) but operate in the opposite direction, such as code summarization. This leaves a gap in understanding how well QLoRA can generalize to Code-to-NL tasks, which are equally important for supporting developers in understanding and maintaining code. To address this gap, we investigate the extent to which QLoRA’s capabilities in NL-to-Code tasks can be leveraged and transferred to code summarization, one representative Code-to-NL task. Our study evaluates two state-of-the-art CLMs (CodeLlama and DeepSeek-Coder) across two programminglanguages: Python and java. Each model was tasked with generating a meaningful description for Python and java code methods. The findings of our research confirm previous patterns that emerged when applying QLoRA to source code generation. Notably, we observe that QLoRA not only allows efficient fine-tuning of CLMs for code summarization but also achieves the best results with minimal parameter adjustment compared to full model fine-tuning, which requires expensive
Trusted Execution Environments (TEEs) isolate a special space within a device’s memory that is not accessible to the normal world (also known as Untrusted Environment), even when the device is compromised. Thus, deve...
详细信息
暂无评论