Data contamination presents a critical barrier preventing widespread industrial adoption of advanced software engineering techniques that leverage code language models (CLMs). This phenomenon occurs when evaluation data inadvertently overlaps with the public code repositories used to train CLMs, severely undermining the credibility of performance evaluations. For software companies considering the integration of CLM-based techniques into their development pipeline, this uncertainty about true performance metrics poses an unacceptable business risk. Code refactoring, which comprises code restructuring and variable renaming, has emerged as a promising measure to mitigate data contamination. It provides a practical alternative to the resource-intensive process of building contamination-free evaluation datasets, which would require companies to collect, clean, and label code created after the CLMs' training cutoff dates. However, the lack of automated code refactoring tools and scientifically validated refactoring techniques has hampered widespread industrial implementation. To bridge the gap, this paper presents the first systematic study to examine the efficacy of code refactoring operators at multiple scales (method-level, class-level, and cross-class level) and in different programming languages. In particular, we develop an open-source toolkit, CODECLEANER, which includes 11 operators for Python: nine method-level, one class-level, and one cross-class level operator. We elaborate on the rationale for why these operators could work to resolve data contamination and use both data-wise metrics (e.g., N-gram matching overlap ratio) and model-wise metrics (e.g., perplexity) to quantify the efficacy after operators are applied. We observe a 65% drop in overlap ratio when applying all operators in CODECLEANER, demonstrating their effectiveness in addressing data contamination. Additionally, we migrate four operators to Java, showing their generalizability to another language.
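To make the data-wise metric concrete, the following is a minimal sketch of an N-gram overlap ratio, assuming whitespace tokenisation and trigrams. The abstract does not specify how CODECLEANER computes its metric, so the function names `ngrams` and `overlap_ratio`, the choice of n=3, and the toy snippets are all illustrative assumptions rather than the toolkit's implementation.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_ratio(sample, corpus, n=3):
    """Fraction of the sample's n-grams that also occur in the corpus.

    Assumption: tokens are whitespace-separated; a real study would
    more likely use a language-aware tokenizer.
    """
    sample_grams = ngrams(sample.split(), n)
    if not sample_grams:
        return 0.0
    corpus_grams = set(ngrams(corpus.split(), n))
    hits = sum(1 for gram in sample_grams if gram in corpus_grams)
    return hits / len(sample_grams)


# Hypothetical "training corpus" snippet and two evaluation samples.
corpus = "def add ( a , b ) : return a + b"
original = "def add ( a , b ) : return a + b"
renamed = "def add ( x , y ) : return x + y"  # after variable renaming

print(overlap_ratio(original, corpus))
print(overlap_ratio(renamed, corpus))
```

In this toy case, renaming two variables leaves only the trigrams that contain no variable name in common with the corpus, which is the intuition behind using refactoring to lower measured overlap.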
A program’s exceptional behavior can substantially complicate its control flow, and hence accurately reasoning about the program’s correctness. On the other hand, formally verifying realistic programs is likely to i...
This paper aims to evaluate GitHub Copilot’s generated code quality based on the LeetCode problem set using a custom automated framework. We evaluate the results of Copilot for 4 programming languages: Java, C++, Pyt...
Worked examples (solutions to typical programming problems, presented as source code in a certain language and used to explain topics from a programming class) are among the most popular types of learning con...
Context: Code annotations have gained widespread popularity in programming languages, offering developers the ability to attach metadata to code elements to define custom behaviors. Many modern frameworks and APIs use...
[Background:] Security-sensitive APIs provide access to security-sensitive resources, e.g., the filesystem or network resources. Including such API calls, directly or through dependencies, increases the application’s attack surface. An example of this phenomenon is Log4Shell, which rendered many applications vulnerable due to network-related capabilities (JNDI lookup) in the log4j package. Before the Log4Shell incident, alternative logging libraries that do not make JNDI lookup calls were available. [Problem:] The impact of such an incident would have been minimal if information about network-related API calls by logging libraries had been available to developers. The lack of visibility into the calls that functionally similar open-source packages make to these security-sensitive APIs makes it difficult for developers to use them as a dependency selection criterion. [Goal:] The goal of this study is to aid developers in selecting their dependencies by understanding the security-sensitive APIs in those dependencies through call graph analysis. [Methodology:] We conducted a mixed-methods study with 45 Java packages and defined a list of 219 security-sensitive APIs. We categorized these 219 APIs into 3 themes and 15 categories. We then used call graph analysis to analyze the prevalence of these APIs in our selected package versions, with and without their dependencies. Finally, we conducted a survey with open-source developers (110 respondents), showing them a comparison of functionally similar packages w.r.t. security-sensitive API calls, to understand the usefulness of this API information in the dependency selection process. [Result:] The number of security-sensitive API calls of functionally similar packages can vary from 0 to 368 in one API category and from 0 to 429 in total. Our survey results show that 73% of developers agree that information about the number and type of security-sensitive API calls of functionally similar packages would have been useful in their dependency selection.
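The call graph analysis described above can be illustrated with a small sketch: a breadth-first traversal from a package's entry points that records every call edge landing on a security-sensitive API. This is only an assumed shape of such an analysis, not the study's tooling. The function `reachable_sensitive_calls`, the toy graph, and the `liba`/`libb` package names are hypothetical; `javax.naming.Context.lookup` and `java.net.Socket.connect` are used purely as string labels for sensitive APIs.

```python
from collections import deque


def reachable_sensitive_calls(call_graph, entry_points, sensitive_apis):
    """Return the (caller, callee) edges reachable from the entry points
    whose callee is in the sensitive API set."""
    seen = set(entry_points)
    queue = deque(entry_points)
    found = set()
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee in sensitive_apis:
                found.add((caller, callee))
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return found


# Hypothetical call graph covering two functionally similar logging packages.
call_graph = {
    "liba.Logger.info": ["liba.Lookup.resolve", "liba.Format.render"],
    "liba.Lookup.resolve": ["javax.naming.Context.lookup"],
    "libb.Logger.info": ["libb.Format.render"],
}
sensitive_apis = {"javax.naming.Context.lookup", "java.net.Socket.connect"}

calls_a = reachable_sensitive_calls(call_graph, ["liba.Logger.info"], sensitive_apis)
calls_b = reachable_sensitive_calls(call_graph, ["libb.Logger.info"], sensitive_apis)
print(len(calls_a), len(calls_b))
```

Comparing the two counts for packages that offer the same functionality is the kind of signal the abstract proposes as a dependency selection criterion.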
Automated test techniques usually generate unit tests with higher code coverage than manual tests. However, the readability of automated tests is crucial for code comprehension and maintenance. The readability of unit...
In molecular dynamics (MD), systems are molecules made up of atoms, and the aim is to determine their evolution over time. MD is based on a numerical resolution algorithm, whose role is to apply the forces generated by the various components, according to the equations of Newtonian physics. MD is currently used mainly in materials science and molecular biology. In this document, we limit ourselves to alkanes, which are non-cyclic carbon-hydrogen chains. At the basic "All-atom" (AA) scale, all the atoms are directly simulated. At the "United-atom" (UA) scale, one considers grains composed of a carbon atom with the hydrogen atoms attached to it. Grains at the "Coarse-grained" (CG) scale are composed of two consecutive UA grains. In the multi-scale approach, one tries to use the UA and CG scales as much as possible, since they can be simulated more efficiently than the AA scale. In this document, we focus mainly on three topics. First, we describe an MD system, implemented in the Java programming language, according to the Synchronous Reactive programming approach, in which there exists a notion of a global logical time. This system is used to simulate molecules and also to build the potential functions at the UA and CG scales. Second, two methods to derive UA and CG potentials from AA potentials are proposed and analysed. Both methods rely on strong geometrical links with the AA scale. We use these links with AA to determine the forms and values of the UA and CG potentials. In the first method (called "inverse-Boltzmann"), one considers data produced during several AA scale molecule simulations, and one processes these data using a statistical approach. In the second method ("minimisation method"), one applies a constrained-minimisation technique to AA molecules. The most satisfactory method clearly appears to be the minimisation-based one. The UA potentials we have determined have standard forms: they only differ from AA poten
Dependency updates often cause compilation errors when new dependency versions introduce changes that are incompatible with existing client code. Fixing breaking dependency updates is notoriously hard, as their root c...
Java applications include third-party dependencies as bytecode. To keep these applications secure, researchers have proposed tools to re-identify dependencies that contain known vulnerabilities. Yet, to allow such re-...