Model transformation (MT) has become an important concern in software engineering. In addition to its role in model-driven development, it is useful in many other situations such as measurement, refactoring, and test-case generation. Roughly speaking, MT aims to derive a target model from a source model by following a set of rules or principles. So far, contributions in MT have mostly relied on defining languages to express transformation rules. However, defining, expressing, and maintaining these rules can be difficult, especially for proprietary or little-used formalisms. In some situations, companies have accumulated transformation examples from past experience. Starting from these observations, our work treats transformation as a problem to be solved with fragmentary knowledge, i.e., with only examples of source-to-target MTs. Our approach has two main advantages: (1) it always proposes a transformation for a source model, even when rule induction is impossible or difficult to achieve; and (2) it is independent of the source and target formalisms: aside from the examples, no extra information is needed. In this context, we propose an optimization-based approach that searches the examples for combinations of transformation fragments that best cover the source model. To that end, we use two strategies based on two search-based algorithms: particle swarm optimization and simulated annealing. Validation on industrial projects shows that the resulting models are accurate.
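The fragment-combination search can be illustrated with a minimal simulated-annealing sketch. It assumes a deliberately simplified encoding that is not taken from the paper: the source model is a set of element identifiers, each example fragment is the set of source elements it can transform, and a candidate solution is a subset of fragments scored by how many source elements it covers, minus a small penalty per fragment used. The names `score` and `anneal` and all parameter values are hypothetical, and the particle swarm variant is not shown.

```python
import math
import random
from typing import FrozenSet, List, Set, Tuple


def score(selection: Set[int], fragments: List[FrozenSet[str]],
          source: Set[str], penalty: float = 0.1) -> float:
    """Source elements covered by the selected fragments, minus a cost per fragment."""
    covered: Set[str] = set()
    for i in selection:
        covered |= fragments[i]
    return len(covered & source) - penalty * len(selection)


def anneal(fragments: List[FrozenSet[str]], source: Set[str],
           t0: float = 2.0, cooling: float = 0.995, steps: int = 2000,
           seed: int = 0) -> Tuple[Set[int], float]:
    """Simulated annealing over subsets of example fragments (hypothetical encoding)."""
    rng = random.Random(seed)
    current: Set[int] = set()
    current_score = score(current, fragments, source)
    best, best_score = set(current), current_score
    t = t0
    for _ in range(steps):
        # Neighbour move: toggle one randomly chosen fragment in or out.
        candidate = current ^ {rng.randrange(len(fragments))}
        cand_score = score(candidate, fragments, source)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        t = max(t * cooling, 1e-6)  # geometric cooling schedule
    return best, best_score
```

For example, with a source model `{"c1", "c2", "a1", "a2"}` and fragments `[{"c1", "a1"}, {"c2"}, {"a2", "c2"}, {"x"}]`, the search settles on fragments 0 and 2, which cover all four elements with the fewest fragments. Accepting occasional worse moves early on is what lets annealing escape local optima that a purely greedy search could get stuck in.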
Software developers often repeat the same code changes within a project or across different projects. These repetitive changes are known as “code change patterns” (CPATs). Automating CPATs is crucial to speeding up software development. While current transformation-by-example (TBE) techniques can automate CPATs, they are limited by the quality and quantity of the provided input examples. As a result, they miss code variations that are semantically similar to the input examples but do not share their exact syntax, data flow, or control flow. Large Language Models (LLMs), pre-trained on extensive source-code datasets, offer a potential solution: using LLMs to generate semantically equivalent, yet previously unseen, variants of the original CPAT could significantly increase the effectiveness of TBE systems. In this paper, we first identify best practices for harnessing LLMs to generate code variants that meet three criteria: correctness (semantic equivalence to the original CPAT), usefulness (reflecting what developers typically write), and applicability (aligning with the primary intent of the original CPAT). We then implement these practices in our tool PyCraft, which combines static code analysis, dynamic analysis, and LLM capabilities. Using chain-of-thought reasoning, PyCraft generates variations of the input examples together with comprehensive test cases that identify correct variations with an F-measure of 96.6%. Our algorithm uses iterative feedback to expand the original input examples by an average factor of 58x. From these richly generated examples, we inferred transformation rules and then automated the changes, increasing the number of transformed target code sites by up to 39x (14x on average) compared to a previous state-of-the-art tool that relies solely on static analysis. We submitted patches generated by PyCraft to a range of projects, notably well-known ones such as microsoft/DeepSpeed and IBM/
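Using test cases to separate correct variants from incorrect ones can be sketched as differential testing: run the original snippet and each candidate variant on the same randomly generated inputs, and keep only the candidates whose outputs always agree. Everything here is a hypothetical stand-in rather than PyCraft's implementation: the summation CPAT, the two hand-written "variants" (which in PyCraft would come from an LLM), and the helper names `equivalent_on_tests` and `filter_variants`. Agreement on sampled inputs is evidence of semantic equivalence, not proof.

```python
import random
from typing import Callable, List

Snippet = Callable[[List[int]], int]


def original(xs: List[int]) -> int:
    # Hypothetical "before" snippet of a CPAT: an explicit accumulation loop.
    total = 0
    for x in xs:
        total += x
    return total


# Hypothetical candidate variants, standing in for LLM-generated rewrites.
def variant_while(xs: List[int]) -> int:
    total, i = 0, 0
    while i < len(xs):
        total += xs[i]
        i += 1
    return total


def variant_buggy(xs: List[int]) -> int:
    total = 0
    for x in xs[1:]:  # off-by-one: skips the first element
        total += x
    return total


def equivalent_on_tests(reference: Snippet, candidate: Snippet,
                        trials: int = 200, seed: int = 0) -> bool:
    """Compare outputs on randomly generated integer lists (differential testing)."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        # Pass copies so a mutating candidate cannot affect the reference run.
        if candidate(list(xs)) != reference(list(xs)):
            return False
    return True


def filter_variants(reference: Snippet, candidates: List[Snippet]) -> List[Snippet]:
    """Keep only the candidates that agree with the reference on every test."""
    return [c for c in candidates if equivalent_on_tests(reference, c)]
```

Here `filter_variants(original, [variant_while, variant_buggy])` keeps only `variant_while`, because the off-by-one variant disagrees with the original on any input whose first element is non-zero.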
ISBN: 9781665457019 (print)
Because of the naturalness of software and the rapid evolution of Machine Learning (ML) techniques, code change patterns (CPATs) recur frequently. They range from simple API migrations to changes involving several complex control structures such as for loops. Performing CPATs manually is tedious, yet current state-of-the-art techniques for inferring transformation rules cannot handle unseen variants of complex CPATs, which results in low recall. In this paper, we present a novel, automated workflow that mines CPATs, infers transformation rules, and then automatically transplants them to new target sites. We designed, implemented, evaluated, and released this workflow as a tool, PYEVOLVE. At its core is a novel transformation-rule inference engine that is aware of both data flow and control flow. Our technique advances the state of the art for transformation-by-example tools: without it, 70% of the code changes that PYEVOLVE transforms could not be automated. Our thorough empirical evaluation of over 40,000 transformations shows 97% precision and 94% recall. Developers of well-known open-source projects accepted 90% of the CPATs generated by PYEVOLVE, confirming that its changes are useful.