Data transformation is a laborious and time-consuming task for analysts. programming by example (PBE) is a technique that can simplify this difficult task for data analysts by automatically generating programs for dat...
详细信息
Data transformation is a laborious and time-consuming task for analysts. programming by example (PBE) is a technique that can simplify this difficult task for data analysts by automatically generating programs for data transformation. Most of the previously proposed PBE methods are based on search algorithms, but recent improvements in machine learning (ML) have led to its application in PBE research. For example, RobustFill was proposed as an ML-based PBE method for string transformation by using long short-term memory (LSTM) as the sequential encoder-decoder model. However, an ML-based PBE method has not been developed for tabular transformations, which are used frequently in data analysis. Thus, in the present study, we propose an ML-based PBE method for tabular transformations. First, we consider the features of tabular transformations, which are more complex and data intensive than string transformations, and propose a new ML-based PBE method using the state-of-the-art Transformer sequential encoder-decoder model. To our knowledge, this is the first ML-based PBE method for tabular transformations. We also propose two decoding methods comprising multistep beam search and program validation-beam search, which are optimized for program generation, and thus generate correct programs with higher accuracy. Our evaluation results demonstrated that the Transformer-based PBE model performed much better than LSTM-based PBE when applied to tabular transformations. Furthermore, the Transformer-based model with the proposed decoding method performed better than the conventional PBE model using the search-based method.
We propose a technique for synthesizing bidirectional programs from the corresponding unidirectional code plus input/output examples. The core ideas are: (1) constructing a sketch using the given unidirectional progra...
详细信息
We propose a technique for synthesizing bidirectional programs from the corresponding unidirectional code plus input/output examples. The core ideas are: (1) constructing a sketch using the given unidirectional program as a specification, and (2) filling the sketch in a modular fashion by exploiting the properties of bidirectional programs. These ideas are enabled by our choice of programming language, HOBiT, which is specifically designed to maintain the unidirectional program structure in bidirectional programming, and keep the parts that control bidirectional behavior modular. To evaluate our approach, we implemented it in a tool called Synbit and used it to generate bidirectional programs for intricate microbenchmarks, as well as for a few larger, more realistic problems. We also compared Synbit to a state-of-the-art unidirectional synthesis tool on the task of synthesizing backward computations. This is an extended version of the paper "Synbit: Synthesizing Bidirectional Programs using Unidirectional Sketches", published at OOPSLA 2021. In addition to the OOPSLA'21 paper, this journal will contain additional formalization and detailed examples.
Because of the naturalness of software and the rapid evolution of Machine Learning (ML) techniques, frequently repeated code change patterns (CPATs) occur often. They range from simple API migrations to changes involv...
详细信息
ISBN:
(纸本)9781665457019
Because of the naturalness of software and the rapid evolution of Machine Learning (ML) techniques, frequently repeated code change patterns (CPATs) occur often. They range from simple API migrations to changes involving several complex control structures such as for loops. While manually performing CPATs is tedious, the current state-of-the-art techniques for inferring transformation rules are not advanced enough to handle unseen variants of complex CPATs, resulting in a low recall rate. In this paper we present a novel, automated workflow that mines CPATs, infers the transformation rules, and then transplants them automatically to new target sites. We designed, implemented, evaluated and released this in a tool, PYEVOLVE. At its core is a novel data-flow, control-flow aware transformation rule inference engine. Our technique allows us to advance the state-of-the-art for transformation-by-example tools;without it, 70% of the code changes that PYEVOLVE transforms would not be possible to automate. Our thorough empirical evaluation of over 40,000 transformations shows 97% precision and 94% recall. By accepting 90% of CPATs generated by PYEVOLVE in famous open-source projects, developers confirmed its changes are useful.
programming by example (PBE) is a technology that makes data transformation tasks, especially tabular data transformation, easier for data analysts by automatically generating transformation programs from user-given i...
详细信息
ISBN:
(纸本)9783031159374;9783031159367
programming by example (PBE) is a technology that makes data transformation tasks, especially tabular data transformation, easier for data analysts by automatically generating transformation programs from user-given input-output examples. In recent years, PBE research using machine learning (ML) has emerged because of the recent success of ML in various research fields. We developed an ML-based PBE system for tabular data transformation in previous work. The system is based on the Transformer model that fits sequential data and not two-dimensional structured data like tabular data. Inspired by recent work applying a Transformer model to tasks using two-dimensional data, such as the query answering task for tables or the image computer vision task, we propose a Transformer-based model with positional encoding for two-dimensional tabular data, called tabular positional encoding, to improve the Transformer-based model developed in our previous work. We implemented our proposed model and conducted various experiments. The experimental results show that the Transformer-based model with tabular positional encoding achieves much higher performance than our previous work.
Use of third-party libraries is extremely common in application software. The libraries evolve to accommodate new features or mitigate security vulnerabilities, thereby breaking the Application programming Interface (...
详细信息
Use of third-party libraries is extremely common in application software. The libraries evolve to accommodate new features or mitigate security vulnerabilities, thereby breaking the Application programming Interface (API) used by the software. Such breaking changes in the libraries may discourage client code from using the new library versions thereby keeping the application vulnerable and not up-to-date. We propose a novel output-oriented program synthesis algorithm to automate API usage adaptations via program transformation. Our aim is not only to rely on the few example human adaptations of the clients from the old library version to the new library version, since this can lead to over-fitting transformation rules. Instead, we also rely on example usages of the new updated library in clients, which provide valuable context for synthesizing and applying the transformation rules. Our tool APIFIX provides an automated mechanism to transform application code using the old library versions to code using the new library versions - thereby achieving automated API usage adaptation to fix the effect of breaking changes. Our evaluation shows that the transformation rules inferred by APIFIX achieve 98.7% precision and 91.5% recall. By comparing our approach to state-of-the-art program synthesis approaches, we show that our approach significantly reduces over-fitting while synthesizing transformation rules for API usage adaptations.
The generalizability of PBE solvers is the key to the empirical synthesis performance. Despite the importance of generalizability, related studies on PBE solvers are still limited. In theory, few existing solvers prov...
详细信息
The generalizability of PBE solvers is the key to the empirical synthesis performance. Despite the importance of generalizability, related studies on PBE solvers are still limited. In theory, few existing solvers provide theoretical guarantees on generalizability, and in practice, there is a lack of PBE solvers with satisfactory generalizability on important domains such as conditional linear integer arithmetic (CLIA). In this paper, we adopt a concept from the computational learning theory, Occam learning, and perform a comprehensive study on the framework of synthesis through unification (STUN), a state-of-the-art framework for synthesizing programs with nested if-then-else operators. We prove that Eusolver, a state-of-the-art STUN solver, does not satisfy the condition of Occam learning, and then we design a novel STUN solver, PolyGen, of which the generalizability is theoretically guaranteed by Occam learning. We evaluate PolyGen on the domains of CLIA and demonstrate that PolyGen significantly outperforms two state-of-the-art PBE solvers on CLIA, Eusolver and Euphony, on both generalizability and efficiency.
We present an effective method for scalable and general-purpose inductive program synthesis. There have been two main approaches for inductive synthesis: enumerative search, which repeatedly enumerates possible candid...
详细信息
We present an effective method for scalable and general-purpose inductive program synthesis. There have been two main approaches for inductive synthesis: enumerative search, which repeatedly enumerates possible candidate programs, and the top-down propagation (TDP), which recursively decomposes a given large synthesis problem into smaller subproblems. Enumerative search is generally applicable but limited in scalability, and the TDP is efficient but only works for special grammars or applications. In this paper, we synergistically combine the two approaches. We generate small program subexpressions via enumerative search and put them together into the desired program by using the TDP. Enumerative search enables to bring the power of TDP into arbitrary grammars, and the TDP helps to overcome the limited scalability of enumerative search. We apply our approach to a standard formulation, syntax-guided synthesis (SyGuS), thereby supporting a broad class of inductive synthesis problems. We have implemented our approach in a tool called Duet and evaluate it on SyGuS benchmark problems from various domains. We show that Duet achieves significant performance gains over existing general-purpose as well as domain-specific synthesizers.
Program synthesis tasks are commonly specified via input-output examples. Existing enumerative techniques for such tasks are primarily guided by program syntax and only make indirect use of the examples. We identify a...
详细信息
ISBN:
(纸本)9781450383912
Program synthesis tasks are commonly specified via input-output examples. Existing enumerative techniques for such tasks are primarily guided by program syntax and only make indirect use of the examples. We identify a class of synthesis algorithms for programming-by-examples, which we call example-Guided Synthesis (EGS), that exploits latent structure in the provided examples while generating candidate programs. We present an instance of EGS for the synthesis of relational queries and evaluate it on 86 tasks from three application domains: knowledge discovery, program analysis, and database querying. Our evaluation shows that EGS outperforms state-of-the-art synthesizers based on enumerative search, constraint solving, and hybrid techniques in terms of synthesis time, quality of synthesized programs, and ability to prove unrealizability.
Since regular expressions (abbrev. regexes) are difficult to understand and compose. automatically generating regexes has been an important research problem. This paper introduces TRANSREGEX, for automatically constru...
详细信息
ISBN:
(纸本)9780738113197
Since regular expressions (abbrev. regexes) are difficult to understand and compose. automatically generating regexes has been an important research problem. This paper introduces TRANSREGEX, for automatically constructing regexes from both natural language descriptions and examples. To the best of our knowledge, TRANSREGEX is the first to treat the NI.P-and-example-based regex synthesis problem as the problem of NI.P-based synthesis with regex repair. For this purpose. we present novel algorithms for both NI.P-based synthesis and regex repair. We evaluate TRANSREGEX with ten relevant state-of-the-art tools on three publicly available datasets. The evaluation results demonstrate that the accuracy of our TRANSREGEX is 17.444, 35.8% and 38.9% higher than that of NI.P-based approaches on the three datasets, respectively. Furthermore, TRANSREGEX can achieve higher accuracy than the state-of-the-art multi-modal techniques with 10% to 39% higher accuracy on all three datasets. The evaluation results also indicate TRANSREGEX utilizing natural language and examples in a more effective way.
In this paper, we propose a new technique based on program synthesis for extracting information from webpages. Given a natural language query and a few labeled webpages, our method synthesizes a program that can be us...
详细信息
ISBN:
(纸本)9781450383912
In this paper, we propose a new technique based on program synthesis for extracting information from webpages. Given a natural language query and a few labeled webpages, our method synthesizes a program that can be used to extract similar types of information from other unlabeled webpages. To handle websites with diverse structure, our approach employs a neurosymbolic DSL that incorporates both neural NLP models as well as standard language constructs for tree navigation and string manipulation. We also propose an optimal synthesis algorithm that generates all DSL programs that achieve optimal F-1 score on the training examples. Our synthesis technique is compositional, prunes the search space by exploiting a monotonicity property of the DSL, and uses transductive learning to select programs with good generalization power. We have implemented these ideas in a new tool called WEBQA and evaluate it on 25 different tasks across multiple domains. Our experiments show that WEBQA significantly outperforms existing tools such as state-of-the-art question answering models and wrapper induction systems.
暂无评论