A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct...
详细信息
ISBN:
(纸本)9781450351058
A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with a difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) A systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) An efficient enumeration-based search strategy over the constrained search space, and (3) A number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3's repair effectiveness with state-of-the- art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs.
A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlyin...
详细信息
A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work, we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior. We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while at the same time ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions seems to be adequate for practical usage and improves over earlier proposals significantly. Most importantly, our results are highly competitive even with respect to human operators.
Synthesizing intended programs from user-specified input-output examples, also known as programming by examples (PBE), is a challenging problem in program synthesis, and has been applied to a wide range of domains. A ...
详细信息
Synthesizing intended programs from user-specified input-output examples, also known as programming by examples (PBE), is a challenging problem in program synthesis, and has been applied to a wide range of domains. A key challenge in PBE is to efficiently discover a user-intended program in the search space that can be exponentially large. In this work, we propose a method for automatic synthesis of functional programs on list manipulation, by using offline-trained Recurrent Neural Network (RNN) models to guide the program search. We adopt miniKanren, an embedded domain-specific language for flexible relational programming, as an underlying top-down deductive search engine of candidate programs that are consistent with input-output examples. Our approach targets an easy and effective integration of deep learning techniques in making better PBE systems and combines two technical ideas on generating diverse training dataset and designing rich feature embeddings of probable subproblems for synthesis generated by deductive search. The offline-learned model is then used in PBE to guide the top-down deductive search with specific strategies. To practically manipulate data structures of lists, our method synthesizes functional programs with popular higher-order combinators including map, foldl and foldr. We have implemented our method and evaluated it with challenging program synthesis tasks on list manipulation. The experiments show promising results on the performance of our method compared to related state-of-the-art inductive synthesizers.
Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as date, time, name, and units are ubiquitous in spreadsheets, and cleaning transformations on these data typ...
详细信息
ISBN:
(纸本)9781450335492
Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as date, time, name, and units are ubiquitous in spreadsheets, and cleaning transformations on these data types involve parsing and pretty printing their string representations. This presents many challenges to users because cleaning such data requires some background knowledge about the data itself and moreover this data is typically non-uniform, unstructured, and ambiguous. Spreadsheet systems and programming Languages provide some UI-based and programmatic solutions for this problem but they are either insufficient for the user's needs or are beyond their expertise. In this paper, we present a programming by example methodology of cleaning data types that learns the desired transformation from a few input-output examples. We propose a domain specific language with probabilistic semantics that is parameterized with declarative data type definitions. The probabilistic semantics is based on three key aspects: (i) approximate predicate matching, (ii) joint learning of data type interpretation, and (iii) weighted branches. This probabilistic semantics enables the language to handle non-uniform, unstructured, and ambiguous data. We then present a synthesis algorithm that learns the desired program in this language from a set of input-output examples. We have implemented our algorithm as an Excel add-in and present its successful evaluation on 55 benchmark problems obtained from online help forums and Excel product team.
While adding features, fixing bugs, or refactoring the code, developers may perform repetitive code edits. Although Integrated Development Environments (IDEs) automate some transformations such as renaming, many repet...
详细信息
ISBN:
(纸本)9781450342186
While adding features, fixing bugs, or refactoring the code, developers may perform repetitive code edits. Although Integrated Development Environments (IDEs) automate some transformations such as renaming, many repetitive edits are performed manually, which is error-prone and time-consuming. To help developers to apply these edits, we propose a technique to perform repetitive edits using examples. The technique receives as input the source code before and after the developer edits some target locations of the change and produces as output the top-ranked program transformation that can be applied to edit the remaining target locations in the codebase. The technique uses a state-of-the-art program synthesis methodology and has three main components: a) a DSL for describing program transformations;b) synthesis algorithms to learn program transformations in this DSL;c) ranking algorithms to select the program transformation with the higher probability of performing the desired repetitive edit. In our preliminary evaluation, in a dataset of 59 repetitive edit cases taken from real C# source code repositories, the technique performed, in 83% of the cases, the intended transformation using only 2.8 examples.
Integrated development environments (IDEs) are equipped with code inspections to warn developers of malformed or incorrect code by analyzing the code's data flow. However, the data flow analysis performed by the I...
详细信息
ISBN:
(纸本)9781450394130
Integrated development environments (IDEs) are equipped with code inspections to warn developers of malformed or incorrect code by analyzing the code's data flow. However, the data flow analysis performed by the IDE code inspections may issue warnings in code where warnings are irrelevant. Existing methods to prevent any bogus warnings are either too conservative - suppressing the same type of warnings in the entire codebase - or are not sustainable as they require adding a set of heuristics to the data flow analysis. We propose a programming-by-example (PBE) framework that synthesizes tree automata (TAs) to suppress bogus warnings in all the code containing the same patterns as the user-provided examples. Experiments with a prototype of the framework demonstrate that several real-world patterns heuristically suppressed in production IDE can be mechanically translated to a TA in a systematic way. Furthermore, we briefly discuss how TA-based solution can be useful in other IDE features such as code refactoring.
Various document types that combine model and view (e. g., text files, webpages, spreadsheets) make it easy to organize (possibly hierarchical) data, but make it difficult to extract raw data for any further manipulat...
详细信息
ISBN:
(纸本)9781450327848
Various document types that combine model and view (e. g., text files, webpages, spreadsheets) make it easy to organize (possibly hierarchical) data, but make it difficult to extract raw data for any further manipulation or querying. We present a general framework FlashExtract to extract relevant data from semi-structured documents using examples. It includes: (a) an interaction model that allows end-users to give examples to extract various fields and to relate them in a hierarchical organization using structure and sequence constructs. (b) an inductive synthesis algorithm to synthesize the intended program from few examples in any underlying domain-specific language for data extraction that has been built using our specified algebra of few core operators (map, filter, merge, and pair). We describe instantiation of our framework to three different domains: text files, webpages, and spreadsheets. On our benchmark comprising 75 documents, FlashExtract is able to extract intended data using an average of 2.36 examples in 0.84 seconds per field.
Inductive synthesis, or programming-by-examples (PBE) is gaining prominence with disruptive applications for automating repetitive tasks in end-user programming. However, designing, developing, and maintaining an effe...
详细信息
ISBN:
(纸本)9781450336895
Inductive synthesis, or programming-by-examples (PBE) is gaining prominence with disruptive applications for automating repetitive tasks in end-user programming. However, designing, developing, and maintaining an effective industrial-quality inductive synthesizer is an intellectual and engineering challenge, requiring 1-2 man-years of effort. Our novel observation is that many PBE algorithms are a natural fall-out of one generic meta-algorithm and the domain-specific properties of the operators in the underlying domain-specific language (DSL). The meta-algorithm propagates example-based constraints on an expression to its subexpressions by leveraging associated witness functions, which essentially capture the inverse semantics of the underlying operator. This observation enables a novel program synthesis methodology called data-driven domain-specific deduction (D-4), where domain-specific insight, provided by the DSL designer, is separated from the synthesis algorithm. Our FlashMeta framework implements this methodology, allowing synthesizer developers to generate an efficient synthesizer from the mere DSL definition (if properties of the DSL operators have been modeled). In our case studies, we found that 10+ existing industrial-quality mass-market applications based on PBE can be cast as instances of D-4. Our evaluation includes reimplementation of some prior works, which in FlashMeta become more efficient, maintainable, and extensible. As a result, FlashMeta-based PBE tools are deployed in several industrial products, including Microsoft PowerShell 3.0 for Windows 10, Azure Operational Management Suite, and Microsoft Cortana digital assistant.
The programming languages (PL) research community has traditionally catered to the needs of professional programmers in the continuously evolving technical industry. However, there is a new opportunity that knocks our...
详细信息
ISBN:
(纸本)9781450333009
The programming languages (PL) research community has traditionally catered to the needs of professional programmers in the continuously evolving technical industry. However, there is a new opportunity that knocks our doors. The recent IT revolution has resulted in the masses having access to personal computing devices. More than 99% of these computer users are non-programmers and are today limited to being passive consumers of the software that is made available to them. Can we empower these users to more effectively leverage computers for their daily tasks? The formalisms, techniques, and tools developed in the PL and the formal methods research communities can play a pivotal role!
LATEX is a widely-used document preparation system. Its powerful ability in mathematical equation editing is perhaps the main reason for its popularity in academia. Sometimes, however, even an expert user may spend mu...
详细信息
ISBN:
(纸本)9783031212123;9783031212130
LATEX is a widely-used document preparation system. Its powerful ability in mathematical equation editing is perhaps the main reason for its popularity in academia. Sometimes, however, even an expert user may spend much time fixing an erroneous equation. In this paper, we present EqFix, a synthesis-based repairing system for LATEX equations. It employs a set of fixing rules and can suggest possible repairs for common errors in LATEX equations. A domain-specific language is proposed for formally expressing the fixing rules. The fixing rules can be automatically synthesized from a set of input-output examples. An extension of relaxers is also introduced to enhance the practicality of EqFix. We evaluate EqFix on real-world examples and find that it can synthesize rules with high generalization ability. Compared with a state-of-the-art string transformation synthesizer, EqFix solved 37% more cases and spent less than half of their synthesis time.
暂无评论