While synthesizing and repairing regular expressions (regexes) based on programming-by-examples (PBE) methods have seen rapid progress in recent years, all existing works only support synthesizing or repairing regexes...
详细信息
While synthesizing and repairing regular expressions (regexes) based on programming-by-examples (PBE) methods have seen rapid progress in recent years, all existing works only support synthesizing or repairing regexes for membership testing, and the support for extraction is still an open problem. This paper fills the void by proposing the first PBE-based method for synthesizing and repairing regexes for extraction. Our work supports regexes that have real-world extensions such as backreferences and lookarounds. The extensions significantly affect the PBE-based synthesis and repair problem. In fact, we show that there are unsolvable instances of the problem if the synthesized regexes are not allowed to use the extensions, i.e., there is no regex without the extensions that correctly classify the given set of examples, whereas every problem instance is solvable if the extensions are allowed. This is in stark contrast to the case for the membership where every instance is guaranteed to have a solution expressible by a pure regex without the extensions. The main contribution of the paper is an algorithm to solve the PBE-based synthesis and repair problem for extraction. Our algorithm builds on existing methods for synthesizing and repairing regexes for membership testing, i.e., the enumerative search algorithms with SMT constraint solving. However, significant extensions are needed because the SMT constraints in the previous works are based on a non-deterministic semantics of regexes. Non-deterministic semantics is sound for membership but not for extraction, because which substrings are extracted depends on the deterministic behavior of actual regex engines. To address the issue, we propose a new SMT constraint generation method that respects the deterministic behavior of regex engines. For this, we first define a novel formal semantics of an actual regex engine as a deterministic big-step operational semantics, and use it as a basis to design the new SMT constraint g
An important function of an agent is to be "on the lookout" for bits of information that are interesting to its user, even if these items appear in the midst of a larger body of unstructured information. But...
详细信息
An important function of an agent is to be "on the lookout" for bits of information that are interesting to its user, even if these items appear in the midst of a larger body of unstructured information. But how to tell these agents which patterns are meaningful and what to do with the result? Especially when agents are used to recognize text, they are usually driven by parsers which require input in the form of textual grammar rules. Editing grammars is difficult and error-prone for end users. Grammar ["Grammars by example"] is the first direct manipulation interface designed to allow nonexpert users to define grammars interactively. The user presents concrete examples of text that he or she would like the agent to recognize. Rules are constructed by an iterative process, where Grammex heuristically parses the example, displays a set of hypotheses, and the user critiques the system's suggestions. Actions to take upon recognition are also demonstrated by example.
We present an effective method for scalable and general-purpose inductive program synthesis. There have been two main approaches for inductive synthesis: enumerative search, which repeatedly enumerates possible candid...
详细信息
We present an effective method for scalable and general-purpose inductive program synthesis. There have been two main approaches for inductive synthesis: enumerative search, which repeatedly enumerates possible candidate programs, and the top-down propagation (TDP), which recursively decomposes a given large synthesis problem into smaller subproblems. Enumerative search is generally applicable but limited in scalability, and the TDP is efficient but only works for special grammars or applications. In this paper, we synergistically combine the two approaches. We generate small program subexpressions via enumerative search and put them together into the desired program by using the TDP. Enumerative search enables to bring the power of TDP into arbitrary grammars, and the TDP helps to overcome the limited scalability of enumerative search. We apply our approach to a standard formulation, syntax-guided synthesis (SyGuS), thereby supporting a broad class of inductive synthesis problems. We have implemented our approach in a tool called Duet and evaluate it on SyGuS benchmark problems from various domains. We show that Duet achieves significant performance gains over existing general-purpose as well as domain-specific synthesizers.
While editing code, it is common for developers to make multiple related repeated edits that are all instances of a more general program transformation. Since this process can be tedious and error-prone, we study the ...
详细信息
While editing code, it is common for developers to make multiple related repeated edits that are all instances of a more general program transformation. Since this process can be tedious and error-prone, we study the problem of automatically learning program transformations from past edits, which can then be used to predict future edits. We take a novel view of the problem as a semi-supervised learning problem: apart from the concrete edits that are instances of the general transformation, the learning procedure also exploits access to additional inputs (program subtrees) that are marked as positive or negative depending on whether the transformation applies on those inputs. We present a procedure to solve the semi-supervised transformation learning problem using anti-unification and programming-by-example synthesis technology. To eliminate reliance on access to marked additional inputs, we generalize the semi-supervised learning procedure to a feedback-driven procedure that also generates the marked additional inputs in an iterative loop. We apply these ideas to build and evaluate three applications that use different mechanisms for generating feedback. Compared to existing tools that learn program transformations from edits, our feedback-driven semi-supervised approach is vastly more effective in successfully predicting edits with significantly lesser amounts of past edit data.
Synthesizing relational queries from data is challenging in the presence of recursion and invented predicates. We propose a fully automated approach to synthesize such queries. Our approach comprises of two steps: it ...
详细信息
Synthesizing relational queries from data is challenging in the presence of recursion and invented predicates. We propose a fully automated approach to synthesize such queries. Our approach comprises of two steps: it first synthesizes a non-recursive query consistent with the given data, and then identifies recursion schemes in it and thereby generalizes to arbitrary data. This generalization is achieved by an iterative predicate unification procedure which exploits the notion of data provenance to accelerate convergence. In each iteration of the procedure, a constraint solver proposes a candidate query, and a query evaluator checks if the proposed program is consistent with the given data. The data provenance for a failed query allows us to construct additional constraints for the constraint solver and refine the search. We have implemented our approach in a tool named MOBIUS. On a suite of 21 challenging recursive query synthesis tasks, MOBIUS outperforms three state-of-the-art baselines GENSYNTH, ILASP, and POPPER, both in terms of runtime and accuracy. We also demonstrate that the synthesized queries generalize well to unseen data.
Prior research has established that long-term interests in programming are often shaped by formative computing experiences, especially those involving programming and graphics. Existing authoring environments for chil...
详细信息
ISBN:
(纸本)9781450343138
Prior research has established that long-term interests in programming are often shaped by formative computing experiences, especially those involving programming and graphics. Existing authoring environments for children (ages 9-14) to make 2D games and animations require them to: (a) create programs, (b) customize templates, or (c) combine rewrite rules with programs. One way to support early experiences in computing for a more diverse set of learners is to simplify such authoring systems, by removing text heavy code and minimizing cognitive load, which can allow separation of coding concepts from writing code. In this paper, we describe an exploratory system we are designing to test this idea, called BlockStudio. Using a programming by example paradigm, children manipulate colored blocks on the screen to specify desired behavior via concrete changes. Based on these inputs, our system synthesizes generalized rules based on color. We give a brief overview of our current prototype, then share insights gleaned from two intergenerational co-design sessions with children and discuss implications for designers of similar systems.
We present a method for synthesizing regular expressions for introductory automata assignments. Given a set of positive and negative examples, the method automatically synthesizes the simplest possible regular express...
详细信息
ISBN:
(纸本)9781450344463
We present a method for synthesizing regular expressions for introductory automata assignments. Given a set of positive and negative examples, the method automatically synthesizes the simplest possible regular expression that accepts all the positive examples while rejecting all the negative examples. The key novelty is the search-based synthesis algorithm that leverages ideas from over- and under-approximations to effectively prune out a large search space. We have implemented our technique in a tool and evaluated it with non-trivial benchmark problems that students often struggle with. The results show that our system can synthesize desired regular expressions in 6.7 seconds on the average, so that it can be interactively used by students to enhance their understanding of regular expressions.
This paper presents a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their...
详细信息
ISBN:
(纸本)9781450349888
This paper presents a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specifications, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, the use of SMT-based deduction greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially-filled program sketches. We apply the proposed synthesis methodology for automating a large class of data preparation tasks that commonly arise in data science. We have evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from on-line forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
In this article, we propose an end-user adaptive architecture for movement assessment from RGB videos. Our method allows physiotherapists to add customized exercises for patients from only a few video training example...
详细信息
ISBN:
(纸本)9783030499037;9783030499044
In this article, we propose an end-user adaptive architecture for movement assessment from RGB videos. Our method allows physiotherapists to add customized exercises for patients from only a few video training examples. The main idea is to take leverage of Deep learning-based pose estimation frameworks to track in real-time the key-body joints from the image data. Our system mimics the traditional physical rehabilitation process, where the therapist guides patients through demonstrative examples, and the patients repeat these examples while the physiotherapist monitors their movements. We evaluate our proposed method on four physiotherapeutic exercises for shoulder strengthening. Results indicate that our approach contributes both to reduce physiotherapist time needed to train the system, and to automatically assess the patients' movements without direct monitoring from the physiotherapist.
暂无评论