作者:
Wu, JiarongWei, LiliJiang, YanyanCheung, Shing-ChiRen, LuyaoXu, ChangHong Kong Univ Sci & Technol
Dept Comp Sci & Engn Kowloon Clear Water Bay Hong Kong Peoples R China McGill Univ
Dept Elect & Comp Engn McConnell Engn Bldg3480 Univ St Montreal PQ H3A 0E9 Canada Nanjing Univ
State Key Lab Novel Software Technol Xianlin Campus163 Xianlin Rd Nanjing 210023 Jiangsu Peoples R China Nanjing Univ
Dept Comp Sci & Technol Xianlin Campus163 Xianlin Rd Nanjing 210023 Jiangsu Peoples R China Peking Univ
Key Lab High Confidence Software Technol Minist EducDept Comp Sci & TechnolEECS 5 Yiheyuan Rd Beijing 100871 Peoples R China
programming by example (PBE) is an emerging programming paradigm that automatically synthesizes programs specified by user-provided input-output examples. Despite the convenience for end-users, implementing PBE tools ...
详细信息
programming by example (PBE) is an emerging programming paradigm that automatically synthesizes programs specified by user-provided input-output examples. Despite the convenience for end-users, implementing PBE tools often requires strong expertise in programming language and synthesis algorithms. Such a level of knowledge is uncommon among software developers. It greatly limits the broad adoption of PBE by the industry. To facilitate the adoption of PBE techniques, we propose a PBE framework called BEE, which leverages an "entity-action" model based on relational tables to ease PBE development for a wide but restrained range of domains. Implementing PBE tools with Bee only requires adapting domain-specific data entities and user actions to tables, with no need to design a domain-specific language or an efficient synthesis algorithm. The synthesis algorithm of Bee exploits bidirectional searching and constraint-solving techniques to address the challenge of value computation nested in table transformation. We evaluated Bee's effectiveness on 64 PBE tasks from three different domains and usability with a human study of 12 participants. Evaluation results show that Bee is easier to learn and use than the state-of-the-art PBE framework, and the bidirectional algorithm achieves comparable performance to domain-specifically optimized synthesizers.
It is argued that "human-centredness" will be an important characteristic of systems that learn tasks from human users, as the difficulties in inductive inference rule out learning without human assistance. ...
详细信息
programming-by-examples (PBE) involves synthesizing an intended program from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed small domain-...
详细信息
programming-by-examples (PBE) involves synthesizing an intended program from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed small domain-specific language (DSL) with effectively-invertible (EI) operators at the top and effectively-enumerable (EE) operators at the bottom. This facilitates an effective combination of top-down synthesis strategy (which backpropagates outputs over various paths in the DSL using inverse functions) with a bottom-up synthesis strategy (which propagates inputs over various paths in the DSL). We address the problem of scaling synthesis to large DSLs with several non-EI/EE operators. This is motivated by the need to support a richer class of transformations and the need for readable code generation. We propose a novel solution strategy that relies on propagating fewer values and over fewer paths. Our first key idea is that of cut functions that prune the set of values being propagated by using knowledge of the sub-DSL on the other side. Cuts can be designed to preserve completeness of synthesis;however, DSL designers may use incomplete cuts to have finer control over the kind of programs synthesized. In either case, cuts make search feasible for non-EI/EE operators and effcient for deep DSLs. Our second key idea is that of guarded DSLs that allow a precedence on DSL operators, which dynamically controls exploration of various paths in the DSL. This makes search efficient over grammars with large fanouts without losing recall. It also makes ranking simpler yet more effective in learning an intended program from very few examples. Both cuts and precedence provide a mechanism to the DSL designer to restrict search to a reasonable, and possibly incomplete, space of programs. Using cuts and gDSLs, we have built FlashFill(++), an industrial-strength PBE engine for performing rich string transformations, including datetime and number manipulations. The FlashFill(++) gDSL
The ability to learn programs from few examples is a powerful technology with disruptive applications in many domains, as it allows users to automate repetitive tasks in an intuitive way. Existing frameworks on induct...
详细信息
The ability to learn programs from few examples is a powerful technology with disruptive applications in many domains, as it allows users to automate repetitive tasks in an intuitive way. Existing frameworks on inductive synthesis only perform syntactic manipulations, where they rely on the syntactic structure of the given examples and not their meaning. Any semantic manipulations, such as transforming dates, have to be manually encoded by the designer of the inductive programming framework. Recent advances in large language models have shown these models to be very adept at performing semantic transformations of its input by simply providing a few examples of the task at hand. When it comes to syntactic transformations, however, these models are limited in their expressive power. In this paper, we propose a novel framework for integrating inductive synthesis with few-shot learning language models to combine the strength of these two popular technologies. In particular, the inductive synthesis is tasked with breaking down the problem in smaller subproblems, among which those that cannot be solved syntactically are passed to the language model. We formalize three semantic operators that can be integrated with inductive synthesizers. To minimize invoking expensive semantic operators during learning, we introduce a novel deferred query execution algorithm that considers the operators to be oracles during learning. We evaluate our approach in the domain of string transformations: the combination methodology can automate tasks that cannot be handled using either technologies by themselves. Finally, we demonstrate the generality of our approach via a case study in the domain of string profiling.
A major concern with integrating large language model (LLM) services (e.g., ChatGPT) into workplaces is that employees may inadvertently leak sensitive information through their prompts. Since user prompts can involve...
详细信息
A major concern with integrating large language model (LLM) services (e.g., ChatGPT) into workplaces is that employees may inadvertently leak sensitive information through their prompts. Since user prompts can involve arbitrary vocabularies, conventional data leak mitigation solutions, such as string-matching-based filtering, often fall short. We present GPTWall, a privacy firewall that helps internal admins create and manage policies to mitigate data leaks in prompts sent to external LLM services. GPTWall's key innovations are (1) introducing a lightweight LLM running on the edge to obfuscate target information in prompts and restore the information after receiving responses, and (2) helping admins author fine-grained disclosure policies through programming by example. We evaluated GPTWall with 12 participants and found that they could create an average of 17.7 policies within 30 minutes, achieving an increase of 29% in precision and 22% in recall over the state-of-the-art data de-identification tool.
programming by example (PBE) is an important subproblem of program synthesis, and PBE techniques have been applied to many domains. Though many techniques for accelerating PBE systems have been explored, the scalabili...
详细信息
programming by example (PBE) is an important subproblem of program synthesis, and PBE techniques have been applied to many domains. Though many techniques for accelerating PBE systems have been explored, the scalability remains one of the main challenges: There is still a gap between the performances of state-of-the-art synthesizers and the industrial requirement. To further speed up solving PBE tasks, in this paper, we propose a novel PBE framework MaxPlash. MaxPlash uses a model based on structural probability, named topdown prediction models, to guide a search based on dynamic programming, such that the search will focus on subproblems that form probable programs, and avoid improbable programs. Our evaluation shows that MaxFlash achieves x4.107 - x2080 speed-ups against state-of-the-art solvers on 244 real-world tasks.
programming by examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples...
详细信息
ISBN:
(纸本)9781450337793
programming by examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples, though often easy to provide, are an ambiguous specification of the user's intent. Because of that, a key impedance in adoption of PBE systems is the lack of user confidence in the correctness of the program that was synthesized by the system. We present two novel user interaction models that communicate actionable information to the user to help resolve ambiguity in the examples. One of these models allows the user to effectively navigate between the huge set of programs that are consistent with the examples provided by the user. The other model uses active learning to ask directed example-based questions to the user on the test input data over which the user intends to run the synthesized program. Our user studies show that each of these models significantly reduces the number of errors in the performed task without any difference in completion time. Moreover, both models are perceived as useful, and the proactive active-learning based model has a slightly higher preference regarding the users' confidence in the result.
As of today, programming has never been so accessible. Yet, it remains a challenge for end-users: students, non-technical employees, experts in their domains outside of computer science, and so on. With its forecast p...
详细信息
As of today, programming has never been so accessible. Yet, it remains a challenge for end-users: students, non-technical employees, experts in their domains outside of computer science, and so on. With its forecast potential for solving problems by only observing inputs and outputs, programming-by-example was supposed to alleviate complex tasks requiring programming for end-users. The initial ideas of macro-based editors paved the way to subsequent practical solutions, such as spreadsheet transformations from examples. Finding the right program is the core of the programming-by-example systems. However, users find it difficult to trust such generated programs. In this thesis, we contribute to proving that some forms of interaction alleviate, by having users provide examples, the problem of finding correct and reliable programs. We first report on two experiments that enable us to conjecture what kind of interaction brings benefits to programming-by-example. First, we present a new kind of game engine, Pong Designer. In this game engine, by using their finger, users program rules on the fly, by modifying the game state. We analyze its potential, and its eventual downsides that have probably prevented its wide adoption. Second, we present StriSynth, an interactive command-line tool that uses programming-by-example to transform string and collections. The resulting programs can also rename or otherwise manage files. We obtained the result that confirms that many users preferred StriSynth over usual programming languages, but would appreciate to have both. We then report on two new exciting experiments with verified results, using two forms of interaction truly benefiting programming-by-example. Third, on top of a programming- by-example-based engine for extracting structured data out of text files, in this thesis we study two interaction models implemented in a tool named FlashProg: a view of the program with notification about ambiguities, and the asking of clarifica
Live programming is a paradigm in which values from program execution are shown to the programmer through continual feedback. programming by example is a paradigm in which code is synthesized from example values showi...
详细信息
ISBN:
(纸本)9781450391122
Live programming is a paradigm in which values from program execution are shown to the programmer through continual feedback. programming by example is a paradigm in which code is synthesized from example values showing a desired behavior. This talk presents some of our recent research that combines these two paradigms in beneficial ways. I will walk through our ideas, explain our contributions, discuss what we learned and finally provide thoughts for the future.
This paper describes a report tool in which report formats are designated by “programming by example”-like operations. Users specify a sample layout of an example row of relational table data on a sheet, and select ...
详细信息
ISBN:
(纸本)9781581131345
This paper describes a report tool in which report formats are designated by “programming by example”-like operations. Users specify a sample layout of an example row of relational table data on a sheet, and select an iteration pattern of the sample layout. The tool extracts a set of general formatting rules from the sample layout. The rules consist of absolute positions of non-iterative data, relative positions of iterative data, the iteration pattern, and the increment of the iteration. The tool interprets the rules and generates new reports of the format for different table data.
暂无评论