Search Results - Inner Mongolia University Library

FIDEX: Filtering Spreadsheet Data using Examples

ACM SIGPLAN NOTICES, 2016, Vol. 51, No. 10, pp. 195-213

Authors: Wang, Xinyu; Gulwani, Sumit; Singh, Rishabh. Affiliations: UT Austin, Austin, TX 78712, USA; Microsoft Res, Redmond, WA, USA

Data filtering in spreadsheets is a common problem faced by millions of end-users. The task of data filtering requires a computational model that can separate intended positive and negative string instances. We present a system, FIDEX, that can efficiently learn desired data filtering expressions from a small set of positive and negative string examples. Our approach rests on two key ideas. First, we design an expressive DSL to represent disjunctive filter expressions needed for several real-world data filtering tasks. Second, we develop an efficient synthesis algorithm for incrementally learning consistent filter expressions in the DSL from very few positive and negative examples. A DAG-based data structure is used to succinctly represent a large number of filter expressions, and two corresponding operators, intersection and subtraction, are defined for algorithmically handling positive and negative examples, respectively. FIDEX is able to learn data filters for 452 out of 460 real-world data filtering tasks in real time (0.22s), using only 2.2 positive string instances and 2.7 negative string instances on average.
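
The roles of intersection and subtraction described in the abstract can be illustrated with a minimal sketch. It is not the FIDEX DSL or its DAG representation: it assumes a toy predicate space (`startswith`, `endswith`, `contains`) stored as an explicit set, and every name in it (`predicates_of`, `holds`, `learn_filters`) and all sample strings are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# A toy predicate is (kind, argument), e.g. ("startswith", "INV-").
Predicate = Tuple[str, str]


def predicates_of(s: str) -> FrozenSet[Predicate]:
    """Enumerate every toy predicate (with a non-empty argument) that s satisfies."""
    preds = set()
    for i in range(1, len(s) + 1):
        preds.add(("startswith", s[:i]))
        preds.add(("endswith", s[-i:]))
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            preds.add(("contains", s[i:j]))
    return frozenset(preds)


def holds(pred: Predicate, s: str) -> bool:
    """Evaluate one toy predicate on a string."""
    kind, arg = pred
    if kind == "startswith":
        return s.startswith(arg)
    if kind == "endswith":
        return s.endswith(arg)
    return arg in s


def learn_filters(positives: Iterable[str], negatives: Iterable[str]) -> FrozenSet[Predicate]:
    """Return all toy predicates consistent with the examples."""
    candidates = None
    # Intersection: keep only predicates satisfied by every positive example.
    for p in positives:
        ps = predicates_of(p)
        candidates = ps if candidates is None else candidates & ps
    candidates = candidates or frozenset()
    # Subtraction: drop predicates satisfied by any negative example.
    for n in negatives:
        candidates = candidates - predicates_of(n)
    return candidates


positives = ["INV-2016-001", "INV-2015-113"]
negatives = ["PO-2016-001", "REF-2016-INV"]
filters = learn_filters(positives, negatives)
```

Each surviving predicate accepts both positive strings and rejects both negative ones; a real system would additionally rank the candidates and support disjunctions, which this sketch omits.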

Keywords: Program Synthesis; Data Filtering; Regular Expressions; Programming by Examples; Algorithms; Human Factors


Transforming Spreadsheet Data Types using Examples

ACM SIGPLAN NOTICES, 2016, Vol. 51, No. 1, pp. 343-356

Authors: Singh, Rishabh; Gulwani, Sumit. Affiliation: Microsoft Res, Redmond, WA, USA

Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as date, time, name, and units are ubiquitous in spreadsheets, and cleaning transformations on these data types involve parsing and pretty printing their string representations. This presents many challenges to users because cleaning such data requires some background knowledge about the data itself, and moreover this data is typically non-uniform, unstructured, and ambiguous. Spreadsheet systems and programming languages provide some UI-based and programmatic solutions for this problem, but they are either insufficient for the user's needs or beyond their expertise. In this paper, we present a programming-by-example methodology for cleaning data types that learns the desired transformation from a few input-output examples. We propose a domain-specific language with probabilistic semantics that is parameterized with declarative data type definitions. The probabilistic semantics is based on three key aspects: (i) approximate predicate matching, (ii) joint learning of data type interpretation, and (iii) weighted branches. This probabilistic semantics enables the language to handle non-uniform, unstructured, and ambiguous data. We then present a synthesis algorithm that learns the desired program in this language from a set of input-output examples. We have implemented our algorithm as an Excel add-in and present its successful evaluation on 55 benchmark problems obtained from online help forums and the Excel product team.
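
To give a feel for approximate predicate matching, here is a minimal sketch that scores a noisy token against month names. It is an illustration only, not the paper's language or semantics: the similarity score from Python's standard `difflib` stands in for a match probability, and `MONTHS`, `match_month`, and the `cutoff` value are assumptions made for this example.

```python
import difflib
from typing import Optional, Tuple

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_month(token: str, cutoff: float = 0.6) -> Optional[Tuple[int, float]]:
    """Return (month_number, score) for the best approximate match, or None.

    Each month is compared by its full name and its three-letter abbreviation;
    the higher of the two scores is used as the weight of that interpretation.
    """
    best = None
    for number, name in enumerate(MONTHS, start=1):
        score = max(similarity(token, name), similarity(token, name[:3]))
        if score >= cutoff and (best is None or score > best[1]):
            best = (number, score)
    return best


typo = match_month("Janury")   # misspelled month name
abbrev = match_month("Sept")   # non-standard abbreviation
noise = match_month("xyz")     # no plausible month
```

Here `typo` and `abbrev` resolve to months 1 and 9 respectively, while `noise` is `None`. Keeping the score alongside the interpretation is what would let a caller weigh competing readings of an ambiguous cell.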

Keywords: Algorithms; Human Factors; Program Synthesis; Probabilistic Synthesis; Spreadsheet Programming; Programming by Examples; Noisy Examples


FlashMeta: A Framework for Inductive Program Synthesis

ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2015

Authors: Polozov, Oleksandr; Gulwani, Sumit. Affiliations: Univ Washington, Seattle, WA 98195, USA; Microsoft Res, Redmond, WA, USA

ISBN: (print) 9781450336895

Inductive synthesis, or programming-by-examples (PBE) is gaining prominence with disruptive applications for automating repetitive tasks in end-user programming. However, designing, developing, and maintaining an effective industrial-quality inductive synthesizer is an intellectual and engineering challenge, requiring 1-2 man-years of effort. Our novel observation is that many PBE algorithms are a natural fall-out of one generic meta-algorithm and the domain-specific properties of the operators in the underlying domain-specific language (DSL). The meta-algorithm propagates example-based constraints on an expression to its subexpressions by leveraging associated witness functions, which essentially capture the inverse semantics of the underlying operator. This observation enables a novel program synthesis methodology called data-driven domain-specific deduction (D-4), where domain-specific insight, provided by the DSL designer, is separated from the synthesis algorithm. Our FlashMeta framework implements this methodology, allowing synthesizer developers to generate an efficient synthesizer from the mere DSL definition (if properties of the DSL operators have been modeled). In our case studies, we found that 10+ existing industrial-quality mass-market applications based on PBE can be cast as instances of D-4. Our evaluation includes reimplementation of some prior works, which in FlashMeta become more efficient, maintainable, and extensible. As a result, FlashMeta-based PBE tools are deployed in several industrial products, including Microsoft PowerShell 3.0 for Windows 10, Azure Operational Management Suite, and Microsoft Cortana digital assistant.
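
The idea of a witness function as the inverse semantics of an operator can be sketched on a toy string DSL. This is not the FlashMeta API: the grammar (`Const`, `Input`, `Concat`), the single-example setting, and the function names are all assumptions chosen to keep the illustration small.

```python
from typing import Iterator, List, Tuple


def witness_concat(output: str) -> Iterator[Tuple[str, str]]:
    """Inverse semantics of Concat: every non-empty (left, right) with left + right == output."""
    for i in range(1, len(output)):
        yield output[:i], output[i:]


def learn_atom(inp: str, output: str) -> List[str]:
    """Leaf programs that produce `output` on input `inp`."""
    programs = [f"Const({output!r})"]
    if output == inp:
        programs.append("Input")
    return programs


def learn(inp: str, output: str) -> List[str]:
    """All programs `atom` or `Concat(atom, expr)` consistent with one example.

    The output constraint on Concat is pushed down to its two arguments
    through witness_concat, and each sub-constraint is solved recursively.
    The enumeration is exponential in len(output), so keep examples short.
    """
    programs = learn_atom(inp, output)
    for left, right in witness_concat(output):
        for atom in learn_atom(inp, left):
            for expr in learn(inp, right):
                programs.append(f"Concat({atom}, {expr})")
    return programs


candidates = learn("Ada", "Hi Ada")
```

Among `candidates` is `Concat(Const('Hi '), Input)`, the program that generalizes to other inputs. A practical synthesizer would share sub-results in a compact structure and rank the candidates rather than list them all.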

Keywords: Algorithms; Languages; Inductive program synthesis; programming by examples; frameworks; domain-specific languages; deductive inference; search-based synthesis


Automating Repetitive Tasks for the Masses

42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '15)

Author: Gulwani, Sumit. Affiliation: Microsoft Res, Philadelphia, PA 19122, USA

ISBN: (print) 9781450333009

The programming languages (PL) research community has traditionally catered to the needs of professional programmers in the continuously evolving technical industry. However, a new opportunity is knocking at our door. The recent IT revolution has resulted in the masses having access to personal computing devices. More than 99% of these computer users are non-programmers and are today limited to being passive consumers of the software that is made available to them. Can we empower these users to more effectively leverage computers for their daily tasks? The formalisms, techniques, and tools developed in the PL and the formal methods research communities can play a pivotal role!

Keywords: End-user Programming; Computer-aided Education; Programming by Examples; Programming by Natural Language; Program Synthesis; Domain-specific Languages; Search Algorithms


FlashMeta: A Framework for Inductive Program Synthesis

ACM SIGPLAN NOTICES, 2015, Vol. 50, No. 10, pp. 107-126

Authors: Polozov, Oleksandr; Gulwani, Sumit. Affiliations: Univ Washington, Seattle, WA 98195, USA; Microsoft Res, Redmond, WA, USA

Keywords: Algorithms; Languages; Inductive program synthesis; programming by examples; frameworks; domain-specific languages; deductive inference; search-based synthesis


FlashExtract: A Framework for Data Extraction by Examples

35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014

Authors: Le, Vu; Gulwani, Sumit. Affiliations: Univ Calif Davis, Davis, CA 95616, USA; Microsoft Res Redmond, Redmond, WA, USA

ISBN: (print) 9781450327848

Various document types that combine model and view (e.g., text files, webpages, spreadsheets) make it easy to organize (possibly hierarchical) data, but make it difficult to extract raw data for any further manipulation or querying. We present a general framework, FlashExtract, to extract relevant data from semi-structured documents using examples. It includes: (a) an interaction model that allows end-users to give examples to extract various fields and to relate them in a hierarchical organization using structure and sequence constructs, and (b) an inductive synthesis algorithm to synthesize the intended program from a few examples in any underlying domain-specific language for data extraction that has been built using our specified algebra of a few core operators (map, filter, merge, and pair). We describe instantiations of our framework for three different domains: text files, webpages, and spreadsheets. On our benchmark comprising 75 documents, FlashExtract is able to extract the intended data using an average of 2.36 examples in 0.84 seconds per field.
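
How an extraction program might be composed from the four core operators named in the abstract can be sketched as follows. The operator semantics here are a simplified reading, not FlashExtract's definitions, and no learning from examples is performed; the function names, the sample document, and the field layout are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")


def map_op(f: Callable[[T], U], seq: Sequence[T]) -> List[U]:
    """Apply f to every element of a sequence."""
    return [f(x) for x in seq]


def filter_op(pred: Callable[[T], bool], seq: Sequence[T]) -> List[T]:
    """Keep the elements of a sequence that satisfy pred."""
    return [x for x in seq if pred(x)]


def merge_op(*seqs: Sequence[T]) -> List[T]:
    """Union several sequences, ordered by document position."""
    return sorted(set().union(*seqs))


def pair_op(a: Callable[[T], U], b: Callable[[T], V]) -> Callable[[T], Tuple[U, V]]:
    """Combine two extractors into one that returns a pair."""
    return lambda x: (a(x), b(x))


doc = """Item: pen 2.50
Item: ink 4.00
Subtotal: 6.50
Tax: 0.52
Total: 7.02""".splitlines()

# Represent each line as (line_index, text) so merged results keep document order.
lines = list(enumerate(doc))
items = filter_op(lambda l: l[1].startswith("Item"), lines)
totals = filter_op(lambda l: l[1].startswith("Total"), lines)
selected = merge_op(items, totals)

# Each record pairs the label before the colon with the trailing amount.
field = pair_op(lambda l: l[1].split(":")[0], lambda l: float(l[1].split()[-1]))
records = map_op(field, selected)
# records == [("Item", 2.5), ("Item", 4.0), ("Total", 7.02)]
```

In a programming-by-examples setting the predicates and extractors passed to these operators would be synthesized from the user's highlighted examples instead of being written by hand.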

Keywords: Languages; Algorithms; Human Factors; Program Synthesis; End-user Programming; Programming by Examples
