Search Results - Inner Mongolia University Library

Proceedings of the ACM on Software Engineering, 2024, Vol. 1, Issue FSE, pp. 767-788

Authors: Sabaat Haroon, Chris Brown, Muhammad Ali Gulzar (Virginia Tech, Blacksburg, USA)

SQL is the most commonly used front-end language for data-intensive scalable computing (DISC) applications due to its broad presence in new and legacy workflows and its shallow learning curve. However, DISC-backed SQL introduces several layers of abstraction that significantly reduce the visibility and transparency of workflows, making it challenging for developers to find and fix errors in a query. When a query returns incorrect output, it takes non-trivial effort to comprehend every stage of the query execution and find the root cause among the input data and the complex SQL query. We aim to bring the benefits of step-through interactive debugging to DISC-powered SQL with DeSQL. Due to the declarative nature of SQL, there are no ordered atomic statements at which to place a breakpoint and monitor the flow of data. DeSQL's automated query decomposition breaks a SQL query into its constituent subqueries, offering natural locations for setting breakpoints and monitoring intermediate data. However, due to advanced query optimization and translation in DISC systems, a user query rarely matches the physical execution, making it challenging to associate subqueries with their intermediate data. DeSQL performs fine-grained taint analysis to dynamically map the subqueries to their intermediate data, while also recognizing subqueries removed by the optimizers. For such subqueries, DeSQL efficiently regenerates the intermediate data from a nearby subquery's data. On the popular TPC-DS benchmark, DeSQL provides a complete debugging view in 13% less time than the original job time while incurring an average overhead of 10%, in addition to retaining Apache Spark's scalability. In a user study comprising 15 participants engaged in two debugging tasks, we find that participants using DeSQL identify the root cause behind a wrong query output in 74% less time than with de facto manual debugging.

Keywords: debugging; SQL; data-intensive scalable computing
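
The idea of query decomposition described in the abstract above can be illustrated with a small sketch. This is not DeSQL's implementation: the decomposition here is written by hand, it uses Python's built-in sqlite3 module rather than Apache Spark, and the table, column, and stage names are hypothetical. It only shows how running each constituent subquery on its own exposes the intermediate data that a breakpoint would show.

```python
import sqlite3

def build_db():
    """Create an in-memory database with hypothetical toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
        INSERT INTO orders VALUES
            (1, 'ann', 40.0), (2, 'ann', 70.0),
            (3, 'bob', 15.0), (4, 'bob', -5.0);
    """)
    return conn

# Hand-written decomposition of one nested query, innermost stage first.
# Each stage is a complete query, so its result can be inspected on its own.
FILTER = "SELECT customer, amount FROM orders WHERE amount > 0"
AGGREGATE = (
    "SELECT customer, SUM(amount) AS total "
    f"FROM ({FILTER}) GROUP BY customer"
)
FINAL = f"SELECT customer FROM ({AGGREGATE}) WHERE total > 50 ORDER BY customer"

STAGES = [("filter", FILTER), ("aggregate", AGGREGATE), ("final", FINAL)]

def run_stages(conn, stages):
    """Run every stage and return {stage name: rows} as a debugging view."""
    return {name: conn.execute(sql).fetchall() for name, sql in stages}

if __name__ == "__main__":
    for name, rows in run_stages(build_db(), STAGES).items():
        print(name, rows)
```

If the final result looked wrong, inspecting the "filter" and "aggregate" rows would show which stage first produced unexpected data. A system working on an optimized engine has the additional problem, noted in the abstract, that the executed plan may not contain these stages at all.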


Software Engineering for Data Intensive Scalable Computing and Heterogeneous Computing


IEEE/ACM International Conference on Software Engineering - Future of Software Engineering (ICSE-FoSE)

Author: Kim, Miryung (UCLA, Los Angeles, CA 90095, USA)

ISBN: (print) 9798350324969

With the development of big data, machine learning, and AI, existing software engineering techniques must be re-imagined to provide the productivity gains that developers desire. Furthermore, specialized hardware accelerators like GPUs or FPGAs have become a prominent part of the current computing landscape. However, developing heterogeneous applications is limited to a small subset of programmers with specialized hardware knowledge. To improve productivity and performance for data-intensive and compute-intensive development, now is the time that the software engineering community should design new waves of refactoring, testing, and debugging tools for big data analytics and heterogeneous application development. In this paper, we overview software development challenges in this new data-intensive scalable computing and heterogeneous computing domain. We describe examples of automated software engineering (debugging, testing, and refactoring) techniques that target this data and compute intensive domain and share lessons learned from building these techniques.

Keywords: data-intensive scalable computing; heterogeneous computing; big data analytics; debugging; testing; refactoring; software development tools


Debugging Big Data Analytics in Spark with BigDebug

ACM International Conference on Management of Data, 2017

Authors: Gulzar, Muhammad Ali; Interlandi, Matteo; Condie, Tyson; Kim, Miryung (Univ Calif Los Angeles, Los Angeles, CA 90095, USA)

ISBN: (print) 9781450341974

To process massive quantities of data, developers leverage data-intensive scalable computing (DISC) systems such as Apache Spark. In terms of debugging, DISC systems support only postmortem log analysis and do not provide any interactive debugging functionality. This demonstration paper showcases BIGDEBUG, a tool that enhances Apache Spark with a set of interactive debugging features that help users debug their big data applications.

Keywords: debugging; big data analytics; DISC; data-intensive scalable computing; automatic fault localization; interactive tools


CSinParallel: Using Map-Reduce to Teach Parallel Programming Concepts Across the CS Curriculum (abstract only)

Proceedings of the 44th ACM Technical Symposium on Computer Science Education, 2013

Authors: Richard A. Brown (St. Olaf College, Northfield, Minnesota, USA); Elizabeth Shoop (Macalester College, St. Paul, Minnesota, USA); Joel Adams (Calvin College, Grand Rapids, Michigan, USA)

ISBN: (print) 9781450318686

Map-reduce, the cornerstone computational framework for cloud computing applications, has star appeal to draw students to the study of parallelism. Participants will carry out hands-on exercises designed for students at CS1/intermediate/advanced levels that introduce data-intensive scalable computing concepts, using WebMapReduce (WMR), a simplified open-source interface to the widely used Hadoop map-reduce programming environment. These hands-on exercises enable students to perform data-intensive scalable computations carried out on the most widely deployed map-reduce framework, used by Facebook, Microsoft, Yahoo, and other companies. WMR supports programming in a choice of languages (including Java, Python, C++, C#, Scheme); participants will be able to try exercises with languages of their choice. Workshop includes brief introduction to direct Hadoop programming, and information about access to cluster resources supporting WMR. Workshop materials will reside on ***, along with WMR software. Intended audience: CS instructors. Laptop required (Windows, Mac, or Linux).

Keywords: education; hadoop; webmapreduce; cs1; introductory course; wmr; curriculum; csinparallel; map-reduce computing; distributed computing; data-intensive scalable computing


WebMapReduce: An Accessible and Adaptable Tool for Teaching Map-Reduce Computing

42nd ACM Technical Symposium on Computer Science Education, 2011

Authors: Garrity, Patrick; Yates, Tim; Brown, Richard; Shoop, Elizabeth (St Olaf Coll, Northfield, MN 55057, USA)

ISBN: (print) 9781450305006

WebMapReduce (WMR) is a strategically simplified user interface for the Hadoop implementation of the map-reduce model for distributed computing on clusters, designed so that novice programmers in introductory CS courses can perform authentic data-intensive scalable computations using the programming language they are learning in their course. WMR currently supports Java, C++, Python, and Scheme computations, can readily be extended to support additional programming languages, and can be configured to adapt to the practices at a particular institution for teaching introductory programming. The open-source system is designed to give beginning CS students experience with parallel computing and exposure to concepts of parallelism, at a wide variety of institutions with diverse curricular choices and cluster resources. Potential applications in courses at all undergraduate levels are indicated, and implementation of the WMR software is described.

Keywords: map-reduce; CS1; introductory course; parallel computing; distributed computing; data-intensive scalable computing; CS curriculum; education
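
For readers unfamiliar with the map-reduce model that the abstract above refers to, a minimal word-count sketch follows. It does not use the WMR interface or Hadoop: the function names `mapper`, `reducer`, and `map_reduce` are hypothetical, and everything runs in a single process, whereas a real framework distributes the map, shuffle, and reduce steps across a cluster.

```python
from collections import defaultdict

def mapper(line):
    """Emit one (word, 1) pair for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Combine all values emitted for one key into a single result."""
    return word, sum(counts)

def map_reduce(lines):
    """Run map, shuffle (group by key), and reduce over the input lines."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in groups.items())

if __name__ == "__main__":
    print(map_reduce(["the cat sat", "the dog sat down"]))
```

The student-facing part of such an exercise is typically just the mapper and the reducer; the grouping step in `map_reduce` stands in for what the framework does between them.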
