ISBN:
(Print) 9783030638818; 9783030638825
We propose a new model for data processing programs. Our model generalizes the dataflow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam, and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the main aspects of dataflow-based systems, namely, operations over data (filtering, aggregation, join) and program execution, defined by data dependence between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the dataflow. This approach allows the data processing program specification to remain agnostic of the target Big data processing system. As a first application of the model, we use it to formalize mutation operators for mutation testing of Big data processing programs. The testing tool TRANSMUT-Spark implements these operators.
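The key property that makes Monoid Algebra a good fit for distributed, partitioned datasets can be illustrated with a small sketch (this is an informal illustration, not the paper's formalism; the function names are hypothetical): when an aggregation operator is associative and has an identity element, each partition can be reduced independently and the partial results merged in any order, which is exactly what dataflow engines do.

```python
# Minimal sketch of monoid-style operations over a partitioned dataset.
# The combining operator must form a monoid (associative, with identity)
# so the result is independent of how the data is partitioned.
from functools import reduce

def filter_partitions(partitions, pred):
    """Apply a filter independently to every partition."""
    return [[x for x in p if pred(x)] for p in partitions]

def aggregate(partitions, op, identity):
    """Reduce each partition locally, then merge the partial results.

    `op` must be associative with `identity` as its neutral element
    for the outcome to be partition-invariant.
    """
    partials = [reduce(op, p, identity) for p in partitions]
    return reduce(op, partials, identity)

# A dataset split into three partitions, as a cluster would hold it.
data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

evens = filter_partitions(data, lambda x: x % 2 == 0)
total = aggregate(evens, lambda a, b: a + b, 0)  # 2 + 4 + 6 + 8 = 20
```

Because `(int, +, 0)` is a monoid, repartitioning `data` differently would yield the same `total`; a non-associative operator (e.g., subtraction) would break this guarantee.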
This paper proposes a model for specifying dataflow-based parallel data processing programs that is agnostic of the target Big data processing framework. The paper focuses on the formal abstract specification of non-iterative and iterative programs, generalizing the strategies adopted by dataflow Big data processing frameworks. The proposed model relies on Monoid Algebra and Petri Nets to abstract Big data processing programs at two levels: a higher level representing the program dataflow and a lower level representing data transformation operations (e.g., filtering, aggregation, join). We extend the model for data processing programs proposed in [1] to cover iterative data processing programs. A general specification of such programs, as implemented by dataflow-based parallel programming models, is essential given the democratization of iterative and greedy Big data analytics algorithms. Indeed, these algorithms call for revisiting parallel programming models to express iterations. The paper gives a comparative analysis of the iteration strategies proposed by Apache Spark, DryadLINQ, Apache Beam, and Apache Flink, and discusses how the model generalizes these strategies. (c) 2021 Elsevier B.V. All rights reserved.
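One of the iteration strategies the paper compares, the client-side driver loop used in Spark- and DryadLINQ-style programs, can be sketched as follows (an illustrative assumption, not the paper's formal model; the helper names are hypothetical): the loop lives in the client program, and each pass submits one more acyclic dataflow over the partitioned dataset until a convergence predicate holds.

```python
# Hedged sketch of driver-loop iteration over a partitioned dataset:
# each loop pass corresponds to one acyclic dataflow submitted by the client.

def map_partitions(partitions, f):
    """One dataflow pass: apply f element-wise within every partition."""
    return [[f(x) for x in p] for p in partitions]

def iterate_until(partitions, step, converged, max_iters=100):
    """Re-run a dataflow pass until the convergence predicate holds."""
    for _ in range(max_iters):
        new = map_partitions(partitions, step)
        if converged(new):
            return new
        partitions = new
    return partitions

# Example: halve every value until all values drop below 0.01.
data = [[1.0, 2.0], [4.0]]
result = iterate_until(
    data,
    step=lambda x: x / 2,
    converged=lambda parts: all(v < 0.01 for p in parts for v in p),
)
```

By contrast, systems with native iteration constructs (e.g., Flink) keep the loop inside the dataflow itself rather than in the client, which is one of the differences the paper's comparative analysis addresses.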