Search Results - Inner Mongolia University Library

Optimization of data flow execution in a parallel environment

DISTRIBUTED AND PARALLEL DATABASES, 2019, Vol. 37, No. 3, pp. 385-410

Authors: Kougka, Georgia; Gounaris, Anastasios (Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece)

Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or on the cloud, current cost models, e.g., those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these execution environments. This is mainly due to the fact that the impact of parallelism, and more specifically, the impact of concurrent task execution on the running time, is not adequately modeled in current cost models. The contribution of this work is twofold. Firstly, we propose an advanced cost model that aims to reflect the response time of a data flow that is executed in parallel more accurately. Secondly, we show that existing optimization solutions are inadequate and develop new optimization techniques targeting the proposed cost model. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified. Furthermore, we propose extensions to current optimizers that decide on the exact ordering of flow tasks taking into account the new optimization metric. Finally, we evaluate the new optimization algorithms and show up to 59% response time improvement over state-of-the-art task ordering techniques.

Keywords: Data flow optimization; Cost modeling; Task ordering

SOFA: An extensible logical optimizer for UDF-heavy data flows

INFORMATION SYSTEMS, 2015, Vol. 52, pp. 96-125

Authors: Rheinlaender, Astrid; Heise, Arvid; Hueske, Fabian; Leser, Ulf; Naumann, Felix (Humboldt Univ, Dept Comp Sci, D-10099 Berlin, Germany; Hasso Plattner Inst Software Syst Engn, Potsdam, Germany; Tech Univ Berlin, Berlin, Germany)

Recent years have seen an increased interest in large-scale analytical data flows on non-relational data. These data flows are compiled into execution graphs scheduled on large compute clusters. In many novel application areas the predominant building blocks of such data flows are user-defined predicates or functions (UDFs). However, the heavy use of UDFs is not well taken into account for data flow optimization in current systems. SOFA is a novel and extensible optimizer for UDF-heavy data flows. It builds on a concise set of properties for describing the semantics of Map/Reduce-style UDFs and a small set of rewrite rules, which use these properties to find a much larger number of semantically equivalent plan rewrites than possible with traditional techniques. A salient feature of our approach is extensibility: we arrange user-defined operators and their properties into a subsumption hierarchy, which considerably eases integration and optimization of new operators. We evaluate SOFA on a selection of UDF-heavy data flows from different domains and compare its performance to three other algorithms for data flow optimization. Our experiments reveal that SOFA finds efficient plans, outperforming the best plans found by its competitors by a factor of up to six. © 2015 Elsevier Ltd. All rights reserved.

Keywords: Data flow optimization; User-defined operators; Map/reduce

Programming with Implicit Flows

IEEE SOFTWARE, 2014, Vol. 31, No. 5, pp. 52-59

Authors: Salvaneschi, Guido; Mezini, Mira; Eugster, Patrick (Tech Univ Darmstadt, Darmstadt, Germany; Univ Lancaster, Lancaster LA1 4YW, England; Purdue Univ, W Lafayette, IN 47907, USA)

Modern software differs significantly from traditional computer applications that mostly process reasonably small amounts of static input data sets in batch mode. Modern software increasingly processes massive amounts of data, and new input data is often produced and/or existing data modified on the fly. Consequently, programming models that facilitate the development of such software are emerging. What characterizes them is that data, or changes to it, implicitly flow through computation modules. The software engineer declaratively defines computations as compositions of other computations without explicitly modeling how data should flow along dependency relations between data producer and data consumer modules, letting the runtime automatically manage and optimize data flows.

Keywords: Software Engineering; Implicit Flows; Programming; Static Input Data Sets; Programming Models; Software Development; Computation Modules; Data Producer Module; Data Consumer Module; Data Flow Management; Data Flow Optimization; Big Data; Computational Modeling; Runtime; Data Models; Market Research; Reactive Programming; Event Stream; Data Flow; Programming Languages

Dynamic Configuration of Partitioning in Spark Applications

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, Vol. 28, No. 7, pp. 1891-1904

Authors: Gounaris, Anastasios; Kougka, Georgia; Tous, Ruben; Montes, Carlos Tripiana; Torres, Jordi (Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece; Aristotle Univ Thessaloniki, Thessaloniki, Greece; Univ Politecn Cataluna, ES-08034 Barcelona, Spain; Barcelona Supercomp Ctr, Barcelona 08034, Spain)

Spark has become one of the main options for large-scale analytics running on top of shared-nothing clusters. This work aims to make a deep dive into the parallelism configuration and shed light on the behavior of parallel Spark jobs. It is motivated by the fact that running a Spark application on all the available processors does not necessarily imply lower running time, while it may entail a waste of resources. We first propose analytical models for expressing the running time as a function of the number of machines employed. We then take another step, namely to present novel algorithms for configuring dynamic partitioning with a view to minimizing resource consumption without sacrificing running time beyond a user-defined limit. The problem we target is NP-hard. To tackle it, we propose a greedy approach after introducing the notions of dependency graphs and of the benefit from modifying the degree of partitioning at a stage; complementarily, we investigate a randomized approach. Our polynomial solutions are capable of judiciously using the resources that are potentially at the user's disposal and strike interesting trade-offs between running time and resource consumption. Their efficiency is thoroughly investigated through experiments based on real execution data.

Keywords: Data repartitioning; Data flow optimization; Data flow profiling; Spark

Data Flow Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, Vol. E95D, No. 2, pp. 374-382

Authors: Liu, Xinning; Mei, Chen; Cao, Peng; Zhu, Min; Shi, Longxing (Southeast Univ, Natl ASIC Syst Engn Res Ctr, Nanjing, Jiangsu, Peoples R China; Tsinghua Univ, Inst Microelect, Beijing 100084, Peoples R China)

This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically coarse grain reconfigurable architecture. REMUS-II consists of a μPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Units), which are used to speed up control-intensive tasks and data-intensive tasks respectively. The parallel computing capability and flexibility of REMUS-II make it an excellent candidate to process multimedia applications, which require a large amount of memory accesses. In this paper, we specifically optimize the data flow to deal with those performance-hazard and energy-hungry memory accesses in order to meet the bandwidth requirement of parallel computing. The RPU internal memory can work in multiple modes, like 2D-access mode and transformation mode, according to different multimedia access patterns. This novel design can improve the performance by up to 26% compared to traditional on-chip memory. Meanwhile, the block buffer is implemented to optimize the off-chip data flow by reducing off-chip memory accesses, which are reduced by up to 43% compared to direct DDR access. Based on RTL simulation, REMUS-II can achieve 1080p@30 fps of H.264 High Profile @ Level 4 and High Level MPEG2 at 200 MHz clock frequency. The REMUS-II is implemented in 23.7 mm² of silicon on the TSMC 65 nm logic process with a 400 MHz maximum working frequency.

Keywords: REMUS-II; Coarse grain reconfigurable architecture; Multimedia application; Data flow optimization; H.264 HiP

Data-Aware Service Choreographies Through Transparent Data Exchange

16th International Conference on Web Engineering (ICWE)

Authors: Hahn, Michael; Karastoyanova, Dimka; Leymann, Frank (Univ Stuttgart, IAAS, Stuttgart, Germany)

ISBN: (print) 9783319387918; 9783319387901

Our focus in this paper is on enabling the decoupling of data flow, data exchange and management from the control flow in service compositions and choreographies through novel middleware abstractions and realization. This allows us to perform the data flow of choreographies in a peer-to-peer fashion decoupled from their control flow. Our work is motivated by the increasing importance and business value of data in the fields of business process management, scientific workflows and the Internet of Things, all of which profit from the recent advances in data science and Big Data. Our approach comprises an application life cycle that inherently introduces data exchange and management as a first-class citizen and defines the functions and artifacts necessary for enabling transparent data exchange. Moreover, we present an architecture of the supporting system that contains the Transparent Data Exchange middleware, which enables the data exchange and management on behalf of service choreographies and provides methods for the optimization of the data exchange during their execution.

Keywords: Service choreographies; Transparent data exchange; Decentralized data flow; Data flow optimization

A Management Life Cycle for Data-Aware Service Choreographies

IEEE 23rd International Conference on Web Services (ICWS)

Authors: Hahn, Michael; Karastoyanova, Dimka; Leymann, Frank (Univ Stuttgart, IAAS, Stuttgart, Germany)

ISBN: (print) 9781509026753

This work is motivated by the increasing importance and business value of data in the fields of business process management, scientific workflows as a field in eScience, and the Internet of Things, all of which profit from the recent advances in data science and Big Data. We introduce a management life cycle that renders data a first-class citizen in service choreographies and defines the functions and artifacts necessary for enabling transparent and efficient data exchange among choreography participants. The inherent goal of the life cycle, functions and artifacts is to help decouple the data flow, data exchange and management from the control flow in service compositions and choreographies. This decoupling enables peer-to-peer data exchange in choreographies and provides the means for more sophisticated data management and exchange, as well as data exchange and provisioning optimization.

Keywords: Service Choreographies; Choreography Management Life Cycle; Data flow optimization; Transparent Data Exchange

An artificial neural network for exploring the relationship between learning activities and students’ performance

Decision Analytics Journal, 2023, Vol. 9

Authors: Borhani, Kourosh; Wong, Richard T.K. (Sunway University, Malaysia)

This paper identifies the most significant learning factors impacting undergraduate academic performance using artificial neural networks (ANNs) and controlled student data collection. As higher education becomes increasingly common and important, finding the best ways to help students optimize their studies is vital. Questionnaires gathered data on student behaviours and achievement from five classes within a semester, constraining variability to compare learning activities directly. The questionnaire captured engagement, psychological factors, effort, course load, time management, and performance data. Statistical and exploratory analysis investigated the dataset. A multilayer perceptron model was developed, using backpropagation and cross-validation to optimize predictive accuracy. The model identified class attendance, sleep quality, and questioning during lectures as most correlated with high grades. Additional patterns emerged around research participation, motivation, cramming, and theoretical studying. This research demonstrates new techniques for associating detailed study behaviours with academic achievement through strictly controlled student data collection and the application of artificial neural networks for predictive modelling. The constrained variability in the dataset allows for isolating the impact from specific learning activities. The controlled student data and machine learning-driven predictive modelling provide information on the optimal grouping of student effort across engagement, health, and studying factors. © 2023 The Author(s)

Keywords: Artificial neural network; Data flow optimization; Learning analytics; Multilayer perceptron model; Questionnaire design; Student's performance
