Authors: Lee, P.; Kedem, Z. M.
Acad Sinica, Inst Informat Sci, Taipei 11529, Taiwan; Acad Sinica, Ctr Appl Sci & Engn Res, Taipei 11529, Taiwan; NYU, Courant Inst Math Sci, Dept Comp Sci, New York, NY 10003, USA
To exploit parallelism on shared memory parallel computers (SMPCs), it is natural to focus on decomposing the computation (mainly by distributing the iterations of nested Do-loops). In contrast, on distributed memory parallel computers (DMPCs), the decomposition of computation and the distribution of data must both be handled in order to balance the computation load and to minimize the migration of data. We propose and validate experimentally a method for handling computations and data synergistically to minimize the overall execution time on DMPCs. The method is based on a number of novel techniques, also presented in this article. The core idea is to rank the "importance" of data arrays in a program and designate some of them as dominant. The intuition is that the dominant arrays are the ones whose migration would be the most expensive. The correspondence between iteration-space mapping vectors and the distributed dimensions of the dominant data array in each nested Do-loop allows us to design algorithms for determining data and computation decompositions at the same time. Given the data distribution, the computation decomposition for each nested Do-loop is determined by either the "owner computes" rule or the "owner stores" rule with respect to the dominant data array. If all temporal dependence relations across iteration partitions are regular, we use tiling to allow pipelining and the overlapping of computation and communication. However, in order to use tiling on DMPCs, we needed to extend the existing techniques for determining tiling vectors and tile sizes, as they were originally suited for SMPCs only. The overall method is illustrated on programs for the 2D heat equation, for Gaussian elimination with pivoting, and for the 2D fast Fourier transform, on a linear processor array and on a 2D processor grid.
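As a rough illustration of the "owner computes" rule mentioned in the abstract, the following sketch (hypothetical function names; a 1-D block distribution is assumed, not the paper's full mapping-vector machinery) assigns each iteration that writes an element of the dominant array to the processor that owns that element:

```python
# Hedged sketch, not the paper's algorithm: "owner computes" under a
# 1-D block distribution of a dominant array A over p processors.
def block_owner(i, n, p):
    """Processor that owns row i of an n-row array block-distributed over p processors."""
    block = (n + p - 1) // p  # ceiling division: rows per processor
    return i // block

def owner_computes(n, p):
    """Assign each iteration i (which writes A[i]) to the processor owning A[i]."""
    schedule = {q: [] for q in range(p)}
    for i in range(n):
        schedule[block_owner(i, n, p)].append(i)
    return schedule

print(owner_computes(8, 4))
# → {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
```

Under this rule no write ever targets remote memory; only reads of remote data generate communication, which is why choosing the distribution of the dominant array first is attractive.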
ISBN:
(Print) 9780769538594
An automatic parallelization method for tightly nested loops running on multi-core systems has been proposed. First, according to the physical characteristics of multi-core processors, a way is presented to solve the data-locality problem during data decomposition. Second, to increase the parallel granularity of tightly nested loops, the method studies computation decomposition based on workload: it shows how to compute the workload of the loop iterations that can be run in parallel and then, according to the size of that workload, determines the granularity of the parallel loops, reducing the parallel overhead incurred by parallelizing iterations with small workloads. Using this method, an automatic parallelization model based on workload can be constructed.
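The workload-based granularity idea can be sketched as follows (a hedged illustration with hypothetical names, not the authors' implementation): consecutive loop iterations are grouped into chunks until each chunk's accumulated workload reaches a threshold, so iterations with small workloads are not dispatched to threads individually:

```python
def chunk_by_workload(workloads, min_chunk_work):
    """Group consecutive iterations into chunks whose accumulated workload
    reaches min_chunk_work, so tiny iterations are not parallelized alone."""
    chunks, current, acc = [], [], 0
    for i, w in enumerate(workloads):
        current.append(i)
        acc += w
        if acc >= min_chunk_work:   # chunk is "heavy" enough to amortize overhead
            chunks.append(current)
            current, acc = [], 0
    if current:                     # leftover light iterations form a final chunk
        chunks.append(current)
    return chunks

print(chunk_by_workload([1, 1, 1, 4, 1, 1], 3))
# → [[0, 1, 2], [3], [4, 5]]
```

Each resulting chunk would then become one unit of parallel work, trading scheduling overhead against load balance.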
ISBN:
(Print) 9780769548791
Automatic decomposition is a compile-time technique that maps computation and data onto different processors, and arrays are among its main targets. Arrays whose migration is the most expensive are termed dominant arrays. Since every computing node has its own memory on distributed memory parallel computers (DMPCs), the decomposition of dominant arrays has a direct impact on the performance of a parallel program. To avoid remote data accesses, each definition and use of an array needs to be distributed consistently, which imposes many partition constraints and restricts the decomposition choices for dominant arrays. We propose an automatic computation and data decomposition algorithm that prioritizes dominant arrays. Our algorithm ranks arrays according to their potential communication costs and then finds data decompositions for the arrays in decreasing order of rank. We serialize low-rank arrays to enhance the decomposition priority of high-rank ones. Finally, serialized arrays are re-partitioned when doing so preserves the parallel benefits of the earlier decomposition. The experimental results show that this algorithm can improve the performance of parallel programs.
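A minimal sketch of the ranking-and-prioritizing step (hypothetical names and a toy conflict model; the paper's actual constraint handling is more elaborate): arrays are visited in decreasing order of estimated communication cost, and an array whose distribution would conflict with an already-distributed higher-rank array is serialized instead:

```python
def prioritize_decomposition(comm_cost, conflicts):
    """Greedy sketch: distribute arrays in decreasing cost order; an array
    whose partition constraints conflict with an already-distributed
    higher-rank array is serialized (kept undistributed) instead."""
    order = sorted(comm_cost, key=comm_cost.get, reverse=True)
    distributed, serialized = [], []
    for a in order:
        if any(b in distributed for b in conflicts.get(a, [])):
            serialized.append(a)   # yield priority to the higher-rank array
        else:
            distributed.append(a)
    return distributed, serialized

print(prioritize_decomposition({'A': 100, 'B': 10, 'C': 50}, {'B': ['A']}))
# → (['A', 'C'], ['B'])
```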
Data locality is critical to achieving high performance on high-performance parallel computers, so how to find a good data decomposition is becoming a key issue in parallelizing compilers. We have developed a compiler system that fully automatically parallelizes sequential programs and optimizes data decomposition to improve data locality. The data decomposition algorithm consists of two steps. The first step chooses the basic data and computation decomposition without considering read-only data. The second step then adjusts the data decomposition taking read-only data into account. We ran our compiler on a set of application programs; the results show that the algorithm can effectively discover parallelism.
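The two-step structure might be sketched like this (an assumed simplification with hypothetical names, in which step 1 block-distributes ordinary arrays and step 2 replicates read-only arrays so every processor can read them without communication; the actual algorithm chooses among many decompositions):

```python
def decompose(arrays, readonly):
    """Two-step sketch: step 1 picks a block distribution for writable
    arrays, ignoring read-only ones; step 2 handles read-only arrays by
    replicating them, removing any communication for their reads."""
    decomp = {a: 'block' for a in arrays if a not in readonly}       # step 1
    decomp.update({a: 'replicated' for a in arrays if a in readonly})  # step 2
    return decomp

print(decompose(['A', 'B', 'C'], {'C'}))
# → {'A': 'block', 'B': 'block', 'C': 'replicated'}
```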
Social network analysis is used to extract features of human communities and proves to be very instrumental in a variety of scientific domains. The dataset of a social network is often so large that a cloud data analysis service, in which the computation is performed on a parallel platform in the cloud, becomes a good choice for researchers not experienced in parallel programming. In the cloud, a primary challenge to efficient data analysis is the computation and communication skew (i.e., load imbalance) among computers caused by humanity's group behavior (e.g., the bandwagon effect). Traditional load-balancing techniques either require significant effort to re-balance loads on the nodes or cannot cope well with stragglers. In this paper, we propose a general straggler-aware execution approach, SAE, to support the analysis service in the cloud. It offers a novel computational decomposition method that factors straggling feature-extraction processes into more fine-grained sub-processes, which are then distributed over clusters of computers for parallel execution. Experimental results show that SAE can speed up the analysis by up to 1.77 times compared with state-of-the-art solutions.
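The core decomposition idea, factoring one straggling task into finer-grained sub-tasks, can be illustrated with a toy sketch (hypothetical names; SAE itself involves scheduling and aggregation logic far beyond this):

```python
def split_straggler(work_items, factor):
    """Factor one straggling extraction task (a large list of work items,
    e.g. the neighbors of a high-degree vertex) into `factor` finer
    sub-tasks that can run on different machines in parallel."""
    return [work_items[k::factor] for k in range(factor)]

print(split_straggler(list(range(10)), 3))
# → [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Splitting the skewed task rather than migrating it whole is what lets the approach cope with stragglers that classic re-balancing cannot move cheaply.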
ISBN:
(Print) 9780769529097
Minimizing communication by increasing the locality of data references is an important optimization for achieving high performance on distributed memory machines. In the process of decomposition, however, data reorganization is inevitable, and so is the communication it produces. In this paper, the authors present a linear decomposition algorithm that automatically finds computation and data decompositions, including decompositions that incur data-reorganization communication. The authors further improve the method and reduce the communication cost by merging parallel regions that share the same data decomposition.
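Merging adjacent parallel regions that share a data decomposition can be sketched as follows (a hedged illustration with a hypothetical representation of regions as (decomposition, statements) pairs): consecutive regions with the same decomposition are fused, so no reorganization communication is inserted between them:

```python
def merge_regions(regions):
    """Fuse consecutive parallel regions that share a data decomposition,
    eliminating the reorganization communication between them."""
    merged = []
    for decomp, body in regions:
        if merged and merged[-1][0] == decomp:
            # Same decomposition as the previous region: fuse the bodies.
            merged[-1] = (decomp, merged[-1][1] + body)
        else:
            merged.append((decomp, body))
    return merged

print(merge_regions([('row', ['s1']), ('row', ['s2']), ('col', ['s3'])]))
# → [('row', ['s1', 's2']), ('col', ['s3'])]
```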