This paper presents the results of the Cedar Hand-parallelization Experiment, conducted from 1989 through 1992 within the Center for Supercomputing Research and Development (CSRD) at the University of Illinois. In this experiment, we manually transformed the Perfect Benchmarks(R) into parallel program versions, using techniques that could be automated in an optimizing compiler. We then ran these programs on the Cedar multiprocessor (built at CSRD during the 1980s) and measured the speed improvement due to each technique. The results presented here extend the findings previously reported in [11]. The techniques credited most for the performance gains are array privatization, parallelization of reduction operations, and the substitution of generalized induction variables. All of these techniques can be considered extensions of transformations that were available in vectorizers and commercial restructuring compilers of the late 1980s. We applied these transformations by hand to the given programs in a mechanical manner, similar to that of a parallelizing compiler. Because of our success with these transformations, we believed it would be possible to implement many of them in a new parallelizing compiler. Such a compiler has since been completed, and we show preliminary results.
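To make the three headline transformations concrete, the sketch below shows each on a toy loop, written in C with OpenMP pragmas rather than the Fortran of the Perfect Benchmarks; the loop bodies and array names are our own invention for illustration, not code from the benchmarks.

    #define N 1000
    double a[N], b[N];

    double example(void) {
        double sum = 0.0;
        double tmp[4];

        /* Reduction parallelization: the += accumulation would normally
           serialize the loop; the reduction clause gives each thread a
           private partial sum that is combined at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        /* Array privatization: tmp is written before it is read in every
           iteration, so each thread can safely keep its own copy. */
        #pragma omp parallel for private(tmp)
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < 4; j++) tmp[j] = a[i] * j;
            b[i] = tmp[0] + tmp[3];
        }

        /* Generalized induction variable substitution: the original loop
           advanced an index by a loop-variant amount (idx += i), a
           cross-iteration dependence; the closed form i*(i+1)/2 removes it. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            int k = (i * (i + 1) / 2) % N;  /* closed form of idx, wrapped for bounds */
            b[i] += a[k];
        }
        return sum;
    }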
This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations, and it optimizes across procedures using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs operating over dense matrices. The programs were originally hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, use known, implemented dependence and interprocedural analyses to find parallelism, and then apply our new optimization algorithm to the resulting program. Comparing the original parallel programs to the hand-optimized programs on a shared-memory, bus-based parallel machine with local caches, our algorithm improves three programs in our test suite, matches four, and degrades one. This experiment suggests that existing dependence and interprocedural array analyses can automatically detect user parallelism. It also demonstrates that user-parallelized codes often benefit from our compiler optimizations, providing evidence that both parallel algorithms and compiler optimizations are needed to use parallel machines effectively.
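As a hedged illustration of the kind of trade-off such an algorithm weighs (the kernel below is ours, not one of the paper's test programs): parallelizing the outer loop of a dense-matrix nest yields coarse granularity, one fork/join for the whole nest, and good locality, since each thread sweeps contiguous rows; parallelizing the inner loop instead would fork once per row and split rows across threads' caches.

    #define N 512
    double a[N][N], b[N][N];

    void smooth(void) {
        /* Outer-loop parallelism: coarse granularity plus per-thread
           row-wise traversal, so cache lines are reused within a thread
           rather than bounced between threads. */
        #pragma omp parallel for
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);
    }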
One of the most crucial qualities of an optimizing compiler is its ability to detect when different data references access the same storage location. Such references are said to be data-dependent, and they constrain the program modifications the compiler can apply to improve the program's performance. For parallelizing compilers, the most important program constructs to investigate are loops and the array references they contain. In previous work, we found a serious limitation of current data dependence tests: they cannot handle loop bounds or array subscripts that are symbolic, nonlinear expressions. In this paper, we describe a dependence test, called the Range Test, that can handle such expressions. Briefly, the Range Test proves independence by determining whether certain symbolic inequalities hold for a permutation of the loop nest. Powerful symbolic analyses and constraint propagation techniques were developed to prove such inequalities. The Range Test has been implemented in Polaris, a parallelizing compiler developed at the University of Illinois. We present measurements of the Range Test's performance and compare it with state-of-the-art tests.
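The flavor of the inequalities involved can be seen on a loop with a symbolic, nonlinear subscript; the example below is ours, not drawn from the paper, and the comments spell out the independence argument a Range-Test-style analysis would make.

    /* Linear tests give up on the subscript i*m + j because the stride m
       is symbolic. Range reasoning compares per-iteration access ranges
       instead: outer iteration i touches a[i*m .. i*m + m-1], and since
       max(i*m + m - 1) < min((i+1)*m) whenever m >= 1, the ranges of
       distinct iterations never overlap, so the outer loop is parallel. */
    void scale_blocks(double *a, const double *b, int n, int m) {
        for (int i = 0; i < n; i++)          /* provably independent iterations */
            for (int j = 0; j < m; j++)
                a[i * m + j] = 2.0 * b[i * m + j];
    }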
We systematically derive a systolic algorithm for the algebraic path problem (APP). We start with an algorithm with (parallel) running time 3n, but whose data dependencies are not local. The first step in obtaining a systolic algorithm is to localize these dependencies. The standard localization (used by all authors until now) slows the algorithm down to the point that it takes 5n - 4 time steps. We show that much of this slow-down can be avoided, and give a new localization scheme that reduces the time to 4n - 2 (or 4n - 3 if n is odd). We also introduce a new approach to the problem of scheduling such parallel algorithms: we treat the equations defining the algorithm as an executable specification, but execute it on a nonstandard value domain. Such an abstract interpretation of the program then computes the optimal schedule of the original algorithm; a closed form can be inferred by observing a trace of this execution. We use this approach to obtain optimal schedules for all the algorithms in the paper, local as well as non-local.
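The scheduling idea, executing the defining equations over a nonstandard value domain, can be sketched in a few lines: reinterpret every operation as completing one step after its latest operand is ready, and run the recurrence on timestamps. The program below applies this reading to a generic Floyd-Warshall-style APP recurrence; it is our illustration of the technique, not the paper's systolic algorithm.

    #include <stdio.h>

    #define N 8
    static int t[N][N];   /* timestamp domain: step at which a value is ready;
                             static storage starts at 0, i.e. all inputs
                             available at step 0 */

    static int max3(int x, int y, int z) {
        int m = x > y ? x : y;
        return m > z ? m : z;
    }

    int main(void) {
        /* Abstract interpretation of a[i][j] = a[i][j] (+) a[i][k] (*) a[k][j]
           over timestamps: each update is ready one step after its latest
           operand. The final t[i][j] values form a valid schedule for the
           original computation; their maximum is the schedule length. */
        for (int k = 0; k < N; k++)
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    t[i][j] = 1 + max3(t[i][j], t[i][k], t[k][j]);

        int makespan = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (t[i][j] > makespan) makespan = t[i][j];
        printf("schedule length: %d steps\n", makespan);
        return 0;
    }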
ISBN (print): 9783540554370
Programming shared memory multiprocessors seems to be easier than developing programs for distributed memory systems. The reason is the existence of a global name space for parallel threads, providing uniform access to all global data. This programming model seems inadequate for systems with a larger number of processors, since such systems integrate memory hierarchies to eliminate the bottleneck of global memory access. Programming these systems therefore has to take the distribution of data into account, so that parallel processes exploit locality of reference. This paper describes an ongoing project in which we investigate the applicability of the distributed memory parallelization strategy to shared memory systems with memory hierarchies.
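A minimal sketch of this discipline on a shared-memory machine, assuming OpenMP as the threading layer (the kernel is ours, for illustration): distribute the arrays block-wise over threads and let each thread update only its own block, so references stay local even though the address space is shared.

    #define N 1024
    double x[N], y[N];

    void axpy(double alpha) {
        /* schedule(static) gives each thread one contiguous chunk of
           iterations, the shared-memory analogue of a block data
           distribution: thread p reads and writes only "its" blocks of
           x and y, which stay in its local level of the memory hierarchy. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            y[i] = alpha * x[i] + y[i];
    }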