the advent of new parallel architectures has increased the need for parallel optimizing compilers to assist developers in creating efficient code. OpenUH is a state-of-the-art optimizing compiler, but it only performs...
详细信息
ISBN:
(纸本)9781605583976
the advent of new parallel architectures has increased the need for parallel optimizing compilers to assist developers in creating efficient code. OpenUH is a state-of-the-art optimizing compiler, but it only performs a limited set of optimizations for OpenMP programs due to its conservative assumptions of shared memory programming. these limitations may prevent some OpenMP applications from being fully optimized to the extent of its sequential counterpart. this paper describes our design and implementation of a parallel data flow framework, consisting of a parallel Control Flow Graph (PCFG) and a parallel SSA (PSSA) representation in OpenUH, to model data flow for OpenMP programs. this framework enables the OpenUH compiler to perform all classical scalar optimizations for OpenMP programs, in addition to conducting OpenMP specific optimizations.
We parallelize the 'go withthe winners' algorithm of Aldous and Vazirani (in: proceedings of the 35th IEEE symposium on the Foundations of Computer Science, IEEE Computer Society Press, Silver Spring., MD, 19...
详细信息
We parallelize the 'go withthe winners' algorithm of Aldous and Vazirani (in: proceedings of the 35th IEEE symposium on the Foundations of Computer Science, IEEE Computer Society Press, Silver Spring., MD, 1994, pp. 492-501) and analyze the resulting parallel algorithm in the LogP-model (in: proceedings of the Fourth ACM SIGPLAN symposium on principles & practice of parallelprogramming, 1993, pp. 1-12). the main issues in the analysis are load imbalances and communication delays. the result of the analysis is a practical algorithm which, under reasonable assumptions, achieves linear speedup. Finally, we analyze our algorithm for a concrete application: generating models of amorphous chemical structures. (C) 2003 Elsevier Inc. All rights reserved.
In this paper, we propose a straightforward solution to the problems of compositional parallelprogramming by using skeletons as the uniform mechanism for structured composition. In our approach parallel programs are ...
详细信息
In this paper, we propose a straightforward solution to the problems of compositional parallelprogramming by using skeletons as the uniform mechanism for structured composition. In our approach parallel programs are constructed by composing procedures in a conventional base language using a set of high-level, predefined, functional, parallel computational forms known as skeletons. the ability to compose skeletons provides us withthe essential tools for building further and more complex application-oriented skeletons specifying important aspects of parallel computation. Compared withthe process network based composition approach, such as PCN, the skeleton approach abstracts away the fine details of connecting communication ports to the higher level mechanism of making data distributions conform, thus avoiding the complexity of using lower level ports as the means of interaction. thus, the framework provides a natural integration of the compositional programming approach withthe data parallelprogramming paradigm.
Pizza is a strict superset of Java that incorporates three ideas from the academic community: parametric polymorphism, higher-order functions, and algebraic data types. Pizza is defined by translation into Java and co...
详细信息
Pizza is a strict superset of Java that incorporates three ideas from the academic community: parametric polymorphism, higher-order functions, and algebraic data types. Pizza is defined by translation into Java and compiles into the Java Virtual Machine, requirements which strongly constrain the design space. Nonetheless, Pizza fits smoothly to Java, with only a few rough edges.
作者:
PHILIPPSEN, MICSI
International Computer Science Institute Berkeley CA and Dept. of Informatics University of Karlsruhe
this paper investigates the problem of aligning array data and processes in a distributed-memory implementation. We present complete algorithms for compile-time analysis, the necessary program restructuring, and subse...
详细信息
ISBN:
(纸本)9780897917001
this paper investigates the problem of aligning array data and processes in a distributed-memory implementation. We present complete algorithms for compile-time analysis, the necessary program restructuring, and subsequent code-generation, and discuss their complexity. We finally evaluate the practical usefulness by quantitative experiments. the technique presented analyzes complete programs, including branches, loops, and nested parallelism. Alignment is determined with respect to offset, stride, and general ass's relations. Pplacement of both data and processes are computed in a unifying framework based on an extended preference graph and its analysis. Dynamic redistributions are derived. the experimental results are very encouraging. the optimization algorithms implemented in our Modula-2* compiler improved the execution times of the programs by an average over 40% on a MasPar MP-1 with 16384 processors.
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows pr...
详细信息
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows processors independent access to their own data and a corresponding compilation strategy that integrates data-parallel computation with data distribution for out-of-core problems. Our results compare several communication methods and I/O optimizations using two out-of-core problems, Jacobi iteration and LU factorization.
this paper studies the essence of heterogeneity from the perspective of language mechanism design. the proposed mechanism, called tiles, is a program construct that bridges two relative levels of computation: an outer...
详细信息
ISBN:
(纸本)9781450332057
this paper studies the essence of heterogeneity from the perspective of language mechanism design. the proposed mechanism, called tiles, is a program construct that bridges two relative levels of computation: an outer level of source data in larger, slower or more distributed memory and an inner level of data blocks in smaller, faster or more localized memory.
this paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn parallel Performance Tools running withthe Blizzard fine-grain distributed s...
详细信息
ISBN:
(纸本)9780897919067
this paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn parallel Performance Tools running withthe Blizzard fine-grain distributed shared memory system. this approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data-centric manner. As a demonstration, Paradyn helped us improve the performance of a new shared-memory application program by a factor of four.
We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor. We also describe how to extend Haswell&...
详细信息
ISBN:
(纸本)9781450319225
We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor. We also describe how to extend Haswell's HLE mechanism to achieve a similar effect to our lock elision scheme entirely in hardware.
暂无评论