We report on our experiences with the implementation of the imperative nested parallel language V. We give an overview of the compiler and a description of its building blocks and their interplay. We show how function...
We present a proof outline generation system for a simple data-parallel kernel language called . Proof outlines for are very similar to those for usual scalar-like languages. In particular, they can be mechanically generated backwards from the final post-assertion of the program. They thus appear as a valuable basis for implementing a validation assistance tool for data-parallel programming. The equivalence between proof outlines and the sound and complete Hoare logic defined for in previous papers is discussed in section 5.
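The backward generation the abstract describes can be sketched in miniature. The following is an illustrative weakest-precondition calculator for straight-line assignments (plain Python, not the paper's system; the textual-substitution approach and all names are assumptions), showing how a proof outline is produced mechanically from the final post-assertion:

```python
import re

def wp_assign(var, expr, post):
    """wp(var := expr, post) = post[var := expr], by textual substitution."""
    return re.sub(rf"\b{var}\b", f"({expr})", post)

def proof_outline(assignments, post):
    """Annotate a list of (var, expr) assignments backwards from `post`,
    interleaving each statement with the assertion that precedes it."""
    outline = [f"{{ {post} }}"]
    for var, expr in reversed(assignments):
        outline.append(f"{var} := {expr}")
        post = wp_assign(var, expr, post)
        outline.append(f"{{ {post} }}")
    return list(reversed(outline))

# Generate the outline for x := x + 1; y := 2 * x against a post-assertion.
prog = [("x", "x + 1"), ("y", "2 * x")]
for line in proof_outline(prog, "y == 2 * (x0 + 1)"):
    print(line)
```

Each intermediate assertion falls out of the substitution alone, which is why such outlines can serve as the basis of an automated validation assistant.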
The work/step framework provides a high-level cost model for nested data-parallel programming languages, allowing programmers to understand the efficiency of their codes without concern for the eventual mapping of tasks to processors. Vectorization, or flattening, is the key technique for compiling nested-parallel languages. This paper presents a formal study of vectorization, considering three low-level targets: the EREW, bounded-contention CREW, and CREW variants of the VRAM. For each, we describe a variant of the cost model and prove the correctness of vectorization for that model. The models impose different constraints on the set of programs and implementations that can be considered; we discuss these in detail.
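The work/step accounting can be illustrated with a toy cost-tracking interpreter (a sketch only, not the paper's formal VRAM semantics): a parallel map charges the *sum* of its children's work but only the *maximum* of their steps, which is what lets nested parallelism stay shallow.

```python
def pmap(f, xs):
    """Apply f to each element "in parallel", returning (results, work, steps).
    Work adds up across children; steps take the max, plus one for the map."""
    results, works, steps = [], [], []
    for x in xs:
        r, w, s = f(x)
        results.append(r); works.append(w); steps.append(s)
    return results, sum(works) + 1, (max(steps) if steps else 0) + 1

def inc(x):
    """A unit-cost scalar operation: one unit of work, one step."""
    return x + 1, 1, 1

# Nested parallelism: map inc over each inner sequence, in parallel.
nested = [[1, 2, 3], [4], [5, 6]]
out, work, step = pmap(lambda row: pmap(inc, row), nested)
print(out)   # [[2, 3, 4], [5], [6, 7]]
print(work)  # grows with the total number of elements
print(step)  # stays constant for this nesting depth
```

Flattening preserves exactly these two quantities when mapping the nested program onto a flat vector machine.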
The SB-PRAM is a lock-step-synchronous, massively parallel multiprocessor currently being built at Saarbruecken University, with up to 4096 RISC-style processing elements and with a physically shared memory (from the programmer's view) of up to 2 GByte with uniform memory access time. Fork95 is a redesign of the PRAM language FORK, based on ANSI C, with additional constructs for creating parallel processes, hierarchically dividing processor groups into subgroups, and managing shared and private address subspaces. Fork95 makes the assembly-level synchronicity of the underlying hardware available to the programmer at the language level. Nevertheless, it provides comfortable facilities for locally asynchronous computation where desired by the programmer. We show that Fork95 offers full expressibility for the implementation of practically relevant parallel algorithms. We do this by examining all known parallel programming paradigms used for the parallel solution of real-world problems, such as strictly synchronous execution, asynchronous processes, pipelining and systolic algorithms, parallel divide and conquer, parallel prefix computation, and data parallelism, and show how these paradigms are supported by the Fork95 language and run-time system.
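The hierarchical group-splitting idea behind Fork95's parallel divide and conquer can be sketched as follows (plain Python standing in for Fork95's C-based syntax; the simulation is an assumption, not Fork95 code): at each level a synchronous processor group splits into two subgroups that work on disjoint halves of the data and rejoin at an implicit barrier.

```python
def group_sum(procs, data):
    """Parallel sum by divide and conquer with hierarchical group splitting.
    Each recursive call divides the processor group into two subgroups
    working on disjoint halves of the data, then rejoins synchronously."""
    if len(procs) <= 1 or len(data) <= 1:
        return sum(data)               # a lone group works sequentially
    mid_p, mid_d = len(procs) // 2, len(data) // 2
    left = group_sum(procs[:mid_p], data[:mid_d])
    right = group_sum(procs[mid_p:], data[mid_d:])
    return left + right                # implicit barrier at subgroup rejoin

print(group_sum(list(range(4)), [1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```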
In this paper we study the issues of programmability and performance in the parallelization of weather prediction models. We compare parallelization using a high level library (the Nearest Neighbor Tool: NNT) and a high level language/directive approach (High Performance Fortran: HPF). We report on the performance of a complete weather prediction model (the Rapid Update Cycle, which is currently run operationally at the National Meteorological Center in Washington) coded using NNT. We quantify the performance effects of optimizations possible with NNT that must be performed by an HPF compiler.
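The nearest-neighbor communication pattern that a library like NNT manages can be sketched in one dimension (illustrative Python only; the function names are assumptions, not NNT's API): each process holds a block of the domain plus one-cell halos that are refilled from its neighbors before every stencil update.

```python
def halo_exchange(blocks):
    """Return each block padded with its neighbors' boundary cells
    (zero at the physical domain edges)."""
    padded = []
    for i, b in enumerate(blocks):
        left = blocks[i - 1][-1] if i > 0 else 0.0
        right = blocks[i + 1][0] if i < len(blocks) - 1 else 0.0
        padded.append([left] + b + [right])
    return padded

def stencil_step(blocks):
    """Three-point average on every block after a halo exchange."""
    return [[(p[j - 1] + p[j] + p[j + 1]) / 3.0 for j in range(1, len(p) - 1)]
            for p in halo_exchange(blocks)]

print(stencil_step([[1.0, 2.0], [3.0, 4.0]]))
```

Optimizations such as overlapping the exchange with interior computation are what the paper quantifies by hand in NNT and expects from an HPF compiler.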
There are substantial problems with exploiting parallelism, particularly massive parallelism. One attempt to solve these problems is general-purpose parallelism, which searches for models that are abstract enough to be useful for software development, but that map well enough to realistic architectures that they deliver performance. We show how the skeletons model is a suitable general-purpose model for massive parallelism, and show its power by illustrating a new algorithm for search in structured text. The algorithm is sufficiently complex that it would have been hard to find without the theory underlying the Bird-Meertens formalism. The example also demonstrates the opportunities for parallelism in new, non-scientific and non-numeric applications.
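The skeletons idea can be made concrete with a small sketch (illustrative only, not the paper's text-search algorithm): a program is built by composing a fixed repertoire of higher-order skeletons, here map and reduce in the Bird-Meertens homomorphism style.

```python
from functools import reduce

def skel_map(f):
    """Map skeleton: apply f independently to every element."""
    return lambda xs: [f(x) for x in xs]

def skel_reduce(op, unit):
    """Reduce skeleton: combine all elements with an associative op."""
    return lambda xs: reduce(op, xs, unit)

def compose(*fs):
    """Right-to-left composition, as in the Bird-Meertens formalism."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(fs), x)

# Count elements matching a predicate: reduce(+) . map(indicator).
count_if = lambda p: compose(skel_reduce(lambda a, b: a + b, 0),
                             skel_map(lambda x: 1 if p(x) else 0))
print(count_if(lambda w: "par" in w)(["parallel", "serial", "parse"]))  # 2
```

Because each skeleton has a known parallel implementation, the composed program maps to realistic architectures without the programmer managing processors directly, which is the general-purpose claim the abstract makes.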
In this paper, two programming tools are presented that facilitate the development of portable parallel applications on distributed memory systems. The Orchid system is a software platform, i.e. a set of facilities for parallel programming. It consists of mechanisms for transparent message passing and a set of primitive functions to support the distributed shared memory programming model. In order to free the user from the tedious task of parallel programming, a new environment for logic programming is introduced: the Daffodil framework. Daffodil, implemented on top of Orchid, evaluates pure PROLOG programs, exploiting the inherent AND/OR parallelism. Both systems have been implemented and evaluated on various platforms, since the layered structure of Orchid ensures portability: only a small part of the code needs to be re-engineered for each platform.
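The OR-parallelism that Daffodil exploits can be sketched as follows (plain Python with threads, purely illustrative and not Daffodil's implementation): the alternative clauses for a goal are explored by independent workers, and every successful alternative is a solution.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_or(goal, clauses):
    """Try every clause for `goal` in parallel; keep the successes.
    Executor.map preserves clause order in the result list."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda clause: clause(goal), clauses)
    return [r for r in results if r is not None]

# parent(X, mary) with two alternative "clauses" (facts) to try:
facts = [
    lambda g: ("tom", g[1]) if g == ("X", "mary") else None,
    lambda g: ("ann", g[1]) if g == ("X", "mary") else None,
]
print(solve_or(("X", "mary"), facts))  # [('tom', 'mary'), ('ann', 'mary')]
```

AND-parallelism would similarly evaluate the independent subgoals of one clause body concurrently.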
PSETL is a prototyping language for developing efficient numeric code for massively parallel machines. PSETL enables parallel algorithms to be concisely specified at a very high level, and successively refined into lower level architecture-specific code. It includes a rich variety of parallel loops over sets, bags, and tuples, and a hierarchy of communication mechanisms, ranging from atomic assignments to reductions and scans on collections. We illustrate the parallel features of PSETL and the refinement process using an N-body simulation code as a case study. The high-level code, which is only a few pages long, is refined for execution on shared and disjoint address-space MIMD machines.
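The two collection-level communication mechanisms the abstract names, reductions and scans, can be sketched as follows (Python, not PSETL syntax; the N-body tie-in is only illustrative):

```python
def reduce_add(xs):
    """Reduction: combine all elements of a collection with +."""
    total = 0
    for x in xs:
        total += x
    return total

def scan_add(xs):
    """Inclusive prefix scan with +: running partial sums."""
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out

masses = [2.0, 1.0, 3.0]      # e.g. per-body masses in an N-body code
print(reduce_add(masses))     # 6.0
print(scan_add(masses))       # [2.0, 3.0, 6.0]
```

In the refinement process, such collection operations are successively replaced by architecture-specific parallel implementations.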
Author:
Cosnard, M., LIP-CNRS
École Normale Supérieure de Lyon, 46 Allée d'Italie, 69364 Lyon, France
We compare various models of parallel machines and show that they can be classified into two classes: algorithm oriented or execution oriented. None of them is fully satisfactory from the user's point of view. Hence...
Numerical algorithms often exhibit potential parallelism caused by a coarse structure of submethods in addition to the medium grain parallelism of systems within submethods. We present a derivation methodology for parallel programs of numerical methods on distributed memory machines that exploits both levels of parallelism in a group-SPMD parallel computation model. The derivation process starts with a specification of the numerical method in a module structure of submethods, and results in a parallel frame program containing all implementation decisions of the parallel implementation. The implementation derivation includes scheduling of modules, assigning processors to modules and choosing data distributions for basic modules. The methodology eases parallel programming and supplies a formal basis for automatic support. An analysis model allows performance predictions for parallel frame programs. In this article we concentrate on the determination of optimal data distributions using a dynamic programming approach based on data distribution types and incomplete run-time formulas.
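The dynamic-programming idea can be illustrated on a linear pipeline of modules (a sketch with invented cost tables, not the paper's analysis model): choose one data-distribution type per module so that computation cost plus redistribution cost between consecutive modules is minimal.

```python
def optimal_distributions(comp_cost, redist_cost):
    """comp_cost[m][d]: cost of module m under distribution type d;
    redist_cost[p][d]: cost of redistributing from type p to type d.
    Returns (minimal total cost, chosen distribution type per module)."""
    n, ndist = len(comp_cost), len(comp_cost[0])
    best = list(comp_cost[0])        # best[d]: cheapest way to end module 0 in d
    back = []                        # backpointers for plan recovery
    for m in range(1, n):
        new, choice = [], []
        for d in range(ndist):
            costs = [best[p] + redist_cost[p][d] for p in range(ndist)]
            p = min(range(ndist), key=costs.__getitem__)
            new.append(costs[p] + comp_cost[m][d])
            choice.append(p)
        best, back = new, back + [choice]
    d = min(range(ndist), key=best.__getitem__)
    plan = [d]
    for choice in reversed(back):    # walk backpointers to the first module
        d = choice[d]
        plan.append(d)
    return min(best), list(reversed(plan))

comp = [[4, 1], [1, 5], [2, 2]]      # 3 modules, 2 distribution types
redist = [[0, 3], [3, 0]]            # switching between types costs 3
print(optimal_distributions(comp, redist))
```

The paper's version works over data distribution types with incomplete run-time formulas in place of these fixed numbers, but the optimal-substructure argument is the same.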