In this paper we present the strengths and limitations of semi-automatic parallelization, data parallel programming, and message passing programming. We apply them to two numerical algorithms, namely a two-dimensional Fourier transform algorithm and a conjugate gradient program. We implemented these programs with each of the methods on a Cray T3D. The results of these experiments support our proposition that when the three methods are combined, efficiency, portability, and ease of parallel programming can be achieved.
A major reason for the lack of practical use of parallel computers has been the absence of a suitable model of parallel computation. Many existing models are either theoretical or are tied to a particular architecture. A more general model must be architecture independent, must realistically reflect execution costs, and must reduce the cognitive overhead of managing massive parallelism. A growing number of models meeting some of these goals have been suggested. We discuss their properties and relative strengths and weaknesses. We conclude that data parallelism is a style with much to commend it, and discuss the Bird-Meertens formalism as a coherent approach to data parallel programming.
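As a minimal illustration of the Bird-Meertens style mentioned above (a sketch that is not taken from the paper), a list homomorphism can be written as a map followed by a reduction with an associative operator. Because the operator is associative, the list can be split into chunks that are processed independently and then combined. The function names `homomorphism` and `chunked_homomorphism` are hypothetical.

```python
from functools import reduce
from operator import add


def homomorphism(f, op, unit, xs):
    """Compute reduce(op) after map(f), sequentially."""
    return reduce(op, map(f, xs), unit)


def chunked_homomorphism(f, op, unit, xs, parts):
    """Same result, computed chunk by chunk.

    Each chunk could be handled by a separate processor; here the
    chunks are simply processed one after another.
    """
    size = max(1, -(-len(xs) // parts))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    partials = [homomorphism(f, op, unit, chunk) for chunk in chunks]
    return reduce(op, partials, unit)


def square(x):
    return x * x


data = list(range(1, 11))
sequential = homomorphism(square, add, 0, data)
chunked = chunked_homomorphism(square, add, 0, data, 3)
```

Both calls compute the sum of squares of 1 to 10; the chunked version relies only on `add` being associative with unit 0.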
We developed a theory in order to address crucial questions of program design methodology. This theory deals with data locality, which is a main issue in parallel programming. In this article, we regard this theory and its model as a minimal semantic domain for data parallel languages. The introduction of a semantic domain is justified because the classical data parallel languages (HPF and C*) have different intuitive semantics: they use different concepts to express data locality. These concepts are alignment in HPF and shape in C*. Consequently, these two languages each define their own balance between compiler and programmer investments in order to reach program efficiency. We present our theory as a foundation for defining a better balance. (C) 2003 Elsevier B.V. All rights reserved.
This article presents the benchmarking by BERTIN (F) of the SUPERNODE SN1000 parallel architecture from PARSYS within the framework of the BECAUSE Project. This evaluation of a distributed memory parallel architecture was carried out by means of the BECAUSE Benchmark Set (BBS). The central idea was to specify parallelisation methodologies and to develop parallel software that is machine independent and hence portable. This approach was possible and realistic because the principle of parallelism involved is data parallel programming. As a consequence, the hardware features of the target architecture are transparent to the industrial user and are managed through a communication library called 3P PARLIB. In this paper, the principles of parallelisation that were used are presented. Practical implementation of these principles is illustrated with various significant Test Programs from the BBS. The corresponding results are presented. Specifications for the 3P PARLIB (Portable parallel programming library) are also given.
Fortran 90 provides a rich set of array intrinsic functions that are useful for representing array expressions and data parallel programming. However, the application of these intrinsic functions to sparse data sets in distributed memory environments is currently not supported by vendors of Fortran 90 and HPF compilers. Our recent research work has been aimed at providing parallel processing support for sparse array intrinsics of Fortran 90. Our supporting library uses the following two-level design: (1) in our low-level routines, a sparse input matrix needs to be specified with compression/distribution schemes by programmers, and (2) in the high-level representation, sparse array functions are overloaded for array intrinsic interfaces so that programmers need not be concerned about low-level details. This raises a very interesting optimization problem in the strategies used to transform high-level representations to low-level routines by the automatic selection of distribution and compression schemes for sparse data sets. In this paper, we propose solutions to address this optimization problem, which is shown to be NP-hard. We develop a heuristic algorithm based on annotated program graphs. To the best of our knowledge, our selection scheme is the first to automatically select compression and distribution schemes for sparse data arrays with the array intrinsics of Fortran 90 in a distributed memory environment. Experimental results show that our selection algorithms are consistent with our cost model, and effective in selecting appropriate compression and distribution schemes for improving the performance of application programs that operate on sparse data sets. Our experiments were performed on an IBM SP-2 machine using our parallel sparse array intrinsics for Fortran 90. (C) 2004 Published by Elsevier B.V.
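To make the idea of an array intrinsic operating on a compressed representation more concrete, the following sketch assumes compressed row storage (CRS) as the compression scheme and implements a SUM-like reduction directly on the compressed form. This is an illustration only: the abstract does not say which schemes the library uses, the sketch ignores distribution across processors, and the names `to_crs` and `sparse_sum` are hypothetical. The `dim` argument follows the Fortran convention that `DIM=1` collapses the first (row) dimension.

```python
def to_crs(dense):
    """Compress a dense row-major matrix into (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends inside `values`
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def sparse_sum(crs, ncols, dim=None):
    """SUM-like reduction over a CRS matrix.

    dim=None sums all elements, dim=1 returns one sum per column,
    dim=2 returns one sum per row. Only stored nonzeros are visited.
    """
    values, col_idx, row_ptr = crs
    if dim is None:
        return sum(values)
    if dim == 1:
        out = [0] * ncols
        for v, j in zip(values, col_idx):
            out[j] += v
        return out
    if dim == 2:
        return [sum(values[row_ptr[i]:row_ptr[i + 1]])
                for i in range(len(row_ptr) - 1)]
    raise ValueError("dim must be None, 1, or 2")


dense = [[0, 2, 0],
         [1, 0, 0],
         [0, 0, 3]]
crs = to_crs(dense)
total = sparse_sum(crs, ncols=3)
column_sums = sparse_sum(crs, ncols=3, dim=1)
row_sums = sparse_sum(crs, ncols=3, dim=2)
```

A row-oriented scheme like this makes the `dim=2` reduction a contiguous slice per row, while `dim=1` needs a scatter through `col_idx`; trade-offs of this kind are what a scheme-selection strategy would have to weigh.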
We describe data parallel list operations that exploit pair structure on lists and an algebra that relates them. Equations from the algebra are used as transformation rules, so that development is done in a calculational way. We illustrate their use in applications such as FFTs and sorting, and show that optimal or near-optimal algorithms can result from a systematic calculational process. The operations have a natural, direct implementation on hypercubes.
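As a rough illustration of how an FFT can be phrased with elementwise operations on paired lists, the sketch below uses the standard recursive radix-2 decimation-in-time formulation. It is not the paper's algebra or its derived algorithm: the even/odd split and the name `fft` are assumptions chosen for illustration, and the input length is assumed to be a power of two.

```python
import cmath


def fft(xs):
    """Radix-2 FFT of a list whose length is a power of two."""
    n = len(xs)
    if n == 1:
        return list(xs)
    # Split into two half-length lists and transform each one.
    evens = fft(xs[0::2])
    odds = fft(xs[1::2])
    # Elementwise twiddle multiplication on the odd half.
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * o
                for k, o in enumerate(odds)]
    # Butterfly: pair up corresponding elements of the two halves.
    upper = [e + t for e, t in zip(evens, twiddled)]
    lower = [e - t for e, t in zip(evens, twiddled)]
    return upper + lower


impulse = fft([1, 0, 0, 0])
constant = fft([1, 1, 1, 1])
```

Every step after the recursive calls is an elementwise operation over two equal-length lists, which is the kind of structure that maps directly onto pairs of neighbouring processors.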
We investigate the problem of evaluating Fortran 90-style array expressions on massively parallel distributed-memory machines. On such a machine, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression tree. The choice of where to perform the operation then affects this cost. We describe the communication cost of the parallel machine theoretically as a metric space; we model the alignment problem as that of finding a minimum-cost embedding of the expression tree into this space. We present algorithms based on dynamic programming that solve the embedding problem optimally for several communication cost metrics: multidimensional grids and rings, hypercubes, fat-trees, and the discrete metric. We also extend our approach to handle operations that change the shape of the arrays.
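The embedding idea can be sketched with a simple bottom-up dynamic program. The version below is a simplified illustration under stated assumptions, not the paper's algorithms: positions are points on a small ring, each leaf array is fixed at one position, an operation may be placed at any position, and the cost of an edge is the ring distance between the positions of child and parent. The names `ring_distance` and `embed_cost` are hypothetical.

```python
import math


def ring_distance(n):
    """Return the distance function of a ring with n positions."""
    def dist(a, b):
        k = abs(a - b)
        return min(k, n - k)
    return dist


def embed_cost(tree, n, dist):
    """Return costs where costs[p] is the minimum communication cost
    of evaluating `tree` so that its result ends up at position p.

    A tree is either an int (a leaf array fixed at that position)
    or a tuple of subtrees (an elementwise operation).
    """
    if isinstance(tree, int):
        return [0 if p == tree else math.inf for p in range(n)]
    child_costs = [embed_cost(child, n, dist) for child in tree]
    # For each candidate position p of this operation, every child
    # independently picks its cheapest position q and moves to p.
    return [
        sum(min(c[q] + dist(q, p) for q in range(n)) for c in child_costs)
        for p in range(n)
    ]


# (A at 0 op B at 1) op C at 4, on a ring of 8 positions
tree = ((0, 1), 4)
root_costs = embed_cost(tree, 8, ring_distance(8))
best = min(root_costs)
```

This brute-force table costs on the order of n squared per tree edge; exploiting the structure of a particular metric to do better is beyond what this sketch attempts.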
We give an in-depth introduction to the design of our functional array programming language SAC, the main aspects of its compilation into host machine code, and its parallelisation based on multi-threading. The language design of SAC aims at combining high-level, compositional array programming with fully automatic resource management for highly productive code development and maintenance. We outline the compilation process that maps SAC programs to computing machinery. Here, our focus is on optimisation techniques that aim at restructuring entire applications from nested compositions of general fine-grained operations into specialised coarse-grained operations. We present our implicit parallelisation technology for shared memory architectures based on multi-threading and discuss further optimisation opportunities on this level of code generation. Both optimisation and parallelisation rigorously exploit the absence of side-effects and the explicit data flow characteristic of a functional setting.
Although the field of datacenter computing is arguably still in its relative infancy, a sizable body of work from both academia and industry is already available and some consistent technological trends have begun to emerge. This special issue presents a small sample of the work underway by researchers and professionals in this new field. The selection of articles presented reflects the key role that hardware-software codesign plays in the development of effective datacenter-scale computer systems.
High Performance Fortran (HPF) offers an attractive high-level language interface for programming scalable parallel architectures, providing the user with directives for the specification of data distribution and delegating to the compiler the task of generating an explicitly parallel program. Available HPF compilers can handle regular codes quite efficiently, but dramatic performance losses may be encountered for applications that are based on highly irregular, dynamically changing data structures and access patterns. In this paper we introduce the Vienna Fortran Compiler (VFC), a new source-to-source parallelization system for HPF+, an optimized version of HPF, which addresses the requirements of irregular applications. In addition to extended data distribution and work distribution mechanisms, HPF+ provides the user with language features for specifying certain information that decisively influences a program's performance. These features comprise data locality assertions, non-local access specifications, and the possibility of reusing runtime-generated communication schedules of irregular loops. Performance measurements of kernels from advanced applications demonstrate that with a high-level data parallel language such as HPF+, performance close to that of hand-written message-passing programs can be achieved even for highly irregular codes.