We consider the static processor allocation problem for arbitrarily nested parallel loops on distributed memory, message-passing hypercubes. We present HYPAL (HYpercube Partitioning ALgorithm) as an efficient algorithm to solve this problem. HYPAL calculates an optimal set of partitions of the dimension of the hypercube, and assigns them to the set of iterations of the nested loop. We also consider the influence of communication overhead in order to make the approach more realistic. The main difficulty at this point is obtaining the communication pattern associated with the parallel program, because it depends on scheduling and data distribution.
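To make the idea of partitioning the hypercube dimension concrete, the following is a minimal sketch, not HYPAL itself. It assumes a perfectly nested loop, ignores communication, and uses a hypothetical cost model in which loop level i receives a subcube of dimension d_i and needs ceil(n_i / 2^d_i) sequential steps. The function name `best_dimension_split` and the brute-force search are illustrative assumptions only.

```python
from itertools import product
from math import ceil


def best_dimension_split(iteration_counts, cube_dim):
    """Brute-force search over ways to split a hypercube's dimension.

    Assumption (not from the abstract): loop level i, with n_i parallel
    iterations, gets a subcube of dimension d_i, so it takes
    ceil(n_i / 2**d_i) sequential steps. The cost of a split is the
    product of those step counts over all nesting levels.
    """
    levels = len(iteration_counts)
    best_split, best_cost = None, None
    # Enumerate every tuple (d_1, ..., d_k) with d_1 + ... + d_k == cube_dim.
    for split in product(range(cube_dim + 1), repeat=levels):
        if sum(split) != cube_dim:
            continue
        cost = 1
        for n, d in zip(iteration_counts, split):
            cost *= ceil(n / 2 ** d)
        if best_cost is None or cost < best_cost:
            best_split, best_cost = split, cost
    return best_split, best_cost


if __name__ == "__main__":
    # Two nested parallel loops (8 and 4 iterations) on a 3-cube (8 processors).
    print(best_dimension_split((8, 4), 3))
```

The exhaustive search is exponential in the nesting depth; it is meant only to show what "a set of partitions of the dimension" refers to, under the stated assumptions.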
D-Tuili, which has been implemented on a microcomputer network, is a distributed logical reasoning programming language. D-Tuili supports parallel programming at the language level and couples loosely with the distributed database management system, so data in distributed databases can be used in the distributed logic *** this paper, we mainly introduce the components of D-Tuili used to design distributed logic ***, the main principles to implement D-Tuili and the main technologies adopted in the implemented system of D-Tuili are described.
The parallelization of loops can be made formal by basing it on an algebraic theory of loop transformations. In this theory, the concept of unimodularity arises. We discuss the pros and cons of insisting on unimodularity.
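As a brief illustration of the term, a loop transformation matrix is unimodular when it has integer entries and determinant +1 or -1, so it maps integer iteration points one-to-one onto integer points. The sketch below checks this property for small matrices; the helper names `int_det` and `is_unimodular` are hypothetical and are not taken from the paper.

```python
def int_det(m):
    """Determinant of a small square integer matrix via Laplace expansion."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        # Minor obtained by deleting row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * int_det(minor)
    return total


def is_unimodular(m):
    """True if m is a square integer matrix with determinant +1 or -1."""
    square = all(len(row) == len(m) for row in m)
    integral = all(isinstance(x, int) for row in m for x in row)
    return square and integral and int_det(m) in (1, -1)


if __name__ == "__main__":
    skew = [[1, 0], [1, 1]]         # loop skewing
    interchange = [[0, 1], [1, 0]]  # loop interchange
    scaling = [[2, 0], [0, 1]]      # scaling: leaves gaps in the iteration space
    for name, t in (("skew", skew), ("interchange", interchange), ("scaling", scaling)):
        print(name, is_unimodular(t))
```

Skewing and interchange pass the check, while scaling does not, which is one example of a transformation that is excluded when unimodularity is required.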
This paper presents a method of generating parallel target code with explicit communication for massively parallel, distributed-memory machines. The source programs are shared-memory parallel programs with explicit control structures. The method contains four main parts: 1) analysis: extracting syntactic reference patterns from a program with shared address space, 2) pattern matching: selecting appropriate communication routines, 3) scheduling: placing these routines in appropriate locations in the target program text, and 4) synchronization: setting up correct conditions for invoking these routines. We use an explicit communication metric to guide the selection of data layout strategies.
A scheme for the compilation of imperative or functional programs into systolic programs is demonstrated on matrix composition/decomposition and Gauss-Jordan elimination. Using this scheme, programs for the processor network Warp and for several transputer networks have been generated.
The parallelizing compiler for the B-HIVE loosely-coupled multiprocessor system uses a medium grain model to minimize the communication overhead. A medium grain model is shown to be an optimum way of merging fine grain operations into parallel tasks such that the parallelism obtained at the grain level is retained and communication overhead is decreased. A new communication model is introduced in this paper, allowing additional overlap between computation and communication. Simulation results indicate that the medium grain communication model shows promise for automatic parallelization for a loosely-coupled multiprocessor system.