This paper presents several algorithms for solving problems on massively parallel SIMD hypercube and shuffle-exchange computers. The algorithms solve a wide variety of problems but share a common strategy: each uses a divide-and-conquer approach to solve a problem with N inputs on a parallel computer with P processors. The structural properties of the problem are exploited to ensure that fewer than N data items are communicated during the division and combination steps of the divide-and-conquer algorithm. This reduction in the amount of data that must be communicated is central to the efficiency of the algorithms. The paper addresses four problems: multiple-prefix, data-dependent parallel-prefix, image-component-labeling, and closest-pair. The algorithms presented for the data-dependent parallel-prefix and closest-pair problems are the fastest known when N ≥ P, and the algorithms for the multiple-prefix and image-component-labeling problems are the fastest known when N is sufficiently large with respect to P.
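The abstract does not define the multiple-prefix problem, so the sketch below assumes a common formulation: each item carries a label and a value, and the result for an item is the sum of the values of all earlier items with the same label. The function names, the use of addition, and the block-splitting scheme are illustrative assumptions and not the paper's hypercube algorithm. The blocked version only hints at the general idea of exchanging per-block summaries rather than all N items.

```python
from collections import defaultdict


def multiple_prefix(labels, values):
    """Sequential reference: sum of earlier values that share each item's label."""
    running = defaultdict(int)
    out = []
    for label, value in zip(labels, values):
        out.append(running[label])  # exclusive prefix within this label's group
        running[label] += value
    return out


def blocked_multiple_prefix(labels, values, p):
    """Same result, computed block by block as p simulated processors might (p >= 1)."""
    n = len(labels)
    size = -(-n // p) if n else 1  # ceiling of n / p
    blocks = [range(s, min(s + size, n)) for s in range(0, n, size)]

    # Division step: each block reduces itself to per-label totals.
    totals = []
    for block in blocks:
        t = defaultdict(int)
        for i in block:
            t[labels[i]] += values[i]
        totals.append(t)

    # Combination step: exclusive scan over the block summaries only.
    offsets = []
    carry = defaultdict(int)
    for t in totals:
        offsets.append(dict(carry))
        for label, s in t.items():
            carry[label] += s

    # Local step: each block finishes its own items from its starting offsets.
    out = [0] * n
    for block, offset in zip(blocks, offsets):
        running = defaultdict(int, offset)
        for i in block:
            out[i] = running[labels[i]]
            running[labels[i]] += values[i]
    return out
```

In this simulation the only data crossing block boundaries are the per-label totals, which is smaller than the block itself whenever labels repeat within a block.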
ISBN: 0898712882 (print)
We discuss the implementation of additive Schwarz-type algorithms on SIMD computers. A recursive, additive algorithm is compared with a two-level scheme. These methods are based on a subdivision of the domain into thousands of micro-patches that can reflect local properties, coupled with a coarser, global discretization that reflects the "macro" behavior. The two-level method shows very promising flexibility, convergence, and performance properties when implemented on a massively parallel SIMD computer.
The potential speedup for SIMD parallel implementations of APL programs is considered. Both analytical and (simulated) empirical studies are presented. The approach rests on the observation that nearly 95% of the operators appearing in APL programs are scalar primitives, reductions, or indexing operations, so the performance of these operators gives a good estimate of the speedup a full program might receive. Substantial speedups are demonstrated for these operators, and the empirical evidence accords with the analytical estimates.
Performing permutations of data efficiently on SIMD computers is important for high-speed execution of parallel algorithms. In this correspondence we consider realizing permutations such as perfect shuffle, matrix transpose, bit-reversal, the class of bit-permute-complement (BPC) permutations, and the classes of Omega and inverse Omega permutations on N = 2^n processors with an Illiac IV-type interconnection network, where each processor is connected to processors at distances of ±1 and ±√N. The minimum number of data transfer operations required for realizing any of these permutations on such a network is shown to be 2(√N − 1). We provide a general three-phase strategy for realizing permutations and derive routing algorithms for performing perfect shuffle, Omega, inverse Omega, bit-reversal, and matrix-transpose permutations in 2(√N − 1) steps. Our approach is quite simple and, unlike previous approaches, makes efficient use of the topology of the Illiac IV-type network to realize these permutations using the optimum number of data transfers. The strategy is also quite powerful: any permutation can be realized with it in 3(√N − 1) steps.
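To make the named permutations concrete, here is a minimal sketch of three of them expressed as maps on n-bit processor indices. It assumes the usual textbook definitions (perfect shuffle as a one-bit left rotation of the index, bit reversal as reversing the index bits, matrix transpose as swapping the upper and lower halves of the index for a square row-major matrix). The function names are hypothetical, and the code shows only where each item ends up, not the routing algorithm the abstract describes.

```python
def perfect_shuffle(i, n):
    """Destination of index i: rotate its n-bit representation left by one (n >= 1)."""
    mask = (1 << n) - 1
    return ((i << 1) | (i >> (n - 1))) & mask


def bit_reversal(i, n):
    """Destination of index i: reverse the order of its n bits."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def matrix_transpose(i, n):
    """Destination of index i: swap row and column bits (n even, row-major square matrix)."""
    half = n // 2
    low_mask = (1 << half) - 1
    return ((i & low_mask) << half) | (i >> half)


def apply_permutation(data, dest):
    """Move data[i] to position dest(i) for every i."""
    out = [None] * len(data)
    for i, x in enumerate(data):
        out[dest(i)] = x
    return out


shuffled = apply_permutation(list(range(8)), lambda i: perfect_shuffle(i, 3))
print(shuffled)  # [0, 4, 1, 5, 2, 6, 3, 7]
```

All three maps only rearrange index bits, which is why they fall inside the bit-permute-complement class mentioned above.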
The problems of measuring the performance of a highly parallel multiple-processor system, such as the 4096-element ICL Distributed Array Processor (DAP), are presented in relation to the conventional methods used for serial processors. This is preceded by a brief description of the DAP hardware, to provide a framework for the discussion, together with some of the resulting implications for algorithm design. The importance of choosing algorithms for parallel computation so as to make the best use of the hardware's parallelism for the problem at hand is discussed, and examples are given of parallel and hybrid algorithms; in the latter, a mixture of serial and parallel techniques is used. A method of comparing performance at the problem-solving level is presented and illustrated by results obtained by DAP users studying problems from a wide range of application areas.
A large-network algorithm solves a problem of size N on a network of N processors. We present a method for transforming certain large networks into quotient networks that emulate those large networks with fewer processors. Large-network algorithms are easily modified to execute on the quotient network, and the emulations incur no loss in execution efficiency. Quotient networks thus allow algorithms to be designed assuming any number of processors and then executed efficiently at a great saving in hardware cost.
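As a rough illustration of the emulation idea, the sketch below folds a large hypercube onto a smaller one by keeping only the low-order bits of each node address. This particular mapping, the choice of a hypercube, and the names `host` and `route` are assumptions made for the example; the abstract does not say which networks or mappings the paper uses.

```python
def host(v, p_bits):
    """Physical node that emulates virtual node v: keep the low p_bits address bits."""
    return v & ((1 << p_bits) - 1)


def route(v, d, p_bits):
    """Physical endpoints of the virtual hypercube edge from v across dimension d."""
    u = v ^ (1 << d)  # virtual neighbor of v in dimension d
    return host(v, p_bits), host(u, p_bits)


# A 16-node virtual hypercube emulated on a 4-node physical hypercube.
n_bits, p_bits = 4, 2
for v in range(1 << n_bits):
    for d in range(n_bits):
        src, dst = route(v, d, p_bits)
        # Every virtual edge is either local to one physical node or maps
        # onto a single physical hypercube edge.
        assert src == dst or bin(src ^ dst).count("1") == 1
```

Under this mapping each physical node hosts the same number of virtual nodes, so the emulated work is spread evenly.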