Consider an n-dimensional simd hypercube H-n with [3n/2] - 1 faulty nodes. With n + 3 log(n - 1) + 7, n + 2 log(n - 1) + 9, n + log(n - 1) + O(log log(n - 1)), n + log(n - 1) + 12, and n + 19 steps, this paper present...
详细信息
Consider an n-dimensional simd hypercube H-n with [3n/2] - 1 faulty nodes. With n + 3 log(n - 1) + 7, n + 2 log(n - 1) + 9, n + log(n - 1) + O(log log(n - 1)), n + log(n - 1) + 12, and n + 19 steps, this paper presents some one-to-all broadcasting algorithms on the faulty simd H-n. The sequence of dimensions used for broadcasting in each algorithm is the same regardless of which node is the source. The proposed one-to-all broadcasting algorithms can tolerate [n/2] more faulty nodes than Raghavendra. and Sridhar's algorithms (J. Parallel Distrb. Comput. 35 (1996) 57) although 8 extra steps are needed. The fault-tolerance improvement of this paper is about 50%. (c) 2004 Published by Elsevier Inc.
Consider an n-dimensional simd hypercube H-n with [3n/2] - 1 faulty nodes. Given 2(n) operands, this paper presents an efficient algorithm for prefix computation on the faulty H-n. Employing the newly proposed delay-u...
详细信息
Consider an n-dimensional simd hypercube H-n with [3n/2] - 1 faulty nodes. Given 2(n) operands, this paper presents an efficient algorithm for prefix computation on the faulty H-n. Employing the newly proposed delay-update technique and the subcube partition scheme, the proposed algorithm takes n+5logn+7 steps, and it tolerates [n/2] more Faulty nodes than does Raghavendra and Sridhar's algorithm [4] although 11 extra steps are needed.
In this paper, we develop algorithms in order of efficiency for all-to-all broadcast problem in an N= 2(n)-node rr dimensional faulty simd hypercube, Q(n), with up to n - 1 node faults. The algorithms use a property o...
详细信息
In this paper, we develop algorithms in order of efficiency for all-to-all broadcast problem in an N= 2(n)-node rr dimensional faulty simd hypercube, Q(n), with up to n - 1 node faults. The algorithms use a property of a certain ordering of dimensions. Our analysis includes startup time(alpha) and transfer time(beta). We have established the lower bound for such an algorithm to be n alpha+ (2N- 3)L beta in a faulty hypercube with at most n - 1 faults (each node has a value of L bytes). Our best algorithm requires 2n alpha + 2NL beta and is near-optimal. We develop an optimal algorithm for matrix multiplication in a faulty hypercube using all-to-all broadcast and compare the efficiency of all-to-all broadcast approach with broadcast approach and global sum approach for matrix multiplication. The algorithms are congestion-free and applicable in the context of available hypercube machines.
There is a wide class of parallel algorithms which require communications in the form of a 2(b) permutation and -2(b) permutation. (A 2(b) permutation of N elements, where N = 2(n) moves the element from position i to...
详细信息
There is a wide class of parallel algorithms which require communications in the form of a 2(b) permutation and -2(b) permutation. (A 2(b) permutation of N elements, where N = 2(n) moves the element from position i to position i + 2(b) mod N. The inverse is called a -2(b) permutation.) We first examine the complexity of performing a 2(b) permutation of N elements on an N -PE simd hypercube. (The hypercube model assumes that all data movements are restricted to a single dimension at a time.) We prove that, given any mapping of the elements to processors, log N-b steps are needed to perform a 2(b) permutation. This lower bound suggests that if a parallel algorithm requires f(N) communication steps of type +/-2(b), it may require Omega(f(N) log N) steps on a simd hypercube. We identify an important class of parallel computations, called +/-2(b) descend, which includes Batcher's odd-even merge and many other algorithms. A computation in this class performs log N iterations on an N -element input A[O : N - 1]. For b = log N - 1,...,0, iteration b computes the new value of each A[i] as a function of A[i] A[i + 2(b)] and A[i - 2(b)]. We obtain an efficient O(log N) algorithm for this class on a simd hypercube. We discuss several problems which are solved elegantly using this general algorithm. We also study a related class of parallel computations called +/-2(b)-ascend. A computation in this class has a structure similar to +/-2(b)-descend, except that the iterations run in the increasing order of b. We develop a simple O(log(2) N/log log N) algorithm for this class on a simd hypercube, assuming Omega(log log N) space per PE. It remains open whether this class can be implemented on a simd hypercube in O(log N) time.
An efficient distributed algorithm for evaluating an iterative function on all pairwise combinations of C objects on an simd hyercube is presented. The algorithm achieves uniform load distribution and minimal, complet...
详细信息
An efficient distributed algorithm for evaluating an iterative function on all pairwise combinations of C objects on an simd hyercube is presented. The algorithm achieves uniform load distribution and minimal, completely local interprocessor communication.
暂无评论