A common statistical problem is finding the median element in a set of data. This paper presents an efficient randomized high-level parallel algorithm for finding the median of a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem, which requires determining the element of rank k for an arbitrarily given integer k. Our general framework is an SPMD distributed-memory programming model that is enhanced by a set of communication primitives. We use efficient techniques for distributing and coalescing data, as well as efficient combinations of task and data parallelism. The algorithms have been coded in the message-passing standard MPI, and our experimental results from the IBM SP-2 illustrate the scalability and efficiency of our algorithm and improve upon all the related experimental results known to the author. The main contributions of this paper are (1) new techniques for improving the performance of certain randomized algorithms, such as selection, that are efficient with high probability, and (2) a new, practical randomized selection algorithm (UltraFast) with significantly improved convergence. © 2004 Elsevier Inc. All rights reserved.
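To make the selection problem concrete, the following is a minimal sketch of a generic randomized selection over partitioned data. It is not the paper's UltraFast algorithm and uses no MPI: the "processors" are simulated as plain Python lists, and the name `select_distributed` and its parameters are hypothetical. Each round picks a random pivot, every partition counts its elements below and equal to the pivot (the step that would be a reduction on a real machine), and the search continues on the side that contains rank k.

```python
import random


def select_distributed(parts, k, rng=None):
    """Return the element of rank k (1-based) across all lists in `parts`.

    Illustrative sketch only: each inner list stands in for the data held
    by one processor; sums over partitions stand in for reductions.
    """
    rng = rng or random.Random(0)
    parts = [list(p) for p in parts]
    total = sum(len(p) for p in parts)
    if not 1 <= k <= total:
        raise ValueError("k must be between 1 and the number of elements")

    while True:
        # Pick a pivot uniformly at random among all remaining elements:
        # choose a partition with probability proportional to its size,
        # then choose an element inside it.
        sizes = [len(p) for p in parts]
        owner = rng.choices(range(len(parts)), weights=sizes)[0]
        pivot = rng.choice(parts[owner])

        # Local counts per partition, combined into global counts.
        less = sum(sum(1 for x in p if x < pivot) for p in parts)
        equal = sum(sum(1 for x in p if x == pivot) for p in parts)

        if k <= less:
            # Rank k lies among the elements smaller than the pivot.
            parts = [[x for x in p if x < pivot] for p in parts]
        elif k <= less + equal:
            # Rank k falls on the pivot value itself.
            return pivot
        else:
            # Rank k lies among the larger elements; shift the rank.
            parts = [[x for x in p if x > pivot] for p in parts]
            k -= less + equal
```

With this sketch, the median of n elements corresponds to a call such as `select_distributed(parts, (n + 1) // 2)`, which returns the lower median when n is even.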
We propose architecture-independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. Towards this end we study the performance of some dense matrix computations such as matrix multiplication, LU decomposition and matrix inversion. Although optimized algorithms for these problems have been extensively examined before, a systematic study of the architecture-independent design and analysis of parallel algorithms and their performance (including matrix computations) has not been undertaken. Even though more refined algorithms and implementations (sequential or parallel) exist for the stated problems, the complexity and performance of the introduced algorithms are sufficient to raise the issues that are important in architecture-independent parallel algorithm design. Two established distributions of an input matrix among the processors of a parallel machine are examined, and the theoretical and practical merits of each are discussed. The algorithms we propose have been implemented and tested on a variety of parallel systems, including the SGI Power Challenge, the IBM SP2 and the Cray T3D. Our experimental results support our claims of efficiency, portability and reusability of the presented algorithms. © 2002 Elsevier Science B.V. All rights reserved.
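The abstract does not name the two matrix distributions it examines. As an illustration of what distributing an input matrix among processors can mean, here is a minimal sketch that assumes a simple row-block distribution: each of p simulated processors owns a contiguous block of rows of A and computes the matching rows of C = A·B. The function name `matmul_row_blocks` is hypothetical, and the loop over blocks stands in for work that would run in parallel.

```python
def matmul_row_blocks(a, b, p):
    """Multiply matrices a (n x m) and b (m x q) using p row blocks of a.

    Illustrative sketch only: each (lo, hi) range is the set of rows
    owned by one simulated processor under a row-block distribution.
    """
    n = len(a)
    bounds = [(i * n // p, (i + 1) * n // p) for i in range(p)]
    columns = list(zip(*b))  # columns of b, shared by every processor

    c = []
    for lo, hi in bounds:
        # Work of one simulated processor: its own rows of the result.
        for row in a[lo:hi]:
            c.append([sum(x * y for x, y in zip(row, col)) for col in columns])
    return c
```

Because the blocks are contiguous and processed in order, concatenating the per-processor results yields the rows of C in their original order.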