The design of complex parallel algorithms relies heavily on a set of primitive operations. In this work, we examine the problem of merging two sorted sequences in an architecture-independent setting. We derive parallel algorithms that can be used on a variety of parallel machines and whose performance can be reliably predicted if these algorithms are analyzed under the bulk-synchronous parallel (bsp) model. While our algorithms are fairly simple themselves, the description of their performance in terms of the bsp parameters is somewhat complicated. The main reward for quantifying these complications is that it enables parallel software to be written once and for all and migrated efficiently among a variety of parallel platforms. The optimal merging algorithm presented in this work achieves asymptotically optimal parallel efficiency compared to any optimal sequential merging algorithm. (C) 2001 Elsevier Science B.V. All rights reserved.
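The abstract does not include pseudocode, so the following is only a hypothetical sketch of one standard way to realize an architecture-independent parallel merge: each of p processors locates its share of the output by binary-searching "co-ranks" in both inputs, then merges its two sub-blocks locally. The names (co_rank, bsp_style_merge) and the sequential simulation of the p processors are our own illustration, not the paper's algorithm.

import heapq

def co_rank(r, a, b):
    """Binary search for the smallest i such that a[:i] and b[:r-i]
    together form the r smallest elements of the merged output."""
    lo, hi = max(0, r - len(b)), min(r, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        if a[i] < b[r - i - 1]:
            lo = i + 1        # a[i] must also be taken, so i is too small
        else:
            hi = i
    return lo

def bsp_style_merge(a, b, p):
    """Merge sorted lists a and b, splitting the output evenly among
    p 'processors'. Each processor finds its output block boundaries
    by two co-rank searches and merges its sub-blocks locally; in a
    real bsp program the p local merges would run concurrently."""
    n = len(a) + len(b)
    out = []
    for k in range(p):
        lo, hi = k * n // p, (k + 1) * n // p
        i1, i2 = co_rank(lo, a, b), co_rank(hi, a, b)
        j1, j2 = lo - i1, hi - i2
        out.extend(heapq.merge(a[i1:i2], b[j1:j2]))
    return out

# Example: two "processors", each producing half of the merged output.
assert bsp_style_merge([1, 3, 5, 7], [2, 4, 6, 8], p=2) == list(range(1, 9))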
The model of bulk-synchronous parallel (bsp) computation is an emerging paradigm of general-purpose parallel computing. Originally, bsp was defined as a distributed-memory model. Shared-memory style bsp programming had to be provided by PRAM simulation. However, this approach destroys data locality and therefore may prove inefficient for many practical problems. In this paper we present a new bsp-type model, called bspRAM, which reconciles shared-memory style programming with efficient exploitation of data locality. bspRAM can be optimally simulated by bsp for a broad range of algorithms. We identify some characteristic properties of such algorithms: obliviousness, slackness, granularity. Finally, we illustrate these concepts by presenting bspRAM algorithms for butterfly dag computation, cube dag computation, dense matrix multiplication and sorting. (C) 1998 Elsevier Science B.V. All rights reserved.
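To make the data-locality point concrete (our own illustration, not one of the paper's bspRAM algorithms), consider blocked dense matrix multiplication: each fetched s x s block is reused s times, which is exactly the locality that a word-by-word PRAM simulation of shared memory destroys. A minimal sketch, with blocked_matmul and the block size s as assumed names:

def blocked_matmul(A, B, s):
    """Multiply n x n matrices A and B using s x s blocks. Each block
    of C is computed from O(n/s) pairs of input blocks, and every
    element fetched into a block is reused s times -- the data
    locality that a flat shared-memory simulation cannot exploit."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for ib in range(0, n, s):
        for jb in range(0, n, s):
            for kb in range(0, n, s):
                # Local (per-block) work: a small dense multiply.
                for i in range(ib, min(ib + s, n)):
                    for k in range(kb, min(kb + s, n)):
                        aik = A[i][k]
                        for j in range(jb, min(jb + s, n)):
                            C[i][j] += aik * B[k][j]
    return C

# Example: 2 x 2 identity times itself, with block size 1.
assert blocked_matmul([[1, 0], [0, 1]], [[1, 0], [0, 1]], s=1) == [[1, 0], [0, 1]]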
The main result of this paper shows that the block-based digital medial axis transform can be computed in parallel by a constant number of calls to scan (parallel prefix) operations. This gives time- and/or work-optimal parallel implementations for the distance-based and the block-based medial axis transform on a wide variety of parallel architectures. Since only eight scan operations plus a dozen local operations are performed, the algorithm is very easy to program and use. The originality of our approach is the use of the notion of a derived grid and the oversampling of the image in order to reduce the computation of the block-based medial axis transform on the original grid to the much easier task of computing the distance-based medial axis transform of the oversampled image on the derived grid.
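As a one-dimensional illustration of the scan-based approach (a hypothetical sketch, not the paper's eight-scan 2-D algorithm), here is a distance transform computed by two sweeps; each sweep is a prefix-minimum scan in disguise, which is what lets the standard scan primitive do the work in parallel. The name distance_transform_1d is ours:

def distance_transform_1d(row):
    """Distance of every pixel to the nearest background (0) pixel,
    via one left-to-right and one right-to-left sweep. Each sweep is
    a prefix-minimum scan in disguise: the left sweep computes
    i + min over j <= i of (seed[j] - j), the right sweep its mirror,
    so both parallelize with the scan primitive."""
    INF = float('inf')
    d = [0 if pix == 0 else INF for pix in row]   # seed distances
    for i in range(1, len(row)):                  # left-to-right sweep
        d[i] = min(d[i], d[i - 1] + 1)
    for i in range(len(row) - 2, -1, -1):         # right-to-left sweep
        d[i] = min(d[i], d[i + 1] + 1)
    return d

# Example: a foreground run of length 4 between two background pixels.
assert distance_transform_1d([0, 1, 1, 1, 1, 0]) == [0, 1, 2, 2, 1, 0]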
ISBN: (Print) 9781450300797
In this paper, we show new parallel algorithms for a set of classical string comparison problems: computation of string alignments, longest common subsequences (LCS) or edit distances, and longest increasing subsequence computation. These problems have a wide range of applications, in particular in computational biology and signal processing. We discuss the scalability of our new parallel algorithms in computation time, in memory, and in communication. Our new algorithms are based on an efficient parallel method for (min,+)-multiplication of distance matrices. The core result of this paper is a scalable parallel algorithm for multiplying implicit simple unit-Monge matrices of size n x n on p processors using time O(n log n / p), communication O(n log p / p) and O(log p) supersteps. This algorithm allows us to implement scalable LCS computation for two strings of length n using time O(n^2 / p) and communication O(n / √p), requiring local memory of size O(n / √p) on each processor. Furthermore, our algorithm can be used to obtain the first generally work-scalable algorithm for computing the longest increasing subsequence (LIS). Our algorithm for LIS computation requires computation O(n log^2 n / p), communication O(n log^2 p / p), and O(log^2 p) supersteps for computing the LIS of a sequence of length n. This is within a log n factor of work-optimality for the LIS problem, which can be solved sequentially in time O(n log n) in the comparison-based model. Our LIS algorithm is also within a log p factor of achieving perfectly scalable communication, and furthermore has perfectly scalable memory requirements of O(n/p) per processor.
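The parallel (min,+)-multiplication machinery is beyond a short sketch, but the sequential O(n log n) comparison-based baseline that the abstract measures its LIS algorithm against is the classical patience-sorting method. A minimal sketch of that baseline (our own illustration, not the paper's parallel algorithm):

import bisect

def lis_length(seq):
    """Length of the longest strictly increasing subsequence in
    O(n log n) time: tails[k] is the smallest possible tail value of
    an increasing subsequence of length k + 1, and stays sorted."""
    tails = []
    for x in seq:
        i = bisect.bisect_left(tails, x)  # first tail >= x
        if i == len(tails):
            tails.append(x)               # x extends the longest subsequence
        else:
            tails[i] = x                  # x lowers an existing tail
    return len(tails)

# Example: one LIS of this sequence is [1, 4, 5, 9], of length 4.
assert lis_length([3, 1, 4, 1, 5, 9, 2, 6]) == 4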