This paper presents a PRAM algorithm for computing the n x n Euclidean distance map. The algorithm runs in O(log n) time using n^2/log n processors on the EREW PRAM, and in O(log n/log log n) time using n^2 log log n/log n processors on the common CRCW PRAM. It is also applicable to many other distance maps, for example cityblock, chessboard, octagonal and chamfer distance maps.
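To make the term "distance map" concrete, the following is a minimal sequential brute-force sketch, not the PRAM algorithm of the paper. It assumes a convention the abstract does not state: the input is an n x n binary image in which 1 marks a feature pixel, and each output cell holds the distance to the nearest feature pixel. The function and metric names are hypothetical.

```python
import math

# Hypothetical metric helpers: each takes row and column offsets.
def euclidean(dr, dc):
    return math.hypot(dr, dc)

def cityblock(dr, dc):
    return abs(dr) + abs(dc)

def chessboard(dr, dc):
    return max(abs(dr), abs(dc))

def distance_map(image, metric=euclidean):
    """Brute-force distance map of a square binary image.

    Assumption: pixels equal to 1 are feature pixels. Each output cell is
    the distance, under `metric`, to the nearest feature pixel. This costs
    O(n^2) work per pixel in the worst case and is a reference only.
    """
    n = len(image)
    features = [(r, c) for r in range(n) for c in range(n) if image[r][c] == 1]
    if not features:
        # No feature pixel exists, so every distance is infinite.
        return [[math.inf] * n for _ in range(n)]
    return [
        [min(metric(r - fr, c - fc) for fr, fc in features) for c in range(n)]
        for r in range(n)
    ]

# A 3 x 3 image with a single feature pixel in the top-left corner.
image = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
edm = distance_map(image)              # Euclidean distance map
cbm = distance_map(image, cityblock)   # cityblock distance map
```

Swapping the metric is enough to obtain the cityblock and chessboard maps in this brute-force setting; the octagonal and chamfer maps mentioned in the abstract are not covered by this sketch.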
In this paper we present an O((1/alpha) log n)-time parallel algorithm for computing the convex hull of n points in R^3. This algorithm uses O(n^(1+alpha)) processors on a CREW PRAM, for any constant 0 < alpha less than [...] parallel algorithms proposed for this problem use time at least O(log^2 n). In addition, the algorithm presented here is the first parallel algorithm for the three-dimensional convex hull problem that is not based on the serial divide-and-conquer algorithm of Preparata and Hong, whose crucial operation is the merging of the convex hulls of two linearly separated point sets. The contributions of this paper are therefore (i) an O(log n)-time parallel algorithm for the three-dimensional convex hull problem, and (ii) a parallel algorithm for this problem that does not follow the traditional paradigm.
This paper describes a new parallel algorithm for solving the m-machines, n-jobs flow-shop scheduling problem as well as its implementation on a distributed memory multiprocessor. The algorithm is basically a parallelization of the usual branch-and-bound method. It also takes advantage of the all-search method to keep the efficiency of parallel processing reasonably high when subproblems become smaller than a certain size. The performance evaluation is done by comparing the parallel execution of this algorithm on the nCUBE2 multiprocessor and the sequential execution of the branch-and-bound with depth-first search algorithm. The result shows that the mean speedup ratio for some conditions of the problem is more than the number of processors, and the mean speedup ratio for the conditions on which the sequential executions complete quickly is not smaller than 1.
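As background for the branch-and-bound method that the abstract parallelizes, here is a minimal sequential depth-first sketch for the permutation flow-shop problem. It is not the paper's algorithm: the lower bound used (current machine completion time plus the remaining work on that machine) is a simple assumption chosen for brevity, and all names are hypothetical.

```python
def advance(completion, times):
    """Machine completion times after appending one job to the sequence."""
    updated = []
    prev = 0  # completion time of this job on the previous machine
    for k, t in enumerate(times):
        prev = max(completion[k], prev) + t
        updated.append(prev)
    return updated

def makespan(p, order):
    """Makespan of processing jobs in `order`; p[j][k] is job j's time on machine k."""
    completion = [0] * len(p[0])
    for j in order:
        completion = advance(completion, p[j])
    return completion[-1]

def branch_and_bound(p):
    """Depth-first branch and bound over job permutations (sequential sketch)."""
    n, m = len(p), len(p[0])
    best = {"makespan": float("inf"), "order": None}

    def search(order, completion, remaining):
        if not remaining:
            if completion[-1] < best["makespan"]:
                best["makespan"], best["order"] = completion[-1], list(order)
            return
        # Assumed simple bound: every machine must still process all remaining jobs.
        bound = max(
            completion[k] + sum(p[j][k] for j in remaining) for k in range(m)
        )
        if bound >= best["makespan"]:
            return  # this subtree cannot improve on the incumbent
        for j in sorted(remaining):
            order.append(j)
            search(order, advance(completion, p[j]), remaining - {j})
            order.pop()

    search([], [0] * m, frozenset(range(n)))
    return best["makespan"], best["order"]

# Three jobs on two machines.
times = [[3, 2], [1, 4], [2, 1]]
best_makespan, best_order = branch_and_bound(times)
```

Because pruning depends on when a good incumbent is found, a parallel search can discover that incumbent earlier than the sequential depth-first order does, which is one common explanation for speedups larger than the processor count in branch-and-bound; the abstract itself does not give the cause.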
ISBN (print): 0819425885
In this paper, we present the implementation of a volume graphics rendering algorithm using shift-restoration operations on the parallel algebraic logic (PAL) image processor. The algorithm is a parallel ray casting algorithm. In order to eliminate shading artifacts caused by inaccurate estimation of surface normal vectors, we use a gray-level volume instead of a binary volume, and apply a low-pass filter to smooth the volume object surfaces. By transforming the volume to an intermediate coordinate system to which there is a simple mapping from the object coordinate system, we solve the data redistribution problem caused by nonregular data access patterns in volume rendering. This approach has proved very effective in reducing the data communication cost of the rendering algorithm on the PAL hardware.
ISBN (print): 0819425885
Chromosome image segmentation is an important step toward automatic karyotyping that involves visualization and interpretation of chromosomes. In this paper, we analyze the characteristics of chromosome images that can be effectively used for segmenting chromosomes and can be efficiently extracted on the Lockheed-Martin PAL parallel image processor. We design and implement a parallel algorithm that uses local features to split touching chromosomes.
ISBN (print): 0818682596
We study efficient parallel solutions to the problem of selecting r elements at specified ranks from a set of n arbitrary elements, known as multiselection, on a hypercube with p processors, where p, r <= n. We propose two parallel algorithms based on different approaches: one requires the processors to operate in SIMD mode, and the other in MIMD mode. Our SIMD algorithm runs in time O((log n log log n) min{r, log n}) when p = Theta(n), and O(n^epsilon min{r, (1 - epsilon) log n}) when p = n^epsilon for any 0 < epsilon < 1, where the latter is cost optimal when r >= p. Our MIMD algorithm runs in O(log n log log n log r) time when p = Theta(n), and in O(n^epsilon log r) time when p = n^epsilon for any 0 < epsilon < 1, which is cost optimal for any r. Both algorithms are more efficient than the straightforward solutions and than direct simulation of the optimal EREW algorithm.
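The multiselection problem itself can be illustrated with a short sequential sketch. This is not either of the hypercube algorithms from the abstract; it is a plain divide-and-conquer selection around a random pivot, assuming 1-based ranks, and the function name is hypothetical.

```python
import random

def multiselect(items, ranks):
    """Return {rank: element} for the given 1-based ranks (sequential sketch).

    Each recursive step partitions the values around a random pivot and
    splits the wanted ranks between the two sides, so the data is never
    fully sorted.
    """
    values = list(items)
    wanted = sorted(set(ranks))
    if any(r < 1 or r > len(values) for r in wanted):
        raise ValueError("ranks must lie between 1 and len(items)")
    result = {}

    def solve(values, wanted, offset):
        # `offset` counts the elements known to be smaller than all of `values`.
        if not wanted:
            return
        pivot = random.choice(values)
        less = [v for v in values if v < pivot]
        equal_count = sum(1 for v in values if v == pivot)
        greater = [v for v in values if v > pivot]
        lo = offset + len(less)   # ranks up to lo fall in `less`
        hi = lo + equal_count     # ranks in (lo, hi] equal the pivot
        for r in wanted:
            if lo < r <= hi:
                result[r] = pivot
        solve(less, [r for r in wanted if r <= lo], offset)
        solve(greater, [r for r in wanted if r > hi], hi)

    solve(values, wanted, 0)
    return result

data = [9, 2, 7, 4, 4, 1, 8]
selected = multiselect(data, [1, 4, 7])  # minimum, median and maximum
```

The two recursive calls are independent of each other, which hints at why the problem parallelizes, although the abstract does not say that the hypercube algorithms are structured this way.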
ISBN (print): 0819425885
We present a new parallel volume rendering algorithm based on the split-light model for rendering and the Bulk Synchronous Parallel (BSP) model for parallelization. The BSP model provides a simple and architecture-independent approach to structuring the parallel program. This parallel program has been tested on a shared-memory SGI PowerChallenge machine, a distributed-memory IBM SP2 machine and a network of UNIX workstations.
We address the task of measuring the relative speed (speedup) of two systems A and B for solving the same problem. For example, B may be a parallel algorithm, parametrized by the number of processors used, whose running time has to be related to a serial standard algorithm A. If A and/or B are randomized or if we are interested in their performance on a (discrete) probability distribution of problem instances, the running times are described by random variables T_A and T_B. The speedup of B over A is usually defined as E(T_A)/E(T_B), where E denotes the expected value. In many cases this definition is not appropriate for the user of A or B, because the summation in E(T_A) and E(T_B) hides information about the speedup of individual runs. We propose an alternative speedup definition of the form M(T_A/T_B) and present a set of intuitive functional equations, which any such function M(T_A/T_B) should fulfill. Finally, we prove that the weighted geometric mean is the only solution of these equations.
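The difference between the two speedup definitions can be seen on a small made-up example. The sketch below assumes a discrete distribution over problem instances in which the running times of A and B are paired per instance; the numbers and function names are illustrative assumptions, not data from the paper.

```python
import math

def ratio_of_expectations(weights, t_a, t_b):
    """Usual definition: E(T_A) / E(T_B) under the given instance weights."""
    total = sum(weights)
    e_a = sum(w * a for w, a in zip(weights, t_a)) / total
    e_b = sum(w * b for w, b in zip(weights, t_b)) / total
    return e_a / e_b

def geometric_mean_speedup(weights, t_a, t_b):
    """Weighted geometric mean of the per-instance ratios T_A / T_B."""
    total = sum(weights)
    log_sum = sum(w * math.log(a / b) for w, a, b in zip(weights, t_a, t_b))
    return math.exp(log_sum / total)

# Two equally likely instances (made-up running times).
weights = [0.5, 0.5]
t_a = [1.0, 100.0]   # A is fast on the first instance, slow on the second
t_b = [10.0, 10.0]   # B takes the same time on both

usual = ratio_of_expectations(weights, t_a, t_b)       # about 5.05
geometric = geometric_mean_speedup(weights, t_a, t_b)  # about 1.0
```

In this example B is ten times faster on one instance and ten times slower on the other. The ratio of expectations reports a speedup of about 5 because the long run dominates the sums, while the geometric mean of the individual ratios reports about 1.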
We consider the problem of efficiently performing a reduce-scatter operation in a message passing system. Reduce-scatter is the composition of an element-wise reduction on vectors of n elements initially held by n processors, with a scatter of the resulting vector among the processors. In this paper, we present two algorithms for the reduce-scatter operation, designed in LogGP. The first algorithm assumes an associative and commutative reduction operator and it is optimal in LogGP within a small constant factor. The second algorithm allows the reduction operator to be noncommutative, and it is asymptotically optimal when values to be combined are large arrays. To achieve these results, we developed a complete analysis of both algorithms in LogGP, including the derivation of lower bounds for the reduce-scatter operation, and the study of the m-item version of the problem, i.e., the case when the initial elements are vectors themselves. Reduce-scatter has been included as a collective operation in the MPI standard message passing library, and can be used, for instance, in parallel matrix-vector multiply when the matrix is decomposed by columns. To model a message passing system, we adopted the LogGP model, an extension of LogP that allows the modeling of messages of different length. While this choice makes the analysis somewhat more complex, it leads to more realistic results in the case of gather/scatter algorithms.
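The semantics of reduce-scatter described above can be written down as a short sequential reference. This sketch only defines the expected result; it does not model message passing, LogGP costs, or either of the paper's algorithms. The function name is hypothetical, and the list `vectors[i]` stands in for the vector initially held by processor i.

```python
from functools import reduce
import operator

def reduce_scatter(vectors, op=operator.add):
    """Reference semantics of reduce-scatter for n processors.

    vectors[i] is the n-element vector held by processor i. The result
    list holds, at position i, the single value that processor i ends up
    with: element i of every vector, combined in processor order so that
    an associative but noncommutative `op` is still handled correctly.
    """
    n = len(vectors)
    if any(len(v) != n for v in vectors):
        raise ValueError("each of the n processors must hold n elements")
    return [reduce(op, (vectors[src][i] for src in range(n))) for i in range(n)]

# Three processors, elementwise sum.
sums = reduce_scatter([[1, 2, 3], [10, 20, 30], [100, 200, 300]])

# String concatenation is associative but not commutative.
joined = reduce_scatter([["a", "b"], ["c", "d"]])
```

Combining in processor order is a design choice of this reference so that it also covers the noncommutative case; for a commutative operator any combining order gives the same result.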
This paper describes an O(log^3 n)-time, O(n/log n)-processor parallel algorithm for determining the congruence (exact matching) of two point sets in three dimensions on a CREW PRAM, where n is the maximum size of the input point sets. Although optimal O(n log n)-time sequential algorithms were developed for this problem, no efficient parallel algorithm was known previously. In the algorithm, the original problem is reduced to the two-dimensional congruence problem by computing a three-dimensional point set cps(S) for each input point set S, where cps(S) satisfies the following conditions: 0 < |cps(S)| <= 12; cps(T(S)) = T(cps(S)) for all isometric transformations T. The two-dimensional problem can be solved efficiently in parallel using a parallel version of a previously known sequential algorithm. cps(S) is computed recursively: the size of the point set is reduced by a constant factor in each recursive step. To reduce the size of a point set, a convex hull is constructed and then regarded as a planar graph, so that combinatorial properties of planar graphs can be used effectively. A sequential version of the algorithm works in O(n log n) time, so this paper gives another optimal sequential algorithm. The presented algorithm can be applied to graphs in which each vertex corresponds to a point and each edge corresponds to a line segment connecting its endpoints. Moreover, the algorithm can be modified to compute the canonical form of a point set or a graph.