Consider the selection problem of determining the kth smallest element of a set of n elements. Under the CGM (coarse-grained multicomputer) model with p processors and O(n/p) local memory, we present a deterministic p...
详细信息
Consider the selection problem of determining the kth smallest element of a set of n elements. Under the CGM (coarse-grained multicomputer) model with p processors and O(n/p) local memory, we present a deterministic parallel algorithm for the selection problem that requires O(log p) communication rounds. Besides requiring a low number of communication rounds, the algorithm also attempts to minimize the total amount of data transmitted in each round (only O(p) except in the last round). In addition to showing theoretical complexities, we present very promising experimental results obtained on a parallel machine that show almost linear speedup, indicating the efficiency and scalability of the proposed algorithm.
Clustering techniques are usually used in pattern recognition, image segmentation and object detection. For N patterns and k centers each with M features, in this paper, we first design an O(kM) time optimal parallel ...
详细信息
Clustering techniques are usually used in pattern recognition, image segmentation and object detection. For N patterns and k centers each with M features, in this paper, we first design an O(kM) time optimal parallel algorithm for one pass process of clustering with the k-means method on a linear array of processors with a wider bus network using N1+1/epsilon processors with one bus network, where c is any constant and c greater than or equal to 1. Then, based on the proposed algorithm, two O(k) and O(1) time optimal parallel clustering algorithms are also derived using MN1+1/epsilon and kMN(1+1/epsilon) processors with M row and MN row bus networks, respectively. These results improve the best known bounds and achieve cost optimal in their time and processor complexities. (C) 1999 Elsevier Science B.V. All rights reserved.
Generalized series-parallel (GSP) graphs belong to the class of decomposable graphs which can be represented by their decomposition trees. Given a decomposition tree of a GSP graph, there are many graph-theoretic prob...
详细信息
Generalized series-parallel (GSP) graphs belong to the class of decomposable graphs which can be represented by their decomposition trees. Given a decomposition tree of a GSP graph, there are many graph-theoretic problems which can be solved efficiently. An efficient parallel algorithm for constructing a decomposition tree of a given GSP graph is presented. It takes O(log n) time with C(m, n) processors on a CRCW PRAM, where C(m, n) is the number of processors required to find connected components of a graph with m edges and n vertices in logarithmic time. Based on our algorithmic results, we also derive some properties for GSP graphs, which may be of interest in and of themselves.
The map overlay-operation is one of the most important and most time consuming operations in the Geographic Information System. In general, a map consists of a huge number of line segments. The major cost for overlayi...
详细信息
The map overlay-operation is one of the most important and most time consuming operations in the Geographic Information System. In general, a map consists of a huge number of line segments. The major cost for overlaying two maps is that of computing the intersection points between the line segments. parallel processing is one of the ways to reduce the computing time. If one can partition the line segments into independent subsets, then these subsets can be processed in different processors simultaneously;thus, the computing time can be reduced. In this paper, we consider the problem of partitioning line segments into independent sets such that the load is balanced among the processors. An easy yet effective strategy is proposed to balance the load for a multi-processor computer which does not have many processors. The proposed algorithm can achieve good load balance when the average length of the line segments is short compared to the width of a map.
A parallel variant of the block Gauss-Seidel method is presented to solve the Poisson equation with Dirichlet boundary condition. This method uses two-dimensional logically connected parallel processors. Furthermore, ...
详细信息
A parallel variant of the block Gauss-Seidel method is presented to solve the Poisson equation with Dirichlet boundary condition. This method uses two-dimensional logically connected parallel processors. Furthermore, natural rowwise (NR) data flow block ordering is used so that its convergence rate is the same as that of the standard block Gauss-Seidel method. Spectral radius is determined by the formula for general k x I block iterative methods. Numerical computations on a parallel computer are included. (C) 1999 Published by Elsevier Science Inc. All rights reserved.
We present a simple and general parallel sorting scheme, ZZ-sort, which can be used to derive a class of efficient in-place sorting algorithms on realistic parallel machine models. We prove a tight bound for the worst...
详细信息
We present a simple and general parallel sorting scheme, ZZ-sort, which can be used to derive a class of efficient in-place sorting algorithms on realistic parallel machine models. We prove a tight bound for the worst case performance of ZZ-sort. We also demonstrate the average performance of ZZ-sort by experimental results obtained on a MasPar parallel computer. Our experiments indicate that ZZ-sort can be incorporated into a distributed memory parallel computer system as a standard routine, and this routine is useful for space critical situations. Finally, we show that ZZ-sort can be used to convert a non-adaptive parallel sorting algorithm into an in-place and adaptive one by considering the problem of sorting an arbitrarily large input on fixed-size reconfigurable meshes.
This paper deals with the parallel implementation on distributed memory architectures of the implicit residual smoothing procedure in the context of a explicit method for two dimensional inviscid flows. The governing ...
详细信息
This paper deals with the parallel implementation on distributed memory architectures of the implicit residual smoothing procedure in the context of a explicit method for two dimensional inviscid flows. The governing equations are discretized by a cell centered finite volume method and the time integration is performed by a explicit Runge Kutta method. Artificial dissipation and implicit residual smoothing are used in order to stabilize and speed up the method. The parallelism is introduced by grid partitioning. The parallel implementation of the residual smoothing, a inherently implicit procedure, is crucial for the efficiency of the method. Here, two different parallel residual smoothing strategies are discussed and some experimental results are given to illustrate parallel performances of the proposed strategies.
The distance transform (DT) is an image computation tool which can be used to extract the information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a...
详细信息
The distance transform (DT) is an image computation tool which can be used to extract the information about the shape and the position of the foreground pixels relative to each other. It converts a binary image into a grey-level image, where each pixel has a value corresponding to the distance to the nearest foreground pixel. The time complexity for computing the distance transform is fully dependent on the different distance metrics. Especially, the more exact the distance transform is, the worse execution time reached will be, Nowadays, quite often thousands of images are processed in a limited time. It seems quite impossible for a sequential computer to do such a computation for the distance transform in real time. In order to provide efficient distance transform computation, it is considerably desirable to develop a parallel algorithm for this operation. In this paper, based on the diagonal propagation approach, we first provide an O(N-2) time sequential algorithm to compute the chessboard distance transform (CDT) of an N x N image, which is a DT using the chessboard distance metrics. Based on the proposed sequential algorithm, the CDT of a 2D binary image array of size N x N can be computed in O(log N) time on the EREW PRAM model using O(N-2/log N) processors, O(log log N) time on the CRCW PRAM model using O(N-2/log log N) processors, and O(log N) time on the hypercube computer using O(N-2/log N) processors. Following the mapping as proposed by Lee and Horng, the algorithm for the medial axis transform is also efficiently derived. The medial axis transform of a 2D binary image array of size N x N can be computed in O(log N) time on the EREW PRAM model using O(N-2/log N) processors, O(log log N) time on the CRCW PRAM model using O(N-2/log log N) processors, and O(log N) time on the hypercube computer using O(N-2/log N) processors. The proposed parallel algorithms are composed of a set of prefix operations. In each prefix operation phase, only increase (add-one)
We recently developed a new randomized optimization framework, the Nested Partitions (NP) method. This approach uses partitioning, global random sampling, and local search heuristics to create a Markov chain that has ...
详细信息
We recently developed a new randomized optimization framework, the Nested Partitions (NP) method. This approach uses partitioning, global random sampling, and local search heuristics to create a Markov chain that has global optima as its absorbing states. This new method combines global and local search in a natural way and it is highly matched to emerging massively parallel processing capabilities. In this paper, we apply the NP method to the Travelling Salesman Problem. Preliminary numerical results show that the NP method generates high-quality solutions compared to well-known heuristic methods, and that it can be a very promising alternative for finding a solution to the TSP. (C) 1999 Elsevier Science Ltd. All rights reserved.
We performed a simulation of a cluster of five water molecules at 300 K using an ab initio potential. In our first study, the interactions were calculated at the Hartree-Fock level using a 6-31G* basis set. A parallel...
详细信息
We performed a simulation of a cluster of five water molecules at 300 K using an ab initio potential. In our first study, the interactions were calculated at the Hartree-Fock level using a 6-31G* basis set. A parallel big move (hybrid) Monte Carlo algorithm was used to conduct this modeling. We compared the results obtained for this "quantum" system with those obtained from a conventional simulation employing an effective pair potential. We also estimated properties of the quantum system in terms,of the configurations generated by the conventional simulation by employing the appropriate Boltzmann weighting. These estimates are in good agreement with those obtained from the full quantum simulation. We then repeated the Boltzmann weighting method, but this time using the BLYP density functional in our ab initio calculations, so as to include correlation effects. (C) 1999 John Wiley & Sons, Inc.
暂无评论