A parallel algorithm for generating all combinations of m out of n items in lexicographic order is presented. The algorithm uses m processors and runs in O(C(n, m)) time, where C(n, m) is the number of such combinations. The cost of the algorithm, which is the parallel running time multiplied by the number of processors used, is optimal to within a constant multiplicative factor in view of the Ω(m · C(n, m)) lower bound on the number of operations required to solve this problem on a sequential computer.
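For orientation, a minimal sequential sketch of lexicographic generation follows. It is not the m-processor algorithm from the abstract; the function names `next_combination` and `all_combinations` are illustrative assumptions. Each of the C(n, m) outputs has m entries, which is where the Ω(m · C(n, m)) sequential bound comes from.

```python
def next_combination(c, n):
    """Advance c (an increasing list of m values from 1..n) to its
    lexicographic successor in place. Return False if c was the last one."""
    m = len(c)
    i = m - 1
    # Find the rightmost position that has not reached its maximum value.
    while i >= 0 and c[i] == n - m + i + 1:
        i -= 1
    if i < 0:
        return False
    c[i] += 1
    # Reset everything to the right to the smallest admissible values.
    for j in range(i + 1, m):
        c[j] = c[j - 1] + 1
    return True


def all_combinations(n, m):
    """Return all m-subsets of {1, ..., n} in lexicographic order."""
    if m > n:
        return []
    c = list(range(1, m + 1))
    out = [tuple(c)]
    while next_combination(c, n):
        out.append(tuple(c))
    return out
```

In a parallel version one would expect each of the m positions to be handled by its own processor, but how the paper assigns the work is not stated in the abstract.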
The subject of this note is the parallel algorithm for depth-first search of a directed acyclic graph by Ghosh and Bhattacharjee. It is pointed out that their algorithm does not always work, and a counterexample is given. The note also states the necessary and sufficient condition for the algorithm to fail or to work correctly.
We present a parallel algorithm to solve the visibility problem among n vertical segments in a plane, which can be implemented on a VLSI chip arranged as a mesh of trees. Our algorithm determines all the pairs of segments that "see" each other in O(log n) time, whereas the fastest sequential algorithm requires O(n log n) time. A lower bound of Ω(n^2 log^2 n) on the area-time complexity of this problem is also derived.
A standard preconditioned conjugate gradient algorithm for the solution of symmetric linear systems is studied in the context of multiprocessing. A totally parallel approach is taken, based on a computational model of an MIMD machine with shared global memory. To assure mathematical correctness of the algorithm, four barrier syncs are required during each iteration. Large linear systems are solved with this parallel adaptation of conjugate gradient, and efficiencies near one are observed on the CRAY X-MP24 running COS 1.13. Further, segments of the code that could be considered independent (i.e., between syncs) clocked in at speedups very close to the number of tasks. The latter indicates that the loss of efficiency in this implementation of the algorithm on the X-MP24 is connected with the cost of barrier syncs. On the CRAY X-MP48 running COS 1.15, similar results are observed when two CPUs are used. When all four CPUs execute parallel tasks in vector mode, memory bank conflicts cause a 30% loss of overall speedup.
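A minimal sequential sketch of preconditioned conjugate gradient may help locate where synchronization would matter. It assumes a Jacobi (diagonal) preconditioner and dense list-of-lists matrices, neither of which is taken from the paper; the function names are illustrative. The dot products are global reductions, so they are natural candidates for barrier points in a shared-memory version, but the abstract does not say where the four barriers are placed.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.
    Preconditioner: inverse of diag(A) (an assumption of this sketch)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                    # global reduction
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)                         # global reduction
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

The vector updates between reductions are elementwise and independent, which is consistent with the abstract's observation that the code between syncs speeds up almost linearly.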
We present a parallel method to solve the generalized eigenvalue problem on a linear array of processors, each connected to its nearest neighbors and operating synchronously. We also include a wrap-around connection from end to end. Our method is based on the well-known QZ algorithm of Moler and Stewart, which simultaneously reduces two n × n matrices to upper triangular form by orthogonal or unitary transformations. We show how this algorithm may be partitioned and distributed over n + 1 processors, achieving a speed-up over the serial algorithm of O(n). We use the concept of windows to describe the action of each processor at each step. We show how to incorporate single shifts, and how to apply orthogonal plane rotations on either side of a matrix without the need to transpose the matrix itself.
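The last point, applying a plane rotation from either side without transposing, can be illustrated with a small sequential sketch. The helper names `givens`, `rotate_rows`, and `rotate_cols` are assumptions of this example, and real arithmetic is assumed; the paper's windowed, processor-level scheme is not reproduced here.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def rotate_rows(M, i, k, c, s):
    """Left-multiply M by the rotation in the (i, k) plane, in place.
    Only rows i and k change."""
    for j in range(len(M[0])):
        u, v = M[i][j], M[k][j]
        M[i][j] = c * u + s * v
        M[k][j] = -s * u + c * v


def rotate_cols(M, i, k, c, s):
    """Right-multiply M by the transpose of the same rotation, in place.
    Only columns i and k change; M itself is never transposed."""
    for row in M:
        u, v = row[i], row[k]
        row[i] = c * u + s * v
        row[k] = -s * u + c * v
```

Both routines touch only two rows or two columns, which is what makes it plausible to give each processor a small window of the matrices to work on.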
Given a sequence of n ordered but not sorted b-bit integers, an algorithm is described to select the kth smallest by reducing the length of the original sequence until only the required kth value, or those equal to it, remain. Using the time to add two bits as the unit, a running time of O(b log n) is obtained by employing O(n) simple processors arranged in a binary tree. The algorithm is then adapted to run on a binary tree of processors with N leaves, where N log N ⩽ n, in O(bn/N) time for an optimal cost of O(bn).
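One way to picture "reducing the sequence until only the kth value remains" is a bit-by-bit elimination from the most significant bit down. The sketch below is a sequential illustration under that assumption, not the tree algorithm itself; `select_kth` is a hypothetical name, k is 1-based, and all values are assumed to lie in the range 0 to 2^b - 1.

```python
def select_kth(values, k, b):
    """Return the k-th smallest (1-based) of the b-bit integers in values."""
    candidates = list(values)
    for bit in range(b - 1, -1, -1):          # most significant bit first
        zeros = [v for v in candidates if not (v >> bit) & 1]
        if k <= len(zeros):
            # The answer has a 0 in this bit: discard everything with a 1.
            candidates = zeros
        else:
            # The answer has a 1 in this bit: skip past the smaller values.
            k -= len(zeros)
            candidates = [v for v in candidates if (v >> bit) & 1]
    # All survivors agree on every bit, so they equal the k-th smallest.
    return candidates[0]
```

The per-bit step needs only a count of candidates with a 0 bit, which is the kind of quantity a binary tree of processors can sum in O(log n) steps; that would be consistent with the O(b log n) bound, though the abstract does not spell out the mechanism.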
Local operators, used in many image processing tasks, involve replacing each pixel in an image with a value computed within a local neighborhood of that pixel. Computing such operators at the video rate requires computing power that conventional computers do not provide. Though computationally expensive, local operators are highly regular. Thus, a VLSI implementation appears particularly appropriate. This correspondence presents systolic algorithms for tasks such as connected component determination, distance transform, and relaxation, which are defined in terms of local operators.
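The definition of a local operator can be made concrete with a short sequential sketch. It assumes a 3 × 3 neighborhood clipped at the image border and a caller-supplied function `op`; both are choices of this example rather than details from the paper, and the systolic implementations themselves are not shown.

```python
def apply_local_operator(image, op):
    """Return a new image in which each pixel is op(window), where window
    lists the values in the pixel's 3x3 neighborhood (clipped at borders)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = op(window)
    return out
```

For example, `apply_local_operator(img, max)` spreads bright pixels to their neighbors and `apply_local_operator(img, min)` shrinks them. Every output pixel depends on the same small, fixed pattern of inputs, which is the regularity the abstract points to.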
Positive definite and positive semidefinite (also called nonnegative definite) real quadratic and Hermitian forms play important roles in many control and dynamics applications. A practical test for positive definiteness that does not require explicit calculation of the eigenvalues is the principal minor test. The note formulates the correct necessary and sufficient condition for a matrix to be positive semidefinite.
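The standard form of the semidefinite test is that a real symmetric matrix is positive semidefinite exactly when all of its principal minors, not only the leading ones, are nonnegative. The sketch below assumes this is the condition the note refers to. It handles real symmetric matrices only, and its function names and tolerance are illustrative.

```python
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [list(map(float, row)) for row in M]
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            d = -d                      # a row swap flips the sign
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
    return d


def is_positive_semidefinite(M, tol=1e-12):
    """Check every principal minor (all index subsets) for nonnegativity."""
    n = len(M)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            if det([[M[i][j] for j in idx] for i in idx]) < -tol:
                return False
    return True


def leading_minors_nonnegative(M, tol=1e-12):
    """The weaker check: only the leading principal minors."""
    return all(det([row[:k] for row in M[:k]]) >= -tol
               for k in range(1, len(M) + 1))
```

The matrix `[[0, 0], [0, -1]]` passes `leading_minors_nonnegative` but fails `is_positive_semidefinite`, which shows why the leading-minor test alone is not sufficient in the semidefinite case. Checking all 2^n - 1 principal minors is exponential in n, so this is practical only for small matrices.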
A fast parallel thinning algorithm is proposed in this paper. It consists of two subiterations: one aimed at deleting the south-east boundary points and the north-west corner points while the other one is aimed at del...