We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, whic...
详细信息
We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and “unicycular graphs.” Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs.
Two efficient parallel algorithms for computing the forward dynamics for real-time simulation were developed for implementation on a single-instruction multiple-data-stream (SIMD) computer with n processors, where n i...
详细信息
Two efficient parallel algorithms for computing the forward dynamics for real-time simulation were developed for implementation on a single-instruction multiple-data-stream (SIMD) computer with n processors, where n is the number of degrees of freedom of the manipulator. The first parallel algorithm, based on the composite rigid-body method, generates the inertia matrix using the parallel Newton-Euler algorithm, the parallel linear recurrence algorithm, and the modified row-sweep algorithm, and then inverts the inertia matrix to obtain the joint acceleration vector at time t. The time complexity of this parallel algorithm is of the order O(n/sup 2/) with O(n) processors. The second parallel algorithm, based on the conjugate gradient method, computes the joint acceleration with a time complexity of O(n) for multiplication operation and O(n log/sub 2/n) for addition operation. The interprocessor communication problem for the implementation of the proposed parallel algorithms on SIMD machines is also discussed and analyzed.< >
In this paper interior-point methods for linear programming, developed in the context of sequential computation, are used to obtain a parallel algorithm for the bipartite matching problem. This algorithm finds a maxim...
详细信息
In this paper interior-point methods for linear programming, developed in the context of sequential computation, are used to obtain a parallel algorithm for the bipartite matching problem. This algorithm finds a maximum cardinality matching in a bipartite graph with n nodes and m edges in O(square-root m log3 n) times on a CRCW PRAM. The results here extend to the weighted bipartite matching problem and to the zero-one minimum-cost flow problem, yielding O(square-root m log2 n log nC) algorithms, where C > 1 is an upper bound on the absolute value of the integral weights or costs in the two problems, respectively. The results here improve previous bounds on these problems and introduce interior-point methods to the context of parallel algorithm design.
This paper presents an optimal sequential and an optimal parallel algorithm to compute a minimum cardinality Steiner set and a Steiner tree. The sequential algorithm takes O (n) time and parallel algorithm takes O (lo...
详细信息
This paper presents an optimal sequential and an optimal parallel algorithm to compute a minimum cardinality Steiner set and a Steiner tree. The sequential algorithm takes O (n) time and parallel algorithm takes O (log n) time and O (n /log n) processors on an EREW PRAM model.
This paper introduces the queue-read queue-write (QRQW) parallel random access machine (pram) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of...
详细信息
This paper introduces the queue-read queue-write (QRQW) parallel random access machine (pram) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW pram model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW pram or EREW pram models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step. The QRQW PRAM is strictly more powerful than the EREW pram. This paper shows a separation of root log n between the two models, and presents faster and more efficient QRQW algorithms for several basic problems, such as linear compaction, leader election, and processor allocation. Furthermore, we present a work-preserving emulation of the QRQW pram with only logarithmic slowdown on Valiant's bsp model, and hence on hypercube-type noncombining networks, even when latency, synchronization, and memory granularity overheads are taken into account. This matches the best-known emulation result for the EREW pram, and considerably improves upon the best-known efficient emulation for the CRCW pram on such networks. Finally, the paper presents several lower bound results for this model, including lower bounds on the time required for broadcasting and for leader election.
A promising new massively parallel technique for rigorous simulation of topography scattering in optical lithography has been developed and tested. The method is equivalent to the time-domain finite-difference method ...
详细信息
A promising new massively parallel technique for rigorous simulation of topography scattering in optical lithography has been developed and tested. The method is equivalent to the time-domain finite-difference method (TDFDM) used in electromagnetic scattering simulations, but exploits the parallel nature of wave propagation and the power of recent massively parallel architectures such as the Connection Machine. A working code called TEMPEST has been implemented on a Connection Machine CM-2 having 1 to 32 K processors with up to 1 M virtual processors. Numerical accuracy comparable with that of other fully rigorous methods was achieved. The convergence in the time domain for lithographic problems was found to be dominated by the physical process of multiple scattering of incident radiation. A very significant finding was that the solution required constant time per iteration for problems ranging from a few thousand unknowns up to one million, providing the ratio between the problem size and the number of processors is kept constant. The suitability of TEMPEST for solving a large class of topography structures important in alignment, metrology, and lithography is illustrated by examples from linewidth measurement with thin-film interference effects and projection printing in the presence of specular reflections from curved substrates.
作者:
ALNUWEIRI, HMKUMAR, VKP1. EEB-244
Department of Electrical Engineering Systems University of Southern California 90089-2562 Los Angeles CA USA
We present processor-time optimal parallel algorithms for several problems on n x n digitized image arrays, on a mesh-connected array having p processors and a memory of size O(n2) words. The number of processors p ca...
详细信息
We present processor-time optimal parallel algorithms for several problems on n x n digitized image arrays, on a mesh-connected array having p processors and a memory of size O(n2) words. The number of processors p can vary over the range [1, n3/2] while providing optimal speedup for these problems. The class of image problems considered here includes labeling the connected components of an image;computing the convex hull, the diameter, and a smallest enclosing box of each component;and computing all closest neighbors. Such problems arise in medium-level vision and require global operations on image pixels. To achieve optimal performance, several efficient data-movement and reduction techniques are developed for the proposed organization.
This paper presents very fast parallel algorithms for approximate edge coloring. Let log((1)) n = log n, log((k)) n = log(log((k-1)) n), and log*(n) = min{k \ log((k)) n /log(c/4) log*(n)])2 colors in O(log log*(n)) t...
详细信息
This paper presents very fast parallel algorithms for approximate edge coloring. Let log((1)) n = log n, log((k)) n = log(log((k-1)) n), and log*(n) = min{k \ log((k)) n < 1}. It is shown that a graph with n vertices and in edges can he edge colored with (2[log(1/4)log*(n)])(c) ([/log(c/4) log*(n)])2 colors in O(log log*(n)) time using O(m + n) processors on the EREW PRAM, where Delta is the maximum vertex degree of the graph and c is an arbitrarily large constant. It is also shown that the graph can he edge colored using at most [4 Delta (1+4/log log log*(Delta)) log(1/2) log*(Delta )1 colors in O(log Delta log log*(Delta)/log log log* (Delta) + log log*(n)) time using O(m + n) processors on the same model. O 2001 Elsevier Science B.V. All rights reserved.
In this paper, a partitioned-parallel strategy for the solution of the Toeplitz equations appearing in the linear prediction case is introduced. From the one extreme, i.e., the use of the Schur recursions for an order...
详细信息
In this paper, a partitioned-parallel strategy for the solution of the Toeplitz equations appearing in the linear prediction case is introduced. From the one extreme, i.e., the use of the Schur recursions for an order recursive implementation with O(1) processors which perform O(p2) MADs (multiplications and divisions), to the other extreme, i.e., the use of the same recursions for a fully parallel implementation with O(p) processors which perform O(p) MADs, there exists a compromise: The hardware designer can 'cut' the computational scheme into suitable partitions, which are executed one after the other, with all the computations of each partition organized in a parallel manner. This way he can achieve increased flexibility, especially in relation to the model order, which can become totally independent of the available number of processors. Moreover, in this paper an abatement methodology is introduced which significantly reduces the number of multiplications of the above computational schemes, as well as the overall algorithm complexity in the case of the parallel design.
A property of algorithms called synergy is introduced, and a quantity s , of synergy is defined. When synergized, both parallel and serial algorithms run faster, the parallel algorithms benefiting from a cooperation b...
详细信息
A property of algorithms called synergy is introduced, and a quantity s , of synergy is defined. When synergized, both parallel and serial algorithms run faster, the parallel algorithms benefiting from a cooperation between processors. Examples show that synergy is a useful concept in the design of parallel algorithms.
暂无评论