parallel tree contraction has been found to be a useful and quite powerful tool for the design of a wide class of efficient graph algorithms. We propose a corresponding technique for the parallel solution of increment...
详细信息
ISBN:
(纸本)9780897916714
parallel tree contraction has been found to be a useful and quite powerful tool for the design of a wide class of efficient graph algorithms. We propose a corresponding technique for the parallel solution of incremental problems. As our computational model, we assume a variant of the CRCW PRAM where we can dynamically activate processors by a forking *** consider a dynamic binary tree T of ≤ n nodes and unbounded depth. We describe a procedure, which we call the dynamic parallel tree contraction algorithm, which incrementally processes various parallel modification requests and queries:(1)parallel requests to add or delete leaves of T, or modify labels of internal nodes or leaves of T, and also(2) parallel tree contraction queries which require recomputingvalues at specified nodes. Each modification or query is with respect to a set of nodes U in *** dynamic parallel tree contraction algorithm is a randomized algorithm that takes O(log(|U|log n)) expected parallel time using O(|U|log n/log(|U|log n) processors. We give a large number of applications (with the same bounds), including:(a) maintaining the usual tree properties (such as number of ancestors, preorder, etc.),(b) Eulerian tour,(c) expression evaluation,(d) least common ancestor, and(e) canonical forms of ***, there where no known parallelalgorithms for incrementally maintaining and solving such problems in parallel time less than Θ(log n).In deriving our incremental algorithms, we solve a key subproblem, namely a processor activation problem, within the same asymptotic bounds, which may be useful in the design of other parallel incremental algorithms. This algorithm uses an interesting persistent parallel data structures involving a non-trivial *** a subsequent paper, we apply our dynamic parallel tree contraction technique to various incremental graph problems: maintaining various properties, (such as coloring, minimum covering set, maximum matching, etc.). of parall
We adapt the Sharesort algorithm of Cypher and Plaxton to run on various parallel models of multi-level storage, and analyze its resulting performance. Sharesort was originally defined in the context of sorting n reco...
详细信息
ISBN:
(纸本)0898713293
We adapt the Sharesort algorithm of Cypher and Plaxton to run on various parallel models of multi-level storage, and analyze its resulting performance. Sharesort was originally defined in the context of sorting n records on an n-processor hypercubic network. In that context, it is not known whether Sharesort is asymptotically optimal. Nonetheless, we find that Sharesort achieves optimal time bounds for parallel sorting in multi-level storage, under a variety of models that have been defined in the literature.
Extensive research has been done on extracting parallelism from single instruction stream processors. This paper presents some results of our investigation into ways to modify MIMD architectures to allow them to extra...
详细信息
ISBN:
(纸本)0897917073
Extensive research has been done on extracting parallelism from single instruction stream processors. This paper presents some results of our investigation into ways to modify MIMD architectures to allow them to extract the instruction level parallelism achieved by current superscalar and VLIW machines. A new architecture is proposed which utilizes the advantages of a multiple instruction stream design while addressing some of the limitations that have prevented MIMD architectures from performing ILP operation. A new code scheduling mechanism is described to support this new architecture by partitioning instructions across multiple processing elements in order to exploit this level of parallelism.
Maisie is among the few languages that separates the simulation program from the underlying algorithm (sequential or parallel) that is used to execute the program. It is thus possible to design a sequential simulation...
详细信息
ISBN:
(纸本)0818656204
Maisie is among the few languages that separates the simulation program from the underlying algorithm (sequential or parallel) that is used to execute the program. It is thus possible to design a sequential simulation and, if needed, to subsequently port it to a parallel machine for execution with optimistic or conservative algorithms. This paper gives an overview of the Maisie simulation environment and presents experimental measurements of a model with both conservative and optimistic parallelalgorithms. We also describe our current work in adding inheritance to the Maisie language.
parallelalgorithms developed for CAD problems today suffer from three important drawbacks. first, they are machine specific and tend to perform poorly on architectures other than the one for which they were designed....
详细信息
ISBN:
(纸本)0818656026
parallelalgorithms developed for CAD problems today suffer from three important drawbacks. first, they are machine specific and tend to perform poorly on architectures other than the one for which they were designed. Second, they cannot use the latest advances in improved versions of the sequential algorithms for solving the problem. Third, the quality of results degrade significantly during parallel execution. In this paper we address these three problems for an important CAD application: standard cell placement. We have developed a new parallel placement algorithm that is portable across a range of MIMD parallelarchitectures. The algorithm is part of the ProperCAD project which allows the development and implementation of a parallel algorithm such that it can be executed on a wide variety of parallel machines without any change to the source. The parallel placement algorithm is based on an existing implementation of the sequential simulated annealing algorithm, TimberWolfSC 6.0 [1].
Known polylog parallelalgorithms for the solution of linear systems and related problems require computation of the characteristic polynomial or related forms, which are known to be highly unstable in practice. Howev...
详细信息
ISBN:
(纸本)9780897916714
Known polylog parallelalgorithms for the solution of linear systems and related problems require computation of the characteristic polynomial or related forms, which are known to be highly unstable in practice. However, matrix factorizations of various types, bypassing computation of the characteristic polynomial, are used extensively in sequential numerical computations and are essential in many *** paper gives new parallel methods for various exact factorizations of several classes of matrices. We assume the input matrices are n × n with either integer entries of size ≤ 2no(1). We make no other assumption on the input. We assume the arithmetic PRAM model of parallel computation. Our main result is a reduction of the known parallel time bounds for these factorizations from O(log3n) to O(log2n). Our results are work efficient; we match the best known work bounds of parallelalgorithms with polylog time bounds, and are within a log n factor of the work bounds for the best known sequential algorithms for the same *** exact factorizations we compute for symmetric positive definite matrices include: 1. recursive factorization sequences and trees,2. LU factorizations,3. QR factorizations, and4. reduction to upper Hessenberg *** classes of matrices for which we can efficiently compute these factorizations include:1. dense matrices, in time O(log2n) with processor bound P(n) (the number of processors needed to multiply two n × n matrices in O(log n time),2. block diagonal matrices, in time O(log2b with P(b)n/b processors,3. sparse matrices which are s(n)-separable (recursive factorizations only), in time O(log2n) with P(s(n)) processors where s(n) is of the form n γ for 0 < γ < 1, and4. banded matrices, in parallel time O((logn) log b) with P(b)n/b *** factorizations also provide us similarly efficient algorithms for exact computation (given arbitrary rational matrices that need not be symmetric positive definite) of the following: 1
A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in the plane and that contains only AND and OR gates. Goldschlager, Cook & Dymond and others have developed NC2 algorithms to evaluate a sp...
详细信息
ISBN:
(纸本)0898713293
A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in the plane and that contains only AND and OR gates. Goldschlager, Cook & Dymond and others have developed NC2 algorithms to evaluate a special layered form of a PMC. These algorithms require a large number of processors (Ω(n6), where n is the size of the input circuit). Yang, and more recently, Delcher & Kosaraju have given NC algorithms for the general planar monotone circuit value problem. These algorithms use at least as many processors as the algorithms for the layered case. In this paper, we give an efficient parallel algorithm that evaluates a general PMC of size n in polylog time using only a linear number of processors on an EREW PRAM. This is a substantial improvement over the earlier algorithms for the problem. Our algorithm uses several novel techniques to perform the evaluation, including the use of the dual of the plane embedding of the circuit to derive some crucial properties of the propagation of values within the circuit.
The proceedings contain 38 papers. The topics discussed include: contention-free complexity of shared memory algorithms;making operations of concurrent data types fast;open systems in TLA;delimiting the power of bound...
ISBN:
(纸本)0897916549
The proceedings contain 38 papers. The topics discussed include: contention-free complexity of shared memory algorithms;making operations of concurrent data types fast;open systems in TLA;delimiting the power of bounded size synchronization objects;wait-freedom vs. bounded wait-freedom in public data structures;ENF event predicate detection in distributed systems;repeatable and portable message-passing programs;mixed consistency: a model for parallel programming;using belief to reason about cache coherence;global flush communication primitive communication;multimedia networking: applications and challenges;a performance evaluation of lock-free synchronization protocols;disjoint-access-parallel implementations of strong shared memory primitives;time-optimal message-efficient work performance in the presence of faults;using k-exclusion to implement resilient, scalable shared objects;coins, weights and contention in balancing networks;resilience of general interactive tasks;asynchronous secure computations with optimal resilience;a combinatorial treatment of balancing networks;and memory-efficient and self-stabilizing network reset.
In this paper, we present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n non-intersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time ...
详细信息
ISBN:
(纸本)0897916484
In this paper, we present an optimal parallel randomized algorithm for the Voronoi diagram of a set of n non-intersecting (except possibly at endpoints) line segments in the plane. Our algorithm runs in O(log n) time with very high probability and uses O(n) processors on a CRCW PRAM. This algorithm is optimal in terms of P.T bounds since the sequential time bound for this problem is Ω(n log n). Our algorithm improves by an O(log n) factor the previously best known deterministic parallel algorithm which runs in O(log2n) time using O(n) processors [13]. We obtain this result by using random sampling at 'two stages' of our algorithm and using efficient randomized search techniques. This technique gives a direct optimal algorithm for the Voronoi diagram of points as well (all other optimal parallelalgorithms for this problem use reduction from the 3-d convex hull construction).
Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory access demands. This paper presents an analysis of a texture segmentation application containing ...
详细信息
ISBN:
(纸本)0818656026
Segmentation and other image processing operations rely on convolution calculations with heavy computational and memory access demands. This paper presents an analysis of a texture segmentation application containing a 96x96 convolution. Sequential execution required several hours on single processors systems with over 99% of the time spent performing the large convolution. 70% to 75% of execution time is attributable to cache misses within the convolution. We implemented the same application on CM-5, iPSC/860 and PVM distributed memory multicomputers, tailoring the parallelalgorithms to each machine's architectures. parallelization significantly reduced execution time, taking 49 second on a 512 node CM-5 and 6.5 minutes on a 32 node iPSC/860.
暂无评论