Consider the following NP-hard problems: Given a graph G, find the minimum 2-edge connected and 2-vertex connected subgraphs spanning all vertices of G. The past few years have produced exciting sequential algorithms ...
详细信息
ISBN:
(纸本)9780897917179
Consider the following NP-hard problems: Given a graph G, find the minimum 2-edge connected and 2-vertex connected subgraphs spanning all vertices of G. The past few years have produced exciting sequential algorithms for approximating such minimum subgraphs [6, 7]. The approximation factors are improved from 2 down to 5/4 and 3/2 respectively. Yet the techniques involved are all based on augmenting depth-first-search trees and no similar progress has been carried to the parallel context. This paper presents NC algorithms to achieve approximation factors of 3/2 + Ε and 7/4 + Ε respectively without computing depth-first-search trees.
This paper provides a new approach to labeling the connected components of an n × n image on a scan line array processor (comprised of n processing elements). Variations of this approach yield an algorithm guaran...
详细信息
ISBN:
(纸本)9780897917179
This paper provides a new approach to labeling the connected components of an n × n image on a scan line array processor (comprised of n processing elements). Variations of this approach yield an algorithm guaranteed to complete in o(n lg n) time as well as algorithms likely to approach O(n) time for all or most images. The best previous solutions require using a more complicated architecture or require Ω(n lg n) time. We also show that on a restricted version of the architecture, any algorithm requires Ω(n lg n) time in the worst case.
This paper addresses the question of the future applicability of bus-based multiprocessors, by studying the execution characteristics of a number of data parallel numerical and scientific applications, as the system p...
详细信息
ISBN:
(纸本)9780897917179
This paper addresses the question of the future applicability of bus-based multiprocessors, by studying the execution characteristics of a number of data parallel numerical and scientific applications, as the system parameters of processor speed, bus bandwidth, and cache size are scaled (holding the number of processors fixed). This study focuses on 'asymptotic' results. Specifically, it addresses whether the application eventually become communication bound as the machine and the application are scaled. Essentially, it addresses the question of whether the machine becomes unsuitable as a parallel computing platform for the application. A transient analysis on the rate which the asymptotic results take hold is also performed.
Techniques for quickly executing lengthy computations by the use of molecular parallelism are described. It is demonstrated that molecular computations can be done using short DNA strands by more or less conventional ...
详细信息
ISBN:
(纸本)9780897917179
Techniques for quickly executing lengthy computations by the use of molecular parallelism are described. It is demonstrated that molecular computations can be done using short DNA strands by more or less conventional biotechnology engineering techniques within a small number of laboratory steps. Two abstract models of molecular computation are proposed. The first, the parallel Associative Memory Model, is a very high level model which includes a parallel Associative Matching operation, that appears to improve the power of molecular parallelism beyond the operations previously considered by Lipton (1994). A Recombinant DNA Model is also proposed which is a low level model that allows operations that are abstractions of very well understood recombinant DNA operations and provides a representation, herein called complex, for the relevant structural properties of DNA.
Mixed task and data parallelism exists naturally in many applications, but utilizing it may require sophisticated scheduling algorithms and software support. Recently, significant research effort has been applied to e...
详细信息
ISBN:
(纸本)9780897917179
Mixed task and data parallelism exists naturally in many applications, but utilizing it may require sophisticated scheduling algorithms and software support. Recently, significant research effort has been applied to exploiting mixed parallelism in both theory and systems communities. In this paper, we ask how much mixed parallelism will improve performance in practice, and how architectural evolution impacts these estimates. First, we build and validate a performance model for a class of mixed task and data parallel problems based on machine and problem parameters. second, we use this model to estimate the gains from mixed parallelism for some scientific applications on current machines. This quantifies our intuition that mixed parallelism is best when either communication is slow or the number of processors is large. Third, we show that, for balanced divide and conquer trees, a simple one-time switch between data and task parallelism gets most of the benefit of general mixed parallelism. Fourth, we establish upper bounds to the benefits of mixed parallelism for irregular task graphs. Apart from these detailed analyses, we provide a framework in which other applications and machines can be evaluated.
The original LogP model for parallel computation use four parameters;the communication latency (L), overhead (n), bandwidth (g), and the number of processors (P). This basic model is extended with a linear model for l...
详细信息
The original LogP model for parallel computation use four parameters;the communication latency (L), overhead (n), bandwidth (g), and the number of processors (P). This basic model is extended with a linear model for long messages. The resulting combination which is called the LogGP model for parallel computation has one additional parameter, G, which captures the bandwidth obtained for long messages. In this paper, the algorithm design and analysis under the new model are discussed, and the all-to-all remap, FFT, and radix sort are examined. In addition, the single node scatter problem is, examined. Finally, solutions are derived for the problem, and the their optimality under the LogGP model is proven.
Most high-level parallel programming languages allow for fine-grained parallelism. Programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to ...
详细信息
ISBN:
(纸本)9780897917179
Most high-level parallel programming languages allow for fine-grained parallelism. Programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. When executing such programs, the major concern is to dynamically schedule tasks to processors in order to minimize execution time and the amount of memory needed. In this paper, a class of parallel schedules that are provably efficient in both time and space, even for programs whose task structure is revealed only during execution are identified. Following this, an efficient dynamic scheduling algorithm that generates schedules in this class, for languages with nested fine-grained parallelism is described.
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p...
详细信息
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p ≥ p2+Ε (for some arbitrarily small Ε > 0). For any given set of n points in 3-space, the algorithm computes the 3D convex hull, with high probability, in O(n log n÷p) local computation time and O(1) communication phases with at most O(n÷p) data sent/received by each processor. That is, with high probability, the algorithm computes the 3D convex hull of an arbitrary point set in time O(n log n÷p + Γn,p), where Γn,p denotes the time complexity of one communication phase. In the terminology of the BSP model, our algorithm requires, with high probability, O(1) supersteps and a synchronization period Θ(n log n÷p). In the LogP model, the execution time of our algorithm is asymptotically optimal for several architectures.
The circuit value update problem is the problem of updating values in a representation of a combinational circuit when some of the inputs are changed. We assume for simplicity that each combinatorial element has bound...
详细信息
ISBN:
(纸本)9780897917179
The circuit value update problem is the problem of updating values in a representation of a combinational circuit when some of the inputs are changed. We assume for simplicity that each combinatorial element has bounded fan-in and fan-out and can be evaluated in constant time. This problem is easily solved on an ordinary serial computer in O(W + D) time, where W is the number of elements in the altered subcircuit and D is the subcircuit's embedded depth (its depth measured in the original circuit). In this paper, we show how to solve the circuit value update problem efficiently on a P-processor parallel computer. We give a straightforward synchronous, parallel algorithm that runs in O(W/P + D lg P) expected time. Our main contribution, however, is an optimistic, asynchronous, parallel algorithm that runs in O(W/P + D + lg W + lg P) expected time, where W and D are the size and embedded depth, respectively, of the 'volatile' subcircuit, the subcircuit of elements that have inputs which either change or glitch as a result of the update. To our knowledge, our analysis provides the first analytical bounds on the running time of an optimistic algorithm.
In this paper, we address the question how efficiently a single constant-degree processor network can simulate the computation of any constant-degree processor network. We show the following lower bound trade-off: If ...
详细信息
In this paper, we address the question how efficiently a single constant-degree processor network can simulate the computation of any constant-degree processor network. We show the following lower bound trade-off: If M is an arbitrary constant-degree processor network of size m that can simulate all constant-degree processor networks of size n with slowdown s, then m·s = Ω(n log m). Our trade-off holds for a very general model of simulations. It covers all previously considered models and all known techniques for simulations among networks. For m ≥ n, this improves a previous lower bound by a factor of log log n, proved for a weaker simulation model. For m < n, this is the first non-trivial lower bound for this problem. In this case, this lower bound is asymptotically tight.
暂无评论