Diffracting trees are an effective and highly scalable distributed-parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting...
详细信息
Diffracting trees are an effective and highly scalable distributed-parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting trees, and uses it to answer several critical algorithmic design questions. Our model is simple and sufficiently high level to overcome many implementation specific details, and yet as we will show it is rich enough to accurately predict empirically observed behaviors. As a result of our analysis we were able to identify starvation problems in the original diffracting tree algorithm and modify it to a create a more stable version. We are also able to identify the range in which the diffracting tree performs most efficiently, and the ranges in which its performance degrades. We believe our model and modeling approach open the way to steady-state analysis of other distributed-parallel structures such am counting networks and elimination trees.
We present two asymptotically optimal Θ(n) time algorithms for labeling the connected components of a gray-scale image on a mesh-connected computer. We assume that the input is an n × n gray-scale image mapped o...
详细信息
ISBN:
(纸本)089791483X
We present two asymptotically optimal Θ(n) time algorithms for labeling the connected components of a gray-scale image on a mesh-connected computer. We assume that the input is an n × n gray-scale image mapped one pixel per processor onto an n × n mesh-connected computer. Our algorithms label the components so that every component is connected, the maximum difference in the gray-scale values of the pixels within any component does not exceed a given value, and no component can be merged with a neighboring component. The first algorithm is based on a divide-and-conquer approach. Although it is simple, this algorithm has the potential drawback of possibly assigning two adjacent pixels with the same gray-scale value to different components. The second algorithm avoids this potential drawback, and exploits the ability of a mesh-connected computer to efficiently determine a maximal independent set of a planar graph.
We give two optimal parallelalgorithms for constructing the arrangement of n lines in the plane. The first method is quite simple and runs in O(log2n) time using O(n2) work, and the second method, which is more sophi...
详细信息
We give a randomized (Las-Vegas) parallel algorithm for computing strongly connected components of a graph with n vertices and m edges. The runtime is dominated by O(log(2) n) multi-source parallel reachability querie...
详细信息
ISBN:
(纸本)9781595939739
We give a randomized (Las-Vegas) parallel algorithm for computing strongly connected components of a graph with n vertices and m edges. The runtime is dominated by O(log(2) n) multi-source parallel reachability queries;i.e. O(log(2) n) calls to a subroutine that computes the union of the descendants of a given set of vertices in a given digraph. Our algorithm also topologically sorts the strongly connected components. Using Ullman and Yannakakis's [22] techniques for the reachability subroutine gives our algorithm runtime (O) over tilde (t) using mn/t(2) processors for any (n(2)/m)(1/3) <= t <= n. On sparse graphs, this improves the number of processors needed to compute strongly connected components and topological sort within time n(1/3) <= t <= n from the previously best known (n/t)(3) [20] to (n/t)(2).
Motivated from parallel network mapping, we provide efficient query complexity and round complexity bounds for graph reconstruction using distance queries, including a bound that improves a previous sequential complex...
详细信息
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style ...
详细信息
ISBN:
(纸本)9781581134094
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style algorithms;(2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunity to evaluate the overall effectiveness of the interaction between the programming model and the hardware, and enhance its performance where needed, incorporating new optimizations into the XMT compiler. We present a wide range of applications, which written in XMT obtain significant speedups relative to the best serial programs. We show that XMT is especially useful for more advanced applications with dynamic, irregular access patterns, where for regular computations we demonstrate performance gains that scale up to much higher levels than have been demonstrated before for on-chip systems.
In this paper we present a coarse-grained parallel algorithm for solving the string edit distance problem for a string A and all substrings of a string C. Our method is based on a novel CGM/BSP parallel dynamic progra...
详细信息
In this paper we present a coarse-grained parallel algorithm for solving the string edit distance problem for a string A and all substrings of a string C. Our method is based on a novel CGM/BSP parallel dynamic programming technique for computing all highest scoring paths in a weighted grid graph. The algorithm requires log p rounds/supersteps and O(p/n2 log m) local computation, where p is the number of processors, p2 ≤ m ≤ n. To our knowledge, this is the first efficient CGM/BSP algorithm for the alignment of all substrings of C with A. Furthermore, the CGM/BSP parallel dynamic programming technique presented is of interest in its own right and we expect it to lead to other parallel dynamic programming methods for the CGM/BSP.
When a leveled hypercube algorithm (one dimension used at a time) is mapped in the straightforward way into a hypercube in which all edges are useable at once, most of the host machine's bandwidth is unused. This ...
详细信息
ISBN:
(纸本)089791483X
When a leveled hypercube algorithm (one dimension used at a time) is mapped in the straightforward way into a hypercube in which all edges are useable at once, most of the host machine's bandwidth is unused. This paper shows how to construct embeddings which better utilize the host's edges. In particular, I show how to map an n-dimensional leveled hypercube algorithm into an n-dimensional host hypercube so that the communication throughput of every guest edge is Θ ((n/log n)log(6)2) = ω (n0.386) times the communication throughput of a host edge. Furthermore, the routing can be done on edge-disjoint paths of length at most n. This result can be applied to other algorithms that are run on hypercubes. For example, if an algorithm runs on a mesh with a axes each of length 2l, but uses only one axis at a time, then it can be embedded in an la-dimensional hypercube so that each mesh edge has throughput Θ (l(a/log a)log(6)2).
The proceedings contain 68 papers. The topics discussed include: faster deterministic all pairs shortest paths in congest model;contention resolution with message deadlines;memory tagging: minimalist synchronization f...
ISBN:
(纸本)9781450369350
The proceedings contain 68 papers. The topics discussed include: faster deterministic all pairs shortest paths in congest model;contention resolution with message deadlines;memory tagging: minimalist synchronization for scalable concurrent data structures;work-efficient batch-incremental minimum spanning trees with applications to the sliding-window model;closing the gap between cache-oblivious and cache-adaptive analysis;optimal resource allocation for elastic and inelastic jobs;optimal parallelalgorithms in the binary-forking model;randomized incremental convex hull is highly parallel;almost universal anonymous rendezvous in the plane;the online multi-commodity facility location problem;unconditional lower bounds for adaptive massively parallel computation;on the hardness of massively parallel computation;and graph sparsification for derandomizing massively parallel computation with low space.
Let iT be a bipartite graph with bipartition (A, B) where \A\-n and every subset X of A with at most a n elements has at least b\X\ neighbors (o 1). We consider the problem of computing a matching from a given subset...
详细信息
暂无评论