This paper describes the process used to extend the Boost graph Library (BGL) for parallel operation with distributed memory. The BGL consists of a rich set of generic graphalgorithms and supporting data structures, ...
详细信息
This paper describes the process used to extend the Boost graph Library (BGL) for parallel operation with distributed memory. The BGL consists of a rich set of generic graphalgorithms and supporting data structures, but it was not originally designed with parallelism in mind. In this paper, we revisit the abstractions comprising the BGL in the context of distributed-memory parallelism, lifting away the implicit requirements of sequential execution and a single shared address space. We illustrate our approach by describing the process as applied to one of the core algorithms in the BGL, breadth-first search. The result is a generic algorithm that is unchanged from the sequential algorithm, requiring only the introduction of external (distributed) data structures for parallel execution. More importantly, the generic implementation retains its interface and semantics, such that other distributed algorithms can be built upon it, just as algorithms are layered in the sequential case. By characterizing these extensions as well as the extension process, we develop general principles and patterns for using (and reusing) generic, object-oriented parallel software libraries. We demonstrate that the resulting algorithm implementations are both efficient and scalable with performance results for several algorithms.
This paper describes the process used to extend the Boost graph Library (BGL) for parallel operation with distributed memory. The BGL consists of a rich set of generic graphalgorithms and supporting data structures, ...
详细信息
ISBN:
(纸本)9781595930316
This paper describes the process used to extend the Boost graph Library (BGL) for parallel operation with distributed memory. The BGL consists of a rich set of generic graphalgorithms and supporting data structures, but it was not originally designed with parallelism in mind. In this paper, we revisit the abstractions comprising the BGL in the context of distributed-memory parallelism, lifting away the implicit requirements of sequential execution and a single shared address space. We illustrate our approach by describing the process as applied to one of the core algorithms in the BGL, breadth-first search. The result is a generic algorithm that is unchanged from the sequential algorithm, requiring only the introduction of external (distributed) data structures for parallel execution. More importantly, the generic implementation retains its interface and semantics, such that other distributed algorithms can be built upon it, just as algorithms are layered in the sequential case. By characterizing these extensions as well as the extension process, we develop general principles and patterns for using (and reusing) generic, object-oriented parallel software libraries. We demonstrate that the resulting algorithm implementations are both efficient and scalable with performance results for several algorithms.
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs wit...
详细信息
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. Many PRAM algorithms can be adapted to SMPs with few modifications. Yet there are few studies that deal with the implementation and performance issues of running PRAM-style algorithms on SMPs. Our study in this paper focuses on implementing parallel spanning tree algorithms on SMPs. Spanning tree is an important problem in the sense that it is the building block for many other parallel graph algorithms and also because it is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but these irregular problems often have no known efficient parallel implementations. Experimental studies have been conducted on related problems (minimum spanning tree and connected components) using parallel computers, but only achieved reasonable speedup on regular graph topologies that can be implicitly partitioned with good locality features or on very dense graphs with limited numbers of vertices. In this paper we present a new randomized algorithm and implementation with superior performance that for the first time achieves parallel speedup on arbitrary graphs (both regular and irregular topologies) when compared with the best sequential implementation for finding a spanning tree. This new algorithm uses several techniques to give an expected running time that scales linearly with the number p of processors for suitably large inputs (n > p(2)). As the spanning tree problem is notoriously hard for any parallel implementation to achieve reasonable speedup, our study may shed new light on implementing PRAM algorithms for shared-memory parallel computers. The main results of this paper are 1. A new and practical spanning tree algorithm for symmetric multiprocessors that exhibits parallel speedups on graphs with regular and irr
Hierarchical clustering methods are important in many data mining and pattern recognition tasks. In this paper we present an efficient coarse grained parallel algorithm for Single Link Clustering;a standard inter-clus...
详细信息
ISBN:
(纸本)3540290311
Hierarchical clustering methods are important in many data mining and pattern recognition tasks. In this paper we present an efficient coarse grained parallel algorithm for Single Link Clustering;a standard inter-cluster linkage metric. Our approach is to first describe algorithms for the Prefix Larger Integer Set and the Closest Larger Ancestor problems and then to show how these can be applied to solve the Single Link Clustering problem. In an extensive performance analysis an implementation of these algorithms on a Linux-based cluster has shown to scale well, exhibiting near linear relative speedup.
This paper analyzes what structural features of graph problems allow efficient parallelalgorithms. We survey some parallelalgorithms for typical problems on three kinds of graphs, outerplanar graphs, trapezoid graph...
详细信息
This paper analyzes what structural features of graph problems allow efficient parallelalgorithms. We survey some parallelalgorithms for typical problems on three kinds of graphs, outerplanar graphs, trapezoid graphs and in-tournament graphs. Our results on the shortest path problem, the longest path problem and the maximum flow problem on outerplanar graphs, the minimum-weight connected dominating set problem and the coloring problem on trapezoid graphs and Hamiltonian path and Hamiltonian cycle problem on in-tournament graphs are adopted as working examples.
A concurrent-read exclusive-write PRAM algorithm is developed to find the k shortest paths between pairs of vertices in an edge-weighted directed graph. Repetitions of vertices along the paths are allowed. The algorit...
详细信息
A concurrent-read exclusive-write PRAM algorithm is developed to find the k shortest paths between pairs of vertices in an edge-weighted directed graph. Repetitions of vertices along the paths are allowed. The algorithm computes an implicit representation of the k shortest paths to a given destination vertex from every vertex of a graph with n vertices and in edges, using O (m + nk log(2) k) work and O (log(3) k log* k + log n(log log k + log* n)) time, assuming that a shortest path tree rooted at the destination is pre-computed. The paths themselves can be extracted from the implicit representation in O(log k + log n) time, and O(n log n + L) work, where L is the total length of the output.
Computation of maximal matching of a graph based on Depth First Search tree computation was introduced by Datta and Sen (parallelalgorithms and Applications,5, 1995, 161-164). They showed that the approach gives effi...
详细信息
Computation of maximal matching of a graph based on Depth First Search tree computation was introduced by Datta and Sen (parallelalgorithms and Applications,5, 1995, 161-164). They showed that the approach gives efficient parallelalgorithms for maximal matching for graphs for which DFS tree can be efficiently computed. They also presented a parallel scheme to compute a maximal matching in O(T(n) log n) time using O(P(n)) number of processors, where T(n) and P(n) are the time and the number of processors required to compute DFS tree of a graph in parallel. We present here, an improved technique to compute maximal matching in parallel based on DFS tree computation. The algorithm takes O(T(n)) time and O(P(n)) processors.
The Bulk-Synchronous parallel (BSP) model was proposed by Valiant as a standard interface between parallel software and hardware. In theory. the BSP model has been shown to allow the asymptotically optimal execution o...
详细信息
The Bulk-Synchronous parallel (BSP) model was proposed by Valiant as a standard interface between parallel software and hardware. In theory. the BSP model has been shown to allow the asymptotically optimal execution of architecture-independent software on a variety of architectures. Our goal in this work is to experimentally examine the practical use of the BSP model on current parallel architectures. We describe the design and implementation of the Green BSP Library, a small library of functions that implement the BSP model, and of several applications that were written for this library. We then discuss the performance of the library and application programs on several parallel architectures. Our results are positive in that we demonstrate efficiency and portability over a range of parallel architectures and show that the BSP cost model is useful for predicting performance trends and estimating execution times.
We present a new parallel algorithm for finding a maximal matching of a graph. The time required by our algorithm is O(TD(n)log n) and the number of processors used is PD(n), where TD(n) and PD(n) are the time and num...
详细信息
In this paper,a sequential algorithm computing the all vertex pair distance matrix D and the path matrix Pis *** a PRAM EREW model with p,1≤p≤n^2,processors,a parallel version of the sequential algorithm is *** meth...
详细信息
In this paper,a sequential algorithm computing the all vertex pair distance matrix D and the path matrix Pis *** a PRAM EREW model with p,1≤p≤n^2,processors,a parallel version of the sequential algorithm is *** method can also be used to get a parallel algorithm to compute transitive closure arrayof an undirected *** time complexify of the parallel algorithm is O(n^3/p).If D,P andare known,it is shown that the problems to find all connected components, to compute the diameter of an undirected graph,to determine the center of a directed graph and to search for a directed cycle with the minimum(maximum)length in a directed graph can all be solved in O(n^2/p^+ logp)time.
暂无评论