We present a (1 + epsilon) -approximate parallel algorithm for computing shortest paths in undirected graphs, achieving poly(log n) depth and mpoly(log n) work for n-nodes m-edges graphs. Although sequential algorithm...
详细信息
ISBN:
(纸本)9781450369794
We present a (1 + epsilon) -approximate parallel algorithm for computing shortest paths in undirected graphs, achieving poly(log n) depth and mpoly(log n) work for n-nodes m-edges graphs. Although sequential algorithms with (nearly) optimal running time have been known for several decades, near-optimal parallel algorithms have turned out to be a much tougher challenge. For (1 + epsilon) -approximation, all prior algorithms with poly(log n) depth perform at least Omega(mn(c)) work for some constant c > 0. Improving this long-standing upper bound obtained by Cohen (STOC'94) has been open for 25 years. We develop several new tools of independent interest. One of them is a new notion beyond hopsets - low hop emulator - a poly(log n)-approximate emulator graph in which every shortest path has at most O(log log n) hops (edges). Direct applications of the low hop emulators are parallel algorithms for poly(log n)-approximate single source shortest path (SSSP), Bourgain's embedding, metric tree embedding, and low diameter decomposition, all with poly(log n) depth and mpoly(log n) work. To boost the approximation ratio to (1 + epsilon), we introduce compressible preconditioners and apply it inside Sherman's framework (SODA'17) to solve the more general problem of uncapacitated minimum cost flow (a.k.a., transshipment problem). Our algorithm computes a (1 + epsilon)-approximate uncapacitated minimum cost flow in poly(log n) depth using mpoly(log n) work. As a consequence, it also improves the state-of-the-art sequential running time from m . 2(O(root log n)) to mpoly(log n).
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are removed until there are no vertices of degree less than k left. Th...
详细信息
ISBN:
(纸本)9781450328210
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are removed until there are no vertices of degree less than k left. The remaining hypergraph is known as the k-core. In this paper, we analyze parallel peeling processes, where in each round, all vertices of degree less than k are removed. It is known that, below a specific edge density threshold, the k-core is empty with high probability. We show that, with high probability, below this threshold, only 1/log ((k-1)(r-1)) log logn + O(1) rounds of peeling are needed to obtain the empty k-core for r-uniform hypergraphs. Interestingly, we show that above this threshold, Omega(logn) rounds of peeling are required to find the non-empty k-core. Since most algorithms and data structures aim to peel to an empty kcore, this asymmetry appears fortunate. We verify the theoretical results both with simulation and with a parallel implementation using graphics processing units (GPUs). Our implementation provides insights into how to structure parallel peeling algorithms for efficiency in practice.
A novel parallel solver based on the adaptive integral method (AIM) is proposed for the electromagnetic analysis of electrical interconnects in layered media. We show that graph partitioning techniques can be used to ...
详细信息
ISBN:
(纸本)9781728161617
A novel parallel solver based on the adaptive integral method (AIM) is proposed for the electromagnetic analysis of electrical interconnects in layered media. We show that graph partitioning techniques can be used to optimally distribute, across thousands of processes, the computations related to both matrix filling and system solution. The proposed workload distribution strategy is compared to existing techniques through a scalability study on a large realistic interposer model in layered media.
We present the first parallel fixed-parameter algorithm for subgraph isomorphism in planar graphs, bounded-genus graphs, and, more generally, all minor-closed graphs of locally bounded treewidth. Our randomized low de...
详细信息
ISBN:
(纸本)9781450369350
We present the first parallel fixed-parameter algorithm for subgraph isomorphism in planar graphs, bounded-genus graphs, and, more generally, all minor-closed graphs of locally bounded treewidth. Our randomized low depth algorithm has a near-linear work dependency on the size of the target graph. Existing low depth algorithms do not guarantee that the work remains asymptotically the same for any constant-sized pattern. By using a connection to certain separating cycles, our subgraph isomorphism algorithm can decide the vertex connectivity of a planar graph (with high probability) in asymptotically near-linear work and poly-logarithmic depth. Previously, no sub-quadratic work and poly-logarithmic depth bound was known in planar graphs (in particular for distinguishing between four-connected and five-connected planar graphs).
We present a cache-efficient parallel algorithm for the sequence alignment with gap penalty problem for shared-memory machines using multiway divide-and-conquer and not-in-place matrix transposition. Our r-way divide-...
详细信息
We present a cache-efficient parallel algorithm for the sequence alignment with gap penalty problem for shared-memory machines using multiway divide-and-conquer and not-in-place matrix transposition. Our r-way divide-and-conquer algorithm, for a fixed natural number r >= 2, performs Theta (n(3)) work, achieves Theta (n(logr(2r-1))) span, and incurs O(n(3)/(BM) + (n(2)/B)log root M) serial cache misses for n > gamma M, and incurs O ((n(2)/B)log(n/root M)) serial cache misses for alpha root M < n <= gamma M, where, M is the cache size, B is the cache line size, and alpha and gamma are constants. Published by Elsevier B.V.
In this article we consider parallel numerical algorithms to solve the 3D mathematical model, that describes a wave propagation in rectangular waveguide. The main goal is to formulate and analyze a minimal algorithmic...
详细信息
ISBN:
(数字)9783642551956
ISBN:
(纸本)9783642551956
In this article we consider parallel numerical algorithms to solve the 3D mathematical model, that describes a wave propagation in rectangular waveguide. The main goal is to formulate and analyze a minimal algorithmic template to solve this problem by using the CUDA platform. This template is based on explicit finite difference schemes obtained after approximation of systems of differential equations on the staggered grid. The parallelization of the discrete algorithm is based on the domain decomposition method. The theoretical complexity model is derived and the scalability of the parallel algorithm is investigated. Results of numerical simulations are presented.
Aiming at the problem of poor control stability of traditional aircraft control systems, the longitudinal anti-disturbance control system based on the distributed parallel algorithm was designed. Based on the hardware...
详细信息
ISBN:
(数字)9781728164793
ISBN:
(纸本)9781728164793
Aiming at the problem of poor control stability of traditional aircraft control systems, the longitudinal anti-disturbance control system based on the distributed parallel algorithm was designed. Based on the hardware of the original control system, the anti-disturbance control system was designed. And the software part of the aircraft longitudinal anti-disturbance control system was designed. The longitudinal model of aircraft was established, and the active disturbance rejection controller (ADRC) was also designed according to the model. Through the use of distributed parallel algorithms to set the parameters of ADRC, thus completing the design of the vertical ADRC system The comparison experiment with the traditional PD-based aircraft control system shows that the design of the control system based on distributed parallel algorithm has the characteristics of less overshoot, good stability and broad application prospects.
This paper describes a new technique for parallelizing protein clustering, an important bioinformatics computation for the analysis of protein sequences. Protein clustering identifies groups of proteins that are simil...
详细信息
ISBN:
(纸本)9781450380751
This paper describes a new technique for parallelizing protein clustering, an important bioinformatics computation for the analysis of protein sequences. Protein clustering identifies groups of proteins that are similar because they share long sequences of similar amino acids. Given a collection of protein sequences, clustering can significantly reduce the computational effort required to identify all similar sequences by avoiding many negative comparisons. The challenge, however, is to build a clustering that misses as few similar sequences (or elements, more generally) as possible. In this paper, we introduce precise clustering, a property that requires each pair of similar elements to appear together in at least one cluster. We show that transitivity in the data can be leveraged to merge clusters while maintaining a precise clustering, providing a basis for independently forming clusters. This allows us reformulate clustering as a bottom-up merge of independent clusters in a new algorithm called ClusterMerge. ClusterMerge exposes parallelism, enabling fast and scalable implementations. We apply ClusterMerge to find similar amino acid sequences in a collection of proteins. ClusterMerge identifies 99.8% of similar pairs found by a full O(n(2)) comparison, with only half as many comparisons. More importantly, ClusterMerge is highly amenable to parallel and distributed computation. Our implementation achieves a speedup of 604 times on 768 cores (1400 times faster than a comparable single-threaded clustering implementation), a strong scaling efficiency of 90%, and a weak scaling efficiency of nearly 100%.
We present the firstm polylog(n) work, polylog(n) time algorithm in the PRAM model that computes (1 + epsilon)-approximate single-source shortest paths on weighted, undirected graphs. This improves upon the breakthrou...
详细信息
ISBN:
(纸本)9781450369794
We present the firstm polylog(n) work, polylog(n) time algorithm in the PRAM model that computes (1 + epsilon)-approximate single-source shortest paths on weighted, undirected graphs. This improves upon the breakthrough result of Cohen [JACM'00] that achieves O(m(1+epsilon 0)) work and polylog(n) time. While most previous approaches, including Cohen's, leveraged the power of hopsets, our algorithm builds upon the recent developments in continuous optimization, studying the shortest path problem from the lens of the closely-related minimum transshipment problem. To obtain our algorithm, we demonstrate a series of near-linearwork, polylogarithmic-time reductions between the problems of approximate shortest path, approximate transshipment, and l(1)-embeddings, and establish a recursive algorithm that cycles through the three problems and reduces the graph size on each cycle. As a consequence, we also obtain faster parallel algorithms for approximate transshipment and l(1)-embeddings with polylogarithmic distortion. The minimum transshipment algorithm in particular improves upon the previous best m(1+o(1)) work sequential algorithm of Sherman [SODA'17]. To improve readability, the paper is almost entirely self-contained, save for several staple theorems in algorithms and combinatorics.
The Morse-Smale complex is a well studied topological structure that represents the gradient flow behavior of a scalar function. It supports multi-scale topological analysis and visualization of large scientific data....
详细信息
ISBN:
(纸本)9781728180144
The Morse-Smale complex is a well studied topological structure that represents the gradient flow behavior of a scalar function. It supports multi-scale topological analysis and visualization of large scientific data. Its computation poses significant algorithmic challenges when considering large scale data and increased feature complexity. Several parallel algorithms have been proposed towards the fast computation of the 3D Morse-Smale complex. The non-trivial structure of the saddle-saddle connections are not amenable to parallel computation. This paper describes a fine grained parallel method for computing the Morse-Smale complex that is implemented on a GPU. The saddle-saddle reachability is first determined via a transformation into a sequence of vector operations followed by the path traversal, which is achieved via a sequence of matrix operations. Computational experiments show that the method achieves up to 7 x speedup over current shared memory implementations.
暂无评论