We present the firstm polylog(n) work, polylog(n) time algorithm in the PRAM model that computes (1 + epsilon)-approximate single-source shortest paths on weighted, undirected graphs. This improves upon the breakthrou...
详细信息
ISBN:
(纸本)9781450369794
We present the firstm polylog(n) work, polylog(n) time algorithm in the PRAM model that computes (1 + epsilon)-approximate single-source shortest paths on weighted, undirected graphs. This improves upon the breakthrough result of Cohen [JACM'00] that achieves O(m(1+epsilon 0)) work and polylog(n) time. While most previous approaches, including Cohen's, leveraged the power of hopsets, our algorithm builds upon the recent developments in continuous optimization, studying the shortest path problem from the lens of the closely-related minimum transshipment problem. To obtain our algorithm, we demonstrate a series of near-linearwork, polylogarithmic-time reductions between the problems of approximate shortest path, approximate transshipment, and l(1)-embeddings, and establish a recursive algorithm that cycles through the three problems and reduces the graph size on each cycle. As a consequence, we also obtain faster parallel algorithms for approximate transshipment and l(1)-embeddings with polylogarithmic distortion. The minimum transshipment algorithm in particular improves upon the previous best m(1+o(1)) work sequential algorithm of Sherman [SODA'17]. To improve readability, the paper is almost entirely self-contained, save for several staple theorems in algorithms and combinatorics.
This paper presents multilevel iterative schemes for solving the multigroup Boltzmann transport equations (BTEs) with parallel calculation of group equations. They are formulated with multigroup and grey low-order equ...
详细信息
The Morse-Smale complex is a well studied topological structure that represents the gradient flow behavior of a scalar function. It supports multi-scale topological analysis and visualization of large scientific data....
详细信息
ISBN:
(纸本)9781728180144
The Morse-Smale complex is a well studied topological structure that represents the gradient flow behavior of a scalar function. It supports multi-scale topological analysis and visualization of large scientific data. Its computation poses significant algorithmic challenges when considering large scale data and increased feature complexity. Several parallel algorithms have been proposed towards the fast computation of the 3D Morse-Smale complex. The non-trivial structure of the saddle-saddle connections are not amenable to parallel computation. This paper describes a fine grained parallel method for computing the Morse-Smale complex that is implemented on a GPU. The saddle-saddle reachability is first determined via a transformation into a sequence of vector operations followed by the path traversal, which is achieved via a sequence of matrix operations. Computational experiments show that the method achieves up to 7 x speedup over current shared memory implementations.
In this work, we give a new parallel algorithm for the problem of maximizing a non-monotone diminishing returns submodular function subject to a cardinality constraint. For any desired accuracy epsilon, our algorithm ...
详细信息
ISBN:
(纸本)9781713821120
In this work, we give a new parallel algorithm for the problem of maximizing a non-monotone diminishing returns submodular function subject to a cardinality constraint. For any desired accuracy epsilon, our algorithm achieves a 1/epsilon - epsilon approximation using O(log n log(1/epsilon)epsilon(3)) parallel rounds of function evaluations. The approximation guarantee nearly matches the best approximation guarantee known for the problem in the sequential setting and the number of parallel rounds is nearly-optimal for any constant epsilon. Previous algorithms achieve worse approximation guarantees using Omega(log(2) n) parallel rounds. Our experimental evaluation suggests that our algorithm obtains solutions whose objective value nearly matches the value obtained by the state of the art sequential algorithms, and it outperforms previous parallel algorithms in number of parallel rounds, iterations, and solution quality.
When working on graphs, reachability is among the most common problems to address, since it is the base for many other algorithms. As with the advent of distributed systems, which process large amounts of data, many a...
详细信息
ISBN:
(纸本)9789897584435
When working on graphs, reachability is among the most common problems to address, since it is the base for many other algorithms. As with the advent of distributed systems, which process large amounts of data, many applications must quickly explore graphs with millions of vertices, scalable solutions have become of paramount importance. Modern GPUs provide highly parallel systems based on many-core architectures and have gained popularity in parallelizing algorithms that run on large data sets. In this paper, we extend a very efficient state-of-the-art graph-labeling method, namely the GRAIL algorithm, to architectures which exhibit a great amount of data parallelism, i.e., many-core CUDA-based GPUs. GRAIL creates a scalable index for answering reachability queries, and it heavily relies on depth-first searches. As depth-first visits are intrinsically recursive and they cannot be efficiently implemented on parallel systems, we devise an alternative approach based on a sequence of breadth-first visits. The paper explores our efforts in this direction, and it analyzes the difficulties encountered and the solutions chosen to overcome them. It also presents a comparison (in terms of times to create the index and to use it for reachability queries) between the CPU and the GPU-based versions.
Data partitioning algorithms aiming to minimize the execution time and the energy of computations in self-adaptable data-parallel applications on modern extreme-scale multicore platforms must address two critical chal...
详细信息
Data partitioning algorithms aiming to minimize the execution time and the energy of computations in self-adaptable data-parallel applications on modern extreme-scale multicore platforms must address two critical challenges. First, they must take into account the new complexities inherent in these platforms such as severe resource contention and non-uniform memory access. Second, they must have low practical runtime and memory costs. The sequential data partitioning algorithms addressing the first challenge have a theoretical time complexity of O(m*m*p*p) where m is the number of points in the discrete speed/energy function and p is the number of available processors. They, however, exhibit high practical runtime cost and excessive memory footprint, therefore, rendering them impracticable for employment in self-adaptable applications executing on extreme-scale multicore platforms. We present, in this paper, the parallel data partitioning algorithms that address both the challenges. They take as input the functional models of performance and energy consumption against problem size and output workload distributions, which are globally optimal solutions. They have a low time complexity of O(m*m*p) thereby providing a linear speedup of O(p) and low memory complexity of O(n) where n is the workload size expressed as a multiple of granularity. They employ dynamic programming approach, which also facilitates the easier integration of performance and energy models of communications. We experimentally study the practical cost of application of our algorithms in two data-parallel applications, matrix multiplication and fast Fourier transform, on a cluster in Grid'5000 platform. We demonstrate that their practical runtime and memory costs are low making them ideal for employment in self-adaptable applications. We also show that the parallel algorithms exhibit tremendous speedups over the sequential algorithms. Finally, using theoretical analysis for a forecast exascale platfor
With the arrival of the intermodality era, to design fast and efficient K shortest paths(KSP) algorithms becomes gradually one of the core technologies in traveler information systems. Yen is a classical algorithm for...
详细信息
ISBN:
(纸本)9783038350194
With the arrival of the intermodality era, to design fast and efficient K shortest paths(KSP) algorithms becomes gradually one of the core technologies in traveler information systems. Yen is a classical algorithm for KSP. However, Yen is time-consuming. In view of powerful general-purpose computing capabilities, GPU(Graphics Processing Units) has received increasing attention in various fields. Based on CUDA software development environment, combined with the structure of the Yen algorithm itself, this paper proposed two parallel algorithms for Yen. The first parallel algorithm computes candidate shortest paths for very possible deviation nodes in parallel. The second one computes candidate shortest paths in serial, but computes very candidate path in parallel. Finally, the efficiency of the two parallel algorithms was tested through experiments.
We present an algorithm for constructing a depth-first search tree in planar digraphs;the algorithm can be implemented in the complexity class AC1(UL ∩ co-UL), which is contained in AC2. Prior to this (for more than ...
详细信息
We consider networks of small, autonomous devices that communicate with each other wirelessly. Minimizing energy usage is an important consideration in designing algorithms for such networks, as battery life is a cruc...
详细信息
In this work we present parallel algorithms based on the use of two-stage methods for solving the PageRank problem as a linear system. Different parallel versions of these methods are explored and their convergence pr...
详细信息
In this work we present parallel algorithms based on the use of two-stage methods for solving the PageRank problem as a linear system. Different parallel versions of these methods are explored and their convergence properties are analyzed. The parallel implementation has been developed using a mixed MPI/OpenMP model to exploit parallelism beyond a single level. In order to investigate and analyze the proposed parallel algorithms, we have used several realistic large datasets. The numerical results show that the proposed algorithms can speed up the time to converge with respect to the parallel Power algorithm and behave better than other well-known techniques.
暂无评论