A parallel algorithm for solving the 2D shallow water equations coupled with the convection-diffusion equation has been developed, in order to demonstrate the capability and performance of our parallel approach while ...
In practice symmetries of combinatorial structures are computed by transforming the structure into an annotated graph whose automorphisms correspond exactly to the desired symmetries. An automorphism solver is then em...
Hash tables are a fundamental data structure for effectively storing and accessing sparse data, with widespread usage in domains ranging from computer graphics to machine learning. This study surveys the state-of-the-art research on data-parallel hashing techniques for emerging massively-parallel, many-core GPU architectures. This survey identifies key factors affecting the performance of different techniques and suggests directions for further research.
The stability of a social network has been widely studied as an important indicator for both the network holders and the participants. Existing works on reinforcing networks focus on a local view, e.g., the anchored k...
One of the simplest problems on directed graphs is that of identifying the set of vertices reachable from a designated source vertex. This problem can be solved easily sequentially by performing a graph search, but efficient parallel algorithms have eluded researchers for decades. For sparse high-diameter graphs in particular, there is no known work-efficient parallel algorithm with nontrivial parallelism. This amounts to one of the most fundamental open questions in parallel graph algorithms: Is there a parallel algorithm for digraph reachability with nearly linear work? This article shows that the answer is yes, presenting a randomized parallel algorithm for digraph reachability and related problems with expected work $\tilde{O}(m)$ and span $\tilde{O}(n^{2/3})$, and hence parallelism $\tilde{O}(m/n^{2/3}) = \tilde{\Omega}(n^{1/3})$, on any graph with $n$ vertices and $m$ arcs. This is the first parallel algorithm having both nearly linear work and strongly sublinear span, i.e., span $\tilde{O}(n^{1-\epsilon})$ for any constant $\epsilon > 0$. The algorithm can be extended to produce a directed spanning tree, determine whether the graph is acyclic, topologically sort the strongly connected components of the graph, or produce a directed ear decomposition, all with work $\tilde{O}(m)$ and span $\tilde{O}(n^{2/3})$. The main technical contribution is an efficient Monte Carlo algorithm that, through the addition of $\tilde{O}(n)$ shortcuts, reduces the diameter of the graph to $\tilde{O}(n^{2/3})$ with high probability. While both sequential and parallel algorithms are known with those combinatorial properties, even the sequential algorithms are not efficient, having sequential runtime $\Omega(mn^{\Omega(1)})$. This article presents a surprisingly simple sequential algorithm that achieves the stated diameter reduction and runs in $\tilde{O}(m)$ time. Parallelizing that algorithm yields the main result, but doing so involves overcoming several other challen
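The link between diameter and span can be illustrated with a small sketch. The code below is not the article's algorithm: it is a plain level-synchronous breadth-first search whose round count stands in for span, and the `shortcuts` list is a hand-picked, hypothetical set of arcs between vertices that were already connected by a path. All function and variable names are assumptions made for illustration.

```python
def bfs_rounds(n, arcs, source):
    """Level-synchronous BFS on a digraph with vertices 0..n-1.

    Returns the set of vertices reachable from `source` and the number of
    rounds in which at least one new vertex was discovered.
    """
    adj = [[] for _ in range(n)]
    for u, v in arcs:
        adj[u].append(v)

    visited = {source}
    frontier = [source]
    rounds = 0
    while frontier:
        next_frontier = []
        # In a parallel setting, every vertex of the frontier could be
        # expanded concurrently; here the loop is sequential.
        for u in frontier:
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    next_frontier.append(v)
        if next_frontier:
            rounds += 1
        frontier = next_frontier
    return visited, rounds


n = 16
path = [(i, i + 1) for i in range(n - 1)]  # a high-diameter digraph
# Hypothetical shortcuts: each arc (u, v) joins vertices where v was
# already reachable from u, so reachability itself is unchanged.
shortcuts = [(i, min(i + 4, n - 1)) for i in range(0, n - 1, 4)]

reach_plain, rounds_plain = bfs_rounds(n, path, 0)
reach_short, rounds_short = bfs_rounds(n, path + shortcuts, 0)
```

Because shortcuts only connect vertices already joined by a path, the reachable set is the same in both runs, while the search with shortcuts needs fewer rounds.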
In this paper, we present an algorithm using the GPGPU machine to compute the interval solutions of isolated real zeros of multivariate polynomial functions in given ranges. To overcome the state space explosion in th...
The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification, a standard framework for interoperability among simulators, includes a Data Distribution Management (DDM) service whose responsibility is to report all intersections between a set of subscription and update regions. The algorithms at the core of the DDM service are CPU-intensive, and could greatly benefit from the large computing power of modern multi-core processors. In this article, we propose two parallel solutions to the DDM problem that can operate effectively on shared-memory multiprocessors. The first solution is based on a data structure (the interval tree) that allows concurrent computation of intersections between subscription and update regions. The second solution is based on a novel parallel extension of the Sort Based Matching algorithm, whose sequential version is considered among the most efficient solutions to the DDM problem. Extensive experimental evaluation of the proposed algorithms confirms their effectiveness in taking advantage of multiple execution units in a shared-memory architecture.
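To make the matching problem concrete, here is a minimal sequential sketch for the one-dimensional case, written in the spirit of sort-based matching: endpoints are sorted once and then swept. It is not the parallel extension proposed in the article, and it may differ in detail from the published sequential algorithm; the function name, the closed-interval convention, and the tuple layout are assumptions.

```python
def sort_based_matching_1d(subs, upds):
    """Return all pairs (i, j) where subscription i overlaps update j.

    `subs` and `upds` are lists of closed intervals (lo, hi).
    """
    events = []
    for i, (lo, hi) in enumerate(subs):
        events.append((lo, 0, "S", i))  # 0 marks a lower endpoint
        events.append((hi, 1, "S", i))  # 1 marks an upper endpoint
    for j, (lo, hi) in enumerate(upds):
        events.append((lo, 0, "U", j))
        events.append((hi, 1, "U", j))

    # At equal coordinates, lower endpoints sort before upper endpoints,
    # so intervals that merely touch are still reported as overlapping.
    events.sort()

    active_subs, active_upds = set(), set()
    matches = set()
    for _, is_upper, kind, idx in events:
        if is_upper:
            (active_subs if kind == "S" else active_upds).discard(idx)
        elif kind == "S":
            matches.update((idx, j) for j in active_upds)
            active_subs.add(idx)
        else:
            matches.update((i, idx) for i in active_subs)
            active_upds.add(idx)
    return matches


subs = [(0, 5), (10, 12)]
upds = [(5, 8), (11, 20), (30, 40)]
pairs = sort_based_matching_1d(subs, upds)
```

The sweep costs one sort plus work proportional to the number of reported pairs, which is why approaches of this kind are attractive compared with testing every subscription against every update.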
Edit distance has applications in many domains such as bioinformatics, spell checking, plagiarism checking, query optimization, speech recognition, and data mining. Traditionally, edit distance is computed by a sequential solution based on dynamic programming, which becomes infeasible for large problems. In this paper, we introduce NvPD, a novel algorithm for parallel edit distance computation that resolves dependencies in the conventional dynamic-programming-based solution. We also establish the correctness of the modified dependencies. NvPD exhibits certain characteristics such as balanced workload among processors, less synchronization overhead, and maximum utilization of resources, and it can exploit spatial locality. It requires min(m,n) steps to complete, compared with the diagonal-based approach, which completes in max(m,n) steps. Experimental evaluation using a variety of random and real-life data sets over shared-memory multi-core systems and graphics processing units (GPUs) shows that NvPD outperforms state-of-the-art parallel edit distance algorithms.
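For reference, the conventional sequential dynamic program that the abstract takes as its starting point can be sketched as follows. This is the textbook two-row formulation with unit costs, not NvPD; the function name is an assumption. The comments mark the data dependency that makes a naive row-wise parallelization difficult.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (unit costs)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence along the row to save memory

    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # transforming a[:i] into the empty string needs i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            # curr[j - 1] was computed in this same row: this left-neighbor
            # dependency is what forces the inner loop to run in order.
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


distance = edit_distance("kitten", "sitting")
```

Each cell depends on its upper, left, and upper-left neighbors, so the whole table takes time proportional to m·n when filled sequentially.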
Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of computational resources than synchronous methods, and they are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links. Mathematical modeling of asynchronous methods involves proper accounting of information delays, which makes their analysis challenging. This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights into how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.
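The effect of information delay on convergence can be seen in a toy model. The sketch below is an assumption-laden illustration and is not taken from the article: it minimizes f(x) = 0.5·x² with gradient descent in which each update uses a gradient evaluated at an iterate that is `delay` steps old. The function name, the objective, and the fixed-delay model are all hypothetical choices.

```python
def delayed_gradient_descent(step, delay, iters=200, x0=1.0):
    """Gradient descent on f(x) = 0.5 * x**2 with stale gradients.

    Returns the full list of iterates so the trajectory can be inspected.
    """
    # Pad the history so that a stale iterate exists from the first update.
    xs = [x0] * (delay + 1)
    for _ in range(iters):
        stale_x = xs[-1 - delay]  # iterate seen `delay` updates ago
        grad = stale_x            # f'(x) = x for this objective
        xs.append(xs[-1] - step * grad)
    return xs


fresh = delayed_gradient_descent(step=1.0, delay=0)
stale_small_step = delayed_gradient_descent(step=0.1, delay=3)
stale_large_step = delayed_gradient_descent(step=1.0, delay=3)
```

In this toy setting a step size that works with fresh gradients makes the iterates oscillate and grow once the gradients are three steps old, whereas a smaller step size still converges. This is one simple way to see why the admissible step size is tied to the degree of asynchrony.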
Herein, a parallel implementation in OpenMP of the Image Block Representation (IBR) for binary images is investigated. The IBR is a region-based image representation scheme that represents the binary image as a set of non-overlapping rectangular areas with object level, called blocks. The IBR permits the execution of operations on image areas instead of image points and therefore leads to a substantial reduction of the required computational complexity. The experimental and the analytically derived results from parallel implementation in OpenMP, on a multicore computer, proved that a very good overall performance can be achieved. (C) 2019 Elsevier Inc. All rights reserved.
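The idea of replacing pixels by rectangular blocks can be sketched with a simple greedy decomposition. This is an illustrative assumption, not the IBR extraction procedure or the OpenMP implementation studied in the paper: each unvisited object pixel starts a horizontal run, which is then extended downward as long as the full run remains object pixels. The names `decompose_into_blocks` and `block_area` are hypothetical.

```python
def decompose_into_blocks(image):
    """Split the 1-pixels of a binary image into non-overlapping rectangles.

    `image` is a list of equal-length rows of 0/1 values. Each block is
    returned as (row_start, col_start, row_end, col_end), bounds inclusive.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    covered = [[False] * cols for _ in range(rows)]
    blocks = []

    def free(r, c):
        return image[r][c] == 1 and not covered[r][c]

    for r in range(rows):
        c = 0
        while c < cols:
            if not free(r, c):
                c += 1
                continue
            # Grow a horizontal run of free object pixels to the right.
            c_end = c
            while c_end + 1 < cols and free(r, c_end + 1):
                c_end += 1
            # Grow downward while the whole run is still free below.
            r_end = r
            while r_end + 1 < rows and all(
                free(r_end + 1, k) for k in range(c, c_end + 1)
            ):
                r_end += 1
            for rr in range(r, r_end + 1):
                for k in range(c, c_end + 1):
                    covered[rr][k] = True
            blocks.append((r, c, r_end, c_end))
            c = c_end + 1
    return blocks


def block_area(block):
    """Number of pixels in a block, computed from its corners alone."""
    r0, c0, r1, c1 = block
    return (r1 - r0 + 1) * (c1 - c0 + 1)


image = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
blocks = decompose_into_blocks(image)
object_area = sum(block_area(b) for b in blocks)
```

Once the blocks are available, a quantity such as the object area is obtained from a handful of rectangles rather than from every pixel, which is the kind of saving the block representation is meant to provide.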