This paper presents parallel algorithms for the construction of k dimensional tree (KD-tree) and nearest neighbor search (NNS) on massively parallel architecture (MPA) of graphics processing unit (GPU). Unlike previou...
详细信息
ISBN:
(纸本)9781479983926
This paper presents parallel algorithms for the construction of k dimensional tree (KD-tree) and nearest neighbor search (NNS) on massively parallel architecture (MPA) of graphics processing unit (GPU). Unlike previous parallel algorithms for KD-tree, for the first time, our parallel algorithms integrate high dimensional KD-tree construction and NNS on an MPA platform. The proposed massively parallel algorithms are of comparable quality as traditional sequential counterparts on CPU, while achieve high speedup performance on both low and high dimensional KD-tree. Low dimensional KD-tree construction and NNS algorithms, presented in this paper, outperform their serial CPU counterparts by a factor of up to 24 and 218, respectively. For high dimensional KD-tree, the speedup factors are even higher, raising to 30 and 242, respectively. Our implementations will potentially benefit real time three-dimensional (3D) image registration and high dimensional descriptor matching.
We present two parallel algorithms for finding all the roots of an N-degree polynomial equation on an efficient model of Optoelectronic Transpose Interconnection System (OTIS), called OTIS-2D torus. The parallel algor...
详细信息
We present two parallel algorithms for finding all the roots of an N-degree polynomial equation on an efficient model of Optoelectronic Transpose Interconnection System (OTIS), called OTIS-2D torus. The parallel algorithms are based on the iterative schemes of Durand-Kerner and Ehrlich methods. We show that the algorithm for the Durand-Kerner method requires (N (0.75)+0.5N (0.25)-1) electronic moves + 2(N (0.5)-1) OTIS moves using N processors. The parallel algorithm for Ehrlich method is shown to run in (N (0.75)+0.5N (0.25)-1) electronic moves + 2(N (0.5)-1) OTIS moves with the same number of processors. The algorithms have lower AT cost than the algorithms presented in Jana (parallel Comput 32:301-312, 2006). The scalability of the algorithms is also discussed.
Quadratic programming problems (QPs) that arise from dynamic optimization problems typically exhibit a very particular structure. We address the ubiquitous case where these QPs are strictly convex and propose a dual N...
详细信息
Quadratic programming problems (QPs) that arise from dynamic optimization problems typically exhibit a very particular structure. We address the ubiquitous case where these QPs are strictly convex and propose a dual Newton strategy that exploits the block-bandedness similarly to an interior-point method. Still, the proposed method features warmstarting capabilities of active-set methods. We give details for an efficient implementation, including tailored numerical linear algebra, step size computation, parallelization, and infeasibility handling. We prove convergence of the algorithm for the considered problem class. A numerical study based on the open-source implementation qpDUNES shows that the algorithm outperforms both well-established general purpose QP solvers as well as state-of-the-art tailored control QP solvers significantly on the considered benchmark problems.
The objective of this paper is to develop a robust maximum likelihood estimation (MLE) for the stochastic state space model via the expectation maximisation algorithm to cope with observation outliers. Two types of ou...
详细信息
The objective of this paper is to develop a robust maximum likelihood estimation (MLE) for the stochastic state space model via the expectation maximisation algorithm to cope with observation outliers. Two types of outliers and their influence are studied in this paper: namely,the additive outlier (AO) and innovative outlier (IO). Due to the sensitivity of the MLE to AO and IO, we propose two techniques for robustifying the MLE: the weighted maximum likelihood estimation (WMLE) and the trimmed maximum likelihood estimation (TMLE). The WMLE is easy to implement with weights estimated from the data;however, it is still sensitive to IO and a patch of AO outliers. On the other hand, the TMLE is reduced to a combinatorial optimisation problem and hard to implement but it is efficient to both types of outliers presented here. To overcome the difficulty, we apply the parallel randomised algorithm that has a low computational cost. A Monte Carlo simulation result shows the efficiency of the proposed algorithms.
We present an algorithm to solve the subset-sum problem (SSP) of capacity c and n items with weights w(i),1 <= i <= n, spending O(n(m - w(min))/p) time and O(n + m - w(min)) space in the Concurrent Read/Concurre...
详细信息
We present an algorithm to solve the subset-sum problem (SSP) of capacity c and n items with weights w(i),1 <= i <= n, spending O(n(m - w(min))/p) time and O(n + m - w(min)) space in the Concurrent Read/Concurrent Write (CRCW) PRAM model with 1 <= p <= m - w(min) processors, where w(min) is the lowest weight and m=min {c,Sigma(n)(i) (=)1 w(i)-c}, improving both upper-bounds. Thus, when n <= c, it is possible to solve the SSP in O(n) time in parallel environments with low memory. We also show OpenMP and CUDA implementations of this algorithm and, on Graphics Processing Unit, we obtained better performance than the best sequential and parallel algorithms known so far. Copyright (c) 2015 John Wiley & Sons, Ltd.
We introduce a new strategy for coupling the parallel in time (parareal) iterative methodology with multiscale integrators. Following the parareal framework, the algorithm computes a low-cost approximation of all slow...
详细信息
We introduce a new strategy for coupling the parallel in time (parareal) iterative methodology with multiscale integrators. Following the parareal framework, the algorithm computes a low-cost approximation of all slow variables in the system using an appropriate multiscale integrator, which is refined using parallel fine scale integrations. Convergence is obtained using an alignment algorithm for fast phase-like variables. The method may be used either to enhance the accuracy and range of applicability of the multiscale method in approximating only the slow variables, or to resolve all the state variables. The numerical scheme does not require that the system is split into slow and fast coordinates. Moreover, the dynamics may involve hidden slow variables, for example, due to resonances. We propose an alignment algorithm for almost-periodic solutions, in which case convergence of the parareal iterations is proved. The applicability of the method is demonstrated in numerical examples.
This paper proposes an efficient method for the computation of Galois field (GF) expressions for multiple-valued logic functions. The algorithm is based on the partitioning of the input function vector and uses both C...
详细信息
This paper proposes an efficient method for the computation of Galois field (GF) expressions for multiple-valued logic functions. The algorithm is based on the partitioning of the input function vector and uses both CPUs (central processing units) and GPUs (graphics processing units) for performing the computations in parallel. After the first step of the fast Fourier transform (FFT)-like algorithm is performed on the CPU, the function vector is divided into disjoint subvectors that are further processed in parallel on the CPU and GPU. The proposed computational method reduces the time needed for computing the coefficients in the GF-expressions and, in this way, might extend the possibilities for their practical application. The experimental comparison of the proposed solution and previously used methods for computing GF-expressions for ternary and quaternary functions, confirms the validity of the method.
We describe a half-approximation algorithm, b-SUITOR, for computing a b-MATCHING of maximum weight in a graph with weights on the edges. b-MATCHING is a generalization of the well-known MATCHING problem in graphs, whe...
详细信息
We describe a half-approximation algorithm, b-SUITOR, for computing a b-MATCHING of maximum weight in a graph with weights on the edges. b-MATCHING is a generalization of the well-known MATCHING problem in graphs, where the objective is to choose a subset of M edges in the graph such that at most a specified number b(v) of edges in M are incident on each vertex v. Subject to this restriction we maximize the sum of the weights of the edges in M. We prove that the b-SUITOR algorithm computes the same b-MATCHING as the one obtained by the GREEDY algorithm for the problem. We implement the algorithm on serial and shared-memory parallel processors and compare its performance against a collection of approximation algorithms that have been proposed earlier. Our results show that the b-SUITOR algorithm outperforms the GREEDY and locally dominant edge algorithms by one to two orders of magnitude on a serial processor. The b-SUITOR algorithm has a high degree of concurrency, and it scales well up to 240 threads on a shared-memory multiprocessor. The b-SUITOR algorithm outperforms the locally dominant edge algorithm by a factor of 14 on 16 cores of an Intel Xeon multiprocessor.
In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed va...
详细信息
In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed variants are analyzed using different data distribution strategies. The reported experiments show the behavior and effectiveness of the designed algorithms for realistic test data using either OpenMP, MPI or an hybrid OpenMP/MPI approach to exploit the benefits of shared memory inside the nodes of current SMP supercomputers.
Due the recent increase of the volume of data that has been generated, organizing this data has become one of the biggest problems in Computer Science. Among the different strategies propose to deal efficiently and ef...
详细信息
Due the recent increase of the volume of data that has been generated, organizing this data has become one of the biggest problems in Computer Science. Among the different strategies propose to deal efficiently and effectively for this purpose, we highlight those related to clustering, more specifically, density-based clustering strategies, which stands out for its ability to define clusters of arbitrary shape and the robustness to deal with the presence of data noise, such as DBSCAN and OPTICS. However, these algorithms are still a computational challenge since they are distance-based proposals. In this work we present a new approach to make OPTICS feasible based on data indexing strategy. Although the simplicity with which the data are indexed, using graphs, it allows explore various parallelization opportunities, which were explored using graphic processing unit (GPU). Based on this structure, the complexity of OPTICS is reduced to O ( E *logV ) in the worst case, becoming itself very fast. In our evaluation we show that our proposal can be over 200 x faster than its sequential version using CPU.
暂无评论