The objective of this paper is to present an efficient parallel implementation of the iterative compact high-order approximation numerical solver for 3D Helmholtz equation on multicore computers. The high-order parall...
详细信息
ISBN:
(纸本)9781510651098;9781510651081
The objective of this paper is to present an efficient parallel implementation of the iterative compact high-order approximation numerical solver for 3D Helmholtz equation on multicore computers. The high-order parallel iterative algorithm is built upon a combination of a Krylov subspace-type method with a direct parallel Fast Fourier transform (FFT) type preconditioner from the authors' previous work, as shown in Ref. 7. In this paper, we will be presenting the result of our algorithm by computationally simulating data with realistic ranges of parameters in soil and mine-like targets. Our algorithm will also be incorporating second, fourth, and sixth-order compact finite difference schemes. The accuracy and result of the fourth and sixth-order compact approximation will be shown alongside the scalability of our implementation in the parallel programming environment.
The parallel alternating direction method of multipliers (ADMM) algorithm is widely recognized for its effectiveness in handling large-scale datasets stored in a distributed manner, making it a popular choice for solv...
详细信息
Sorting is a fundamental task in computing and plays a central role in information technology. The advent of rack-scale and warehouse-size data processing shaped the architecture of data analysis platforms towards sup...
详细信息
Sorting is a fundamental task in computing and plays a central role in information technology. The advent of rack-scale and warehouse-size data processing shaped the architecture of data analysis platforms towards supercomputing. In turn, established techniques on supercomputers have become relevant to a wider range of application domains. This work is concerned with multi-way mergesort with exact splitting on distributed memory architectures. At its core, our approach leverages a novel and parallel algorithm for multi-way selection problems. Remarkably concise, the algorithm relies on MPI_Allgather and MPI_ReduceScatter_block, two collective communication schemes that find hardware support in most high-end networks. A software implementation of our approach is used to process the Terabyte-size Data Challenge 2 signal, released by the SKA radio telescopes organization. On the supercomputer considered herein, our approach outperforms the state of the art by up to 2.6X using 9,216 cores. Our implementation is released as a compact open source library compliant to the MPI programming model. By supporting the most popular elementary key types, and arbitrary fixed-size value types, the library can be straightforwardly integrated into third-party MPI-based software
The parallel alternating direction method of multipliers (ADMM) algorithms have gained popularity in statistics and machine learning due to their efficient handling of large sample data problems. However, the parallel...
详细信息
In probabilistic state inference, we seek to estimate the state of an (autonomous) agent from noisy observations. It can be shown that, under certain assumptions, finding the estimate is equivalent to solving a linear...
详细信息
In probabilistic state inference, we seek to estimate the state of an (autonomous) agent from noisy observations. It can be shown that, under certain assumptions, finding the estimate is equivalent to solving a linear least squares problem. Solving such a problem is done by calculating the upper triangularmatrixRfrom the coefficient matrix A, using the QR or Cholesky factorizations;this matrix is commonly referred to as the "square root matrix". In sequential estimation problems, we are often interested in periodic optimization of the state variable order, e.g., to reduce fill-in, or to apply a predictive variable ordering tactic;however, changing the variable order implies expensive re-factorization of the system. Thus, we address the problem of modifying an existing square root matrix R, to convey reordering of the variables. To this end, we identify several conclusions regarding the effect of column permutation on the factorization, to allow efficient modification of R, without accessing A at all, or with minimal re-factorization. The proposed parallelizable algorithm achieves a significant improvement in performance over the state-of-the-art incremental Smoothing AndMapping (iSAM2) algorithm, which utilizes incremental factorization to update R.
This article presents algorithms for temporal parallelization of Bayesian smoothers. We define the elements and the operators to pose these problems as the solutions to all-prefix-sums operations for which efficient p...
详细信息
This article presents algorithms for temporal parallelization of Bayesian smoothers. We define the elements and the operators to pose these problems as the solutions to all-prefix-sums operations for which efficient parallel scan-algorithms are available. We present the temporal parallelization of the general Bayesian filtering and smoothing equations, and specialize them to linear/Gaussian models. The advantage of the proposed algorithms is that they reduce the linear complexity of standard smoothing algorithms with respect to time to logarithmic.
In this paper, we study submodular maximization under a matroid constraint in the adaptive complexity model. This model was recently introduced in the context of submodular optimization to quantify the information the...
详细信息
In this paper, we study submodular maximization under a matroid constraint in the adaptive complexity model. This model was recently introduced in the context of submodular optimization to quantify the information theoretic complexity of black-box optimization in a parallel computation model. Despite the burst in work on submodular maximization in the adaptive complexity model, the fundamental problem of maximizing a monotone submodular function under a matroid constraint has remained elusive. In particular, all known techniques fail for this problem and there are no known constant factor approximation algorithms whose adaptivity is sublinear in the rank of the matroid k or in the worst case sublinear in the size of the ground set n. We present an algorithm that has an approximation guarantee arbitrarily close to the optimal 1 - 1/e for monotone submodular maximization under a matroid constraint and has near-optimal adaptivity of O(log (n) log (k)). This result is obtained using a novel technique of adaptive sequencing, which departs from previous techniques for submodular maximization in the adaptive complexity model. In addition to our main result, we show how to use this technique to design other approximation algorithms with strong approximation guarantees and polylogarithmic adaptivity.
Computer-Generated Holography (CGH) algorithms simulate numerical diffraction, being applied in particular for holographic display technology. Due to the wave-based nature of diffraction, CGH is highly computationally...
详细信息
Computer-Generated Holography (CGH) algorithms simulate numerical diffraction, being applied in particular for holographic display technology. Due to the wave-based nature of diffraction, CGH is highly computationally intensive, making it especially challenging for driving high-resolution displays in real-time. To this end, we propose a technique for efficiently calculating holograms of 3D line segments. We express the solutions analytically and devise an efficiently computable approximation suitable for massively parallel computing architectures. The algorithms are implemented on a GPU (with CUDA), and we obtain a 70-fold speedup over the reference point-wise algorithm with almost imperceptible quality loss. We report real-time frame rates for CGH of complex 3D line-drawn objects, and validate the algorithm in both a simulation environment as well as on a holographic display setup.
This paper proposes a synchronous parallel block coordinate descent algorithm for minimizing a composite function,which consists of a smooth convex function plus a non-smooth but separable convex *** to the generaliza...
详细信息
This paper proposes a synchronous parallel block coordinate descent algorithm for minimizing a composite function,which consists of a smooth convex function plus a non-smooth but separable convex *** to the generalization of the proposed method,some existing synchronous parallel algorithms can be considered as special *** tackle high dimensional problems,the authors further develop a randomized variant,which randomly update some blocks of coordinates at each round of *** proposed parallel algorithms are proven to have sub-linear convergence rate under rather mild *** numerical experiments on solving the large scale regularized logistic regression with 1 norm penalty show that the implementation is quite *** authors conclude with explanation on the observed experimental results and discussion on the potential improvements.
Delaunay Triangulation(DT) is one of the important geometric problems that is used in various branches of knowledge such as computer vision, terrain modeling, spatial clustering and networking. Kinetic data structures...
详细信息
暂无评论