The parallel alternating direction method of multipliers (ADMM) algorithm is widely recognized for its effectiveness in handling large-scale datasets stored in a distributed manner, making it a popular choice for solv...
Sorting is a fundamental task in computing and plays a central role in information technology. The advent of rack-scale and warehouse-size data processing shaped the architecture of data analysis platforms towards supercomputing. In turn, established techniques on supercomputers have become relevant to a wider range of application domains. This work is concerned with multi-way mergesort with exact splitting on distributed memory architectures. At its core, our approach leverages a novel parallel algorithm for multi-way selection problems. Remarkably concise, the algorithm relies on MPI_Allgather and MPI_Reduce_scatter_block, two collective communication schemes that find hardware support in most high-end networks. A software implementation of our approach is used to process the terabyte-size Data Challenge 2 signal, released by the SKA radio telescopes organization. On the supercomputer considered herein, our approach outperforms the state of the art by up to 2.6x using 9,216 cores. Our implementation is released as a compact open source library compliant with the MPI programming model. By supporting the most popular elementary key types, and arbitrary fixed-size value types, the library can be straightforwardly integrated into third-party MPI-based software.
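To make "exact splitting" concrete, the following is a minimal sequential sketch of multi-way selection, not the algorithm from the paper. Given several sorted sequences and a rank r, it returns one split position per sequence so that exactly r elements lie to the left of the splits and none of them exceeds any element on the right. The function name `multiway_select`, the restriction to integer keys, and the bisection over the key range are all assumptions made for illustration. In a distributed setting, the per-sequence counts would presumably be combined with collective operations rather than a local `sum`.

```python
from bisect import bisect_left, bisect_right


def multiway_select(seqs, r):
    """Return split positions, one per sorted integer sequence, with exactly
    r elements in total to the left of the splits (hypothetical helper)."""
    total = sum(len(s) for s in seqs)
    if not 0 <= r <= total:
        raise ValueError("rank out of range")
    if r == 0:
        return [0] * len(seqs)

    # Bisect over the key range for the smallest key v such that at least
    # r elements are <= v. Each probe only needs one count per sequence.
    nonempty = [s for s in seqs if s]
    lo = min(s[0] for s in nonempty)
    hi = max(s[-1] for s in nonempty)
    while lo < hi:
        mid = (lo + hi) // 2
        if sum(bisect_right(s, mid) for s in seqs) >= r:
            hi = mid
        else:
            lo = mid + 1

    # Everything strictly below the pivot goes left; copies of the pivot
    # are handed out in sequence order until exactly r elements are taken.
    splits = [bisect_left(s, lo) for s in seqs]
    remaining = r - sum(splits)
    for i, s in enumerate(seqs):
        take = min(bisect_right(s, lo) - splits[i], remaining)
        splits[i] += take
        remaining -= take
    return splits
```

Handing out duplicate keys in a fixed order is what makes the split exact even when many keys are equal; a splitter chosen by value alone could only guarantee approximately r elements on the left.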
The parallel alternating direction method of multipliers (ADMM) algorithms have gained popularity in statistics and machine learning due to their efficient handling of large sample data problems. However, the parallel...
In probabilistic state inference, we seek to estimate the state of an (autonomous) agent from noisy observations. It can be shown that, under certain assumptions, finding the estimate is equivalent to solving a linear least squares problem. Solving such a problem is done by calculating the upper triangular matrix R from the coefficient matrix A, using the QR or Cholesky factorizations; this matrix is commonly referred to as the "square root matrix". In sequential estimation problems, we are often interested in periodic optimization of the state variable order, e.g., to reduce fill-in, or to apply a predictive variable ordering tactic; however, changing the variable order implies expensive re-factorization of the system. Thus, we address the problem of modifying an existing square root matrix R, to convey reordering of the variables. To this end, we identify several conclusions regarding the effect of column permutation on the factorization, to allow efficient modification of R, without accessing A at all, or with minimal re-factorization. The proposed parallelizable algorithm achieves a significant improvement in performance over the state-of-the-art incremental Smoothing And Mapping (iSAM2) algorithm, which utilizes incremental factorization to update R.
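The reason R can be updated without accessing A follows from a standard identity: since R^T R = A^T A, permuting the columns of A by a permutation matrix P gives (AP)^T (AP) = (RP)^T (RP), so the square root matrix of the reordered system depends on R and P alone. The sketch below only checks this identity numerically on a small dense example. It is not the efficient modification algorithm of the paper, because it still re-factorizes an n-by-n matrix. The helper names and the example matrix are assumptions made for illustration.

```python
import math


def gram(M):
    """Return M^T M for a matrix given as a list of rows."""
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)]
            for i in range(n)]


def cholesky_upper(S):
    """Return the upper-triangular R with positive diagonal and R^T R = S,
    for a symmetric positive definite S."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = S[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = math.sqrt(s) if i == j else s / R[i][i]
    return R


def permute_columns(M, perm):
    """Reorder columns so that new column c is old column perm[c]."""
    return [[row[p] for p in perm] for row in M]


# Hypothetical coefficient matrix with full column rank.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0],
     [1.0, 0.0, 1.0]]
perm = [2, 0, 1]

R = cholesky_upper(gram(A))
# Reference: factorize the reordered system from scratch, using A.
R_from_A = cholesky_upper(gram(permute_columns(A, perm)))
# Same factor obtained from R and the permutation only, never touching A.
R_from_R = cholesky_upper(gram(permute_columns(R, perm)))
```

Because the Cholesky factor with positive diagonal is unique, `R_from_A` and `R_from_R` agree up to rounding error.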
Multinomial Logistic Regression is a well-studied tool for classification and has been widely used in fields like image processing, computer vision, and bioinformatics, to name a few. Under a supervised classification scenario, a Multinomial Logistic Regression model learns a weight vector to differentiate between any two classes by optimizing over the likelihood objective. With the advent of big data, the inundation of data has resulted in high-dimensional weight vectors and has also given rise to a huge number of classes, which makes the classical methods for model estimation computationally infeasible. To handle this issue, we here propose a parallel iterative algorithm: Parallel Iterative Algorithm for MultiNomial LOgistic Regression (PIANO), which is based on the Majorization Minimization procedure and can update each element of the weight vectors in parallel. Further, we also show that PIANO can be easily extended to solve the Sparse Multinomial Logistic Regression problem, an extensively studied problem because of its attractive feature selection property. In particular, we work out the extension of PIANO to solve the Sparse Multinomial Logistic Regression problem with ℓ1 and ℓ0 regularizations. We also prove that PIANO converges to a stationary point of the Multinomial and the Sparse Multinomial Logistic Regression problems. Simulations were conducted to compare PIANO with the existing methods, and it was found that the proposed algorithm performs better than the existing methods in terms of speed of convergence. (C) 2022 Elsevier B.V. All rights reserved.
This article presents algorithms for temporal parallelization of Bayesian smoothers. We define the elements and the operators to pose these problems as the solutions to all-prefix-sums operations for which efficient parallel scan-algorithms are available. We present the temporal parallelization of the general Bayesian filtering and smoothing equations, and specialize them to linear/Gaussian models. The advantage of the proposed algorithms is that they reduce the linear complexity of standard smoothing algorithms with respect to time to logarithmic.
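The all-prefix-sums idea can be illustrated on a much simpler recursion than the Bayesian smoothing equations. The sketch below is an assumption-laden stand-in, not the elements or operators defined in the article: each time step is a scalar affine map x_k = a_k x_{k-1} + b_k, stored as a pair (a_k, b_k), and composition of such maps is associative. An inclusive scan by recursive doubling then produces every prefix composition in a logarithmic number of rounds, and all updates within a round are independent and could run in parallel. The names `inclusive_scan` and `compose` are hypothetical.

```python
def inclusive_scan(items, op):
    """Inclusive all-prefix-sums by recursive doubling.

    op must be associative. There are about log2(len(items)) rounds, and
    within a round every output entry depends only on the previous round,
    so the entries of a round could be computed in parallel.
    """
    out = list(items)
    step = 1
    while step < len(out):
        prev = out
        out = [prev[i] if i < step else op(prev[i - step], prev[i])
               for i in range(len(prev))]
        step *= 2
    return out


def compose(first, second):
    """Compose affine maps x -> a*x + b, applying `first` and then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


# Hypothetical per-step coefficients (a_k, b_k) and initial state x_0.
steps = [(2, 1), (3, -1), (1, 4), (-2, 0), (5, 2)]
x0 = 1

prefixes = inclusive_scan(steps, compose)
# prefixes[k] maps x_0 directly to the state after step k + 1.
states_scan = [a * x0 + b for a, b in prefixes]

# Sequential reference: one step after another, linear in the horizon.
states_seq = []
x = x0
for a, b in steps:
    x = a * x + b
    states_seq.append(x)
```

The sequential loop needs as many dependent steps as there are time points, whereas the scan needs only a logarithmic number of dependent rounds, which is the complexity reduction the abstract refers to.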
Computer-Generated Holography (CGH) algorithms simulate numerical diffraction and are applied in particular to holographic display technology. Due to the wave-based nature of diffraction, CGH is highly computationally intensive, making it especially challenging for driving high-resolution displays in real time. To this end, we propose a technique for efficiently calculating holograms of 3D line segments. We express the solutions analytically and devise an efficiently computable approximation suitable for massively parallel computing architectures. The algorithms are implemented on a GPU (with CUDA), and we obtain a 70-fold speedup over the reference point-wise algorithm with almost imperceptible quality loss. We report real-time frame rates for CGH of complex 3D line-drawn objects, and validate the algorithm both in a simulation environment and on a holographic display setup.
In this paper, we propose and analyze the parallel Robin-Robin domain decomposition method based on the modified characteristic finite element method for the time-dependent dual-porosity-Navier-Stokes model with the Beavers-Joseph interface condition. We treat the coupling terms explicitly, using information obtained in previous time steps to construct a non-iterative domain decomposition method. In this way, two single dual-porosity equations and a single Navier-Stokes equation need to be solved at each time step. In particular, we solve the Navier-Stokes equation by the modified characteristic finite element method, which avoids the computational inefficiency caused by the nonlinear convection term. Furthermore, we prove the error convergence of the solutions by mathematical induction; the proof also implies the uniform L-infinity-boundedness of the fully discrete velocity solution in conduit flow. Finally, some numerical examples are presented to show the effectiveness and efficiency of the proposed method.
This paper proposes a synchronous parallel block coordinate descent algorithm for minimizing a composite function, which consists of a smooth convex function plus a non-smooth but separable convex *** to the generalization of the proposed method, some existing synchronous parallel algorithms can be considered as special *** tackle high dimensional problems, the authors further develop a randomized variant, which randomly updates some blocks of coordinates at each round of *** proposed parallel algorithms are proven to have sub-linear convergence rate under rather mild *** numerical experiments on solving the large scale regularized logistic regression with ℓ1-norm penalty show that the implementation is quite *** authors conclude with an explanation of the observed experimental results and a discussion of the potential improvements.
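As a rough illustration of a synchronous block update for a composite objective, the sketch below applies a generic Jacobi-style proximal gradient step, which is not the algorithm proposed in the paper. It makes several assumptions for brevity: the smooth part is a least-squares loss rather than the logistic loss used in the experiments, the non-smooth separable part is an ℓ1 penalty handled by soft-thresholding, and the step size 1/L uses the squared Frobenius norm of A as a conservative Lipschitz bound. All function names are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.|, applied to a scalar."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def residual(A, b, x):
    """Return A x - b for A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi
            for row, bi in zip(A, b)]


def objective(A, b, x, lam):
    """Composite objective 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    r = residual(A, b, x)
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(xj) for xj in x)


def sync_block_step(A, b, x, lam, blocks, L):
    """One synchronous round: every block is updated from the same iterate x,
    so the loop over blocks could be distributed across workers."""
    r = residual(A, b, x)
    x_new = list(x)
    for block in blocks:
        for j in block:
            grad_j = sum(A[i][j] * r[i] for i in range(len(A)))
            x_new[j] = soft_threshold(x[j] - grad_j / L, lam / L)
    return x_new


# Hypothetical small problem with two coordinate blocks.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b = [1.0, 2.0, 3.0, 4.0]
lam = 0.5
blocks = [[0], [1, 2]]
# Squared Frobenius norm: an upper bound on the gradient's Lipschitz constant.
L = sum(a * a for row in A for a in row)

x = [0.0, 0.0, 0.0]
history = [objective(A, b, x, lam)]
for _ in range(50):
    x = sync_block_step(A, b, x, lam, blocks, L)
    history.append(objective(A, b, x, lam))
```

With this conservative step size the objective values in `history` never increase. A randomized variant along the lines mentioned in the abstract would update only a random subset of `blocks` in each round.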
Delaunay Triangulation (DT) is an important geometric problem that is used in various branches of knowledge such as computer vision, terrain modeling, spatial clustering and networking. Kinetic data structures...