In this study, we present a novel optimization model that can automatically and rapidly generate an optimally parallel preconditionedconjugategradient (PCG) algorithm for any given linear system on a specific multi-...
详细信息
In this study, we present a novel optimization model that can automatically and rapidly generate an optimally parallel preconditionedconjugategradient (PCG) algorithm for any given linear system on a specific multi-graphics processing unit (GPU) platform. For our proposed model, there are the following novelties: (1) a profile-based performance model for each one of the main components of the PCG algorithm, including the vector operation, inner product, and sparse matrix-vector multiplication (SpMV), is suggested, and (2) our model is general, independent of the problems, and only dependent on the resources of devices, and (3) our model is extensible. For a vector operation kernel, or inner product kernel, or SpMV kernel that is not included in our framework, once its performance model is successfully constructed, it can be incorporated into our framework. Our model is constructed only once for each type of GPU. The experiments validate the high efficiency of our proposed model. (C) 2017 Elsevier B.V. All rights reserved.
The Toeplitz matrix T-n with generating function f(omega) = |1 - e(-i omega)|(-2d)h(omega), where d is an element of (- (1)/(2), (1)/(2)) \ {0} and h(omega) is positive, continuous on [- pi, pi], and differentiable on...
详细信息
The Toeplitz matrix T-n with generating function f(omega) = |1 - e(-i omega)|(-2d)h(omega), where d is an element of (- (1)/(2), (1)/(2)) \ preconditioned and h(omega) is positive, continuous on [- pi, pi], and differentiable on [-pi, pi] \ preconditioned, has a Fisher - Hartwig singularity [ M. E. Fisher and R. E. Hartwig (1968), Adv. Chem. Phys., 32, pp. 190 - 225]. The complexity of the preconditionedconjugategradient (PCG) algorithm is known [R. H. Chan and M. Ng (1996), SIAM Rev., 38, pp. 427 - 482] to be O(nlogn) for Toeplitz systems when d = 0. However, the effect on the PCG algorithm of the Fisher - Hartwig singularity in Tn has not been explored in the literature. We show that the complexity of the conjugategradient (CG) algorithm for solving T(n)x = b without any preconditioning grows asymptotically as n(1+|d|) log(n). With T. Chan's optimal circulant preconditioner C-n [T. Chan (1988), SIAM J. Sci. Statist. Comput., 9, pp. 766 - 771], the complexity of the PCG algorithm is O(nlog(3)(n)).
It is gradually accepted that the loss of orthogonality of the gradients in a conjugategradientalgorithm may decelerate the convergence rate to some extent. The Dai-Kou conjugategradientalgorithm (SIAM J Optim 23(...
详细信息
It is gradually accepted that the loss of orthogonality of the gradients in a conjugategradientalgorithm may decelerate the convergence rate to some extent. The Dai-Kou conjugategradientalgorithm (SIAM J Optim 23(1):296-320, 2013), called CGOPT, has attracted many researchers' attentions due to its numerical efficiency. In this paper, we present an improved Dai-Kou conjugategradientalgorithm for unconstrained optimization, which only consists of two kinds of iterations. In the improved Dai-Kou conjugategradientalgorithm, we develop a new quasi-Newton method to improve the orthogonality by solving the subproblem in the subspace and design a modified strategy for the choice of the initial stepsize for improving the numerical performance. The global convergence of the improved Dai-Kou conjugategradientalgorithm is established without the strict assumptions in the convergence analysis of other limited memory conjugategradient methods. Some numerical results suggest that the improved Dai-Kou conjugategradientalgorithm (CGOPT (2.0)) yields a tremendous improvement over the original Dai-Kou CG algorithm (CGOPT (1.0)) and is slightly superior to the latest limited memory conjugategradient software package CG_DESCENT (6.8) developed by Hager and Zhang (SIAM J Optim 23(4):2150-2168, 2013) for the CUTEr library.
We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by Fi...
详细信息
We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by Finite Differences. Depending on the treatment of the Dirichlet boundary the resulting system of equations is symmetric or 'mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. We demonstrate good scalability of the solver on distributed memory parallel processor with up to 2048 processors. We also compare our iterative solver with an FFT-based solver that is more commonly used for applications in beam dynamics. (C) 2010 Elsevier Inc. All rights reserved.
This article presents a parallel simulation solver for groundwater flow on CUDA. preconditionedconjugategradient (PCG) algorithm is used to solve the large linear systems arising from the finite-difference discretiz...
详细信息
This article presents a parallel simulation solver for groundwater flow on CUDA. preconditionedconjugategradient (PCG) algorithm is used to solve the large linear systems arising from the finite-difference discretization of three-dimensional groundwater flow problems. CUDA implementing methods for the two most time-consuming operations in PCG, sparse matrix-vector multiplication and vector inner-product, are given. The experimental results show that CUDA can speed up the solving process of the groundwater simulation significantly. 1.8-3.7 speedup can be achieved with different GPUs for a transient groundwater flow problem.
We study a 2-level multiplicative Schwarz method for the p version Galerkin boundary element method for a weakly singular integral equation of the first kind in 3D. We prove that the rate of convergence of the multipl...
详细信息
We study a 2-level multiplicative Schwarz method for the p version Galerkin boundary element method for a weakly singular integral equation of the first kind in 3D. We prove that the rate of convergence of the multiplicative Schwarz operator for the p version grows only logarithmically in p and is independent of h. (c) 2006 IMACS. Published by Elsevier B.V. All rights reserved.
Key messageA new genomic prediction method (RHPP) was developed via combining randomized Haseman-Elston regression (RHE-reg), PCR based on genomic information of core population, and preconditionedconjugategradient ...
详细信息
Key messageA new genomic prediction method (RHPP) was developed via combining randomized Haseman-Elston regression (RHE-reg), PCR based on genomic information of core population, and preconditionedconjugategradient (PCG) *** efficiency is becoming a hot issue in the practical application of genomic prediction due to the large number of data generated by the high-throughput genotyping technology. In this study, we developed a fast genomic prediction method RHPP via combining randomized Haseman-Elston regression (RHE-reg), PCR based on genomic information of core population, and preconditionedconjugategradient (PCG) algorithm. The simulation results demonstrated similar prediction accuracy between RHPP and GBLUP, and significantly higher computational efficiency of the former with the increase of individuals. The results of real datasets of both bread wheat and loblolly pine demonstrated that RHPP had a similar or better predictive accuracy in most cases compared with GBLUP. In the future, RHPP may be an attractive choice for analyzing large-scale and high-dimensional data.
We study a multilevel additive Schwarz method for the h-p version of the Galerkin boundary element method with geometrically graded meshes. Both hypersingular and weakly singular integral equations of the first kind a...
详细信息
We study a multilevel additive Schwarz method for the h-p version of the Galerkin boundary element method with geometrically graded meshes. Both hypersingular and weakly singular integral equations of the first kind are considered. As it is well known the h-p version with geometric meshes converges exponentially fast in the energy-norm. However, the condition number of the Galerkin matrix in this case blows up exponentially in the number of unknowns M. We prove that the condition number kappa(P) of the multilevel additive Schwarz operator behaves like O(root Mlog(2) M). Asa direct consequence of this we also give the results for the 2-level preconditioner and also for the h-p version with quasi-uniform meshes. Numerical results supporting our theory are presented.
A computationally efficient preconditioned conjugate gradient algorithm with a symmetric successive over-relaxation (SSOR) preconditioner for the iterative solution of set mixed model equations is described. The poten...
详细信息
A computationally efficient preconditioned conjugate gradient algorithm with a symmetric successive over-relaxation (SSOR) preconditioner for the iterative solution of set mixed model equations is described. The potential computational savings of this approach are examined for an example of single-step genomic evaluation of Australian sheep. Results show that the SSOR pre-conditioner can substantially reduce the number of iterates required for solutions to converge compared with simpler preconditioners with marked reductions in overall computing time.
We discuss the scalable parallel solution of the Poisson equation on irregularly shaped domains discretized by finite differences. The symmetric positive definite system is solved by the preconditionedconjugate gradi...
详细信息
ISBN:
(纸本)9783642281501;9783642281518
We discuss the scalable parallel solution of the Poisson equation on irregularly shaped domains discretized by finite differences. The symmetric positive definite system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. The improvements are due to a better data partitioning and the iterative solution of the coarsest level system in AMG. We demonstrate good scalability of the solver on a distributed memory parallel computer with up to 2048 processors.
暂无评论