Two commonly used preconditioners were evaluated for parallel solution of linear systems of equations with high condition numbers. The test cases were derived from topology optimisation applications in multiple discip...
详细信息
Two commonly used preconditioners were evaluated for parallel solution of linear systems of equations with high condition numbers. The test cases were derived from topology optimisation applications in multiple disciplines, where the material distribution finite element methods were used. Because in this optimisation method, the equations rapidly become ill-conditioned due to disappearance of large number of elements from the design space as the optimisations progresses, it is shown that the choice for a suitable preconditioner becomes very crucial. In an earlier work the conjugate gradient (CG) method with a Block-Jacobi preconditioner was used, in which the number of CG iterations increased rapidly with the increasing number processors. Consequently, the parallel scalability of the method deteriorated fast due to the increasing loss of interprocessor information among the increased number of processors. By replacing the Block-Jacobi preconditioner with a sparse approximate inverse preconditioner, it is shown that the number of iterations to converge became independent of the number of processors. Therefore, the parallel scalability is improved.
By using local Fourier analysis, a simultaneous directions parallel method, which is a particular instance of the parallel fractional step algorithm, is shown to possess smoothing effects when applied to Poisson probl...
详细信息
By using local Fourier analysis, a simultaneous directions parallel method, which is a particular instance of the parallel fractional step algorithm, is shown to possess smoothing effects when applied to Poisson problems. The specific smoothing factor is determined and the expected factor values are found to be consistent with those obtained. The simultaneous directions approach is an advantageous alternative to other existing smoothers in the multigrid environment. (c) 2005 Wiley Periodicals, Inc.
In this paper we present a new algorithm for the solution of Hamilton-Jacobi-Bellman equations related to optimal control problems. The key idea is to divide the domain of computation into subdomains which are shaped ...
详细信息
In this paper we present a new algorithm for the solution of Hamilton-Jacobi-Bellman equations related to optimal control problems. The key idea is to divide the domain of computation into subdomains which are shaped by the optimal dynamics of the underlying control problem. This can result in a rather complex geometrical subdivision, but it has the advantage that every subdomain is invariant with respect to the optimal dynamics, and then the solution can be computed independently in each subdomain. The features of this dynamics-dependent domain decomposition can be exploited to speed up the computation and for an efficient parallelization, since the classical transmission conditions at the boundaries of the subdomains can be avoided. For their properties, the subdomains are patches in the sense introduced by Ancona and Bressan [ESAIM Control Optim. Calc. Var., 4 (1999), pp. 445-471]. Several examples in two and three dimensions illustrate the properties of the new method.
We are interested in parallelizing the least angle regression (LARS) algorithm for fitting linear regression models to high-dimensional data. We consider two parallel and communication avoiding versions of the basic L...
详细信息
We are interested in parallelizing the least angle regression (LARS) algorithm for fitting linear regression models to high-dimensional data. We consider two parallel and communication avoiding versions of the basic LARS algorithm. The two algorithms have different asymptotic costs and practical performance. One offers more speedup and the other produces more accurate output. The first is bLARS, a block version of the LARS algorithm, where we update b columns at each iteration. Assuming that the data are row-partitioned, bLARS reduces the number of arithmetic operations, latency, and bandwidth by a factor of b. The second is tournament-bLARS (T-bLARS), a tournament version of LARS where processors compete by running several LARS computations in parallel to choose b new columns to be added in the solution. Assuming that the data are column-partitioned, T-bLARS reduces latency by a factor of b. Similarly to LARS, our proposed methods generate a sequence of linear models. We present extensive numerical experiments that illustrate speedups up to 4x compared to LARS without any compromise in solution quality.
In this paper, a new block method of the second order is presented to solve initial value problems numerically. This method is similar to the block trapezoidal rule [S. Abbas and L. M. Delves, parallel solution of ODE...
详细信息
In this paper, a new block method of the second order is presented to solve initial value problems numerically. This method is similar to the block trapezoidal rule [S. Abbas and L. M. Delves, parallel solution of ODE's by one step block methods, Report CSMR, University of Liverpool, 1989.], where the low power of the block size appears in the principal local truncation error. Direct comparison with the related results of the block trapezoidal rule has been outlined.
New block methods of order two and three for the numerical solution of initial value problems are derived. The matrix coefficients of these methods are chosen such that low powers of the blocksize appear in the princi...
详细信息
New block methods of order two and three for the numerical solution of initial value problems are derived. The matrix coefficients of these methods are chosen such that low powers of the blocksize appear in the principal local truncation errors.
We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how t...
详细信息
We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony impacts the convergence rates of the iterates. Our results shorten, streamline and strengthen existing convergence proofs for several asynchronous optimization methods and allow us to establish convergence guarantees for popular algorithms that were thus far lacking a complete theoretical under-standing. Specifically, we use our results to derive better iteration complexity bounds for proximal incremental aggregated gradient methods, to obtain tighter guarantees depending on the average rather than maximum delay for the asynchronous stochastic gradient descent method, to provide less conservative analyses of the speedup conditions for asynchronous block-co ordinate implementations of Krasnosel'skii-Mann iterations, and to quantify the convergence rates for totally asynchronous iterations under various assumptions on communication delays and update rates.
Particle smoothers are SMC (Sequential Monte Carlo) algorithms designed to approximate the joint distribution of the states given observations from a state-space model. We propose dSMC (de-Sequentialized Monte Carlo),...
详细信息
Particle smoothers are SMC (Sequential Monte Carlo) algorithms designed to approximate the joint distribution of the states given observations from a state-space model. We propose dSMC (de-Sequentialized Monte Carlo), a new particle smoother that is able to process T observations in O(log2 T) time on parallel architectures. This compares favorably with standard particle smoothers, the complexity of which is linear in T. We derive GP convergence results for dSMC, with an explicit upper bound, polynomial in T. We then discuss how to reduce the variance of the smoothing estimates computed by dSMC by (i) designing good proposal distributions for sampling the particles at the initialization of the algorithm, as well as by (ii) using lazy resampling to increase the number of particles used in dSMC. Finally, we design a particle Gibbs sampler based on dSMC, which is able to perform parameter inference in a state-space model at a O(log2 T) cost on parallel hardware.
The material point method (MPM) is an extension of particle-in-cell method to solid mechanics. A parallel MPM code is developed using FORTRAN 95 and OpenMP in this study, which is designed primarily for solving impact...
详细信息
The material point method (MPM) is an extension of particle-in-cell method to solid mechanics. A parallel MPM code is developed using FORTRAN 95 and OpenMP in this study, which is designed primarily for solving impact dynamic problems. Two parallel methods, the array expansion method and the domain decomposition method, are presented to avoid data races ill the nodal update stage. In the array expansion method, two-dimensional auxiliary arrays are created for nodal variables. After updating grid nodes in all threads, the auxiliary arrays are assembled to establish the global nodal array. In the domain decomposition method, the background grid is decomposed into some uniform patches, and each thread deals with a patch. The information of neighbor patches is exchanged through shared variables. After updating nodes in all patches, their nodal variables are assembled to establish the global nodal variables. The numerical tests show that the domain decomposition method has much better parallel scalability and higher parallel efficiency than the array expansion method. Therefore, a parallel computer code, MPM3DMP, is developed based oil the domain decomposition method. Finally, MPM3DMP is applied to a large-scale simulation with 13,542,030 particles for obtaining the high-resolution results of debris cloud in hypervelocity impact.
parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which...
详细信息
parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. (C) 2011 Wiley Periodicals, Inc. J Comput Chem 32: 3005-3013, 2011
暂无评论