Asynchronous iterative algorithms can reduce many of the data dependencies associated with synchronization barriers. The reported study investigates the potential of asynchronous iterative algorithms by quantifying the critical parallel processing factors. Specifically, a time complexity-based analysis method is used to understand the inherent interdependencies between computing and communication overheads for the parallel asynchronous algorithm. The results show not only that the computational experiments closely match the analytical results, but also that asynchronous iterative algorithms can be beneficial in a vast number of parallel processing environments. The choice of local stopping criteria, which is critically important to overall system performance, is investigated in depth. (C) 1999 Academic Press.
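The idea of iterating without synchronization barriers can be illustrated with a small sequential simulation. The sketch below is not the algorithm analysed in the study: it is a minimal asynchronous Jacobi-style relaxation in which one component is updated per step using possibly stale values of the others. The function name `async_jacobi`, the deterministic delay pattern, and the fixed step count (used here in place of a real stopping criterion) are all assumptions made for illustration.

```python
def async_jacobi(A, b, steps=600, max_delay=2):
    """Simulate an asynchronous relaxation for A x = b (A strictly diagonally dominant).

    One component is updated per step. Off-diagonal values are read from an
    older iterate, which mimics a processor working with data that has not
    yet been refreshed by its neighbours.
    """
    n = len(b)
    history = [[0.0] * n]                 # history[t] is the iterate after t updates
    for t in range(steps):
        i = t % n                         # component updated at this step
        new = history[-1][:]
        acc = b[i]
        for j in range(n):
            if j != i:
                # Hypothetical, deterministic stand-in for communication lag.
                delay = (i + j + t) % (max_delay + 1)
                stale = history[max(0, len(history) - 1 - delay)]
                acc -= A[i][j] * stale[j]
        new[i] = acc / A[i][i]
        history.append(new)
    return history[-1]


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = async_jacobi(A, b)
```

A practical implementation would replace the fixed step count with a local stopping test on each processor, which is the design question the abstract singles out.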
Parallel preconditioners are considered for improving the convergence rate of the conjugate gradient method for solving sparse symmetric positive definite systems generated by finite element models of subsurface flow. The difficulties of adapting effective sequential preconditioners to the parallel environment are illustrated by our treatment of incomplete Cholesky preconditioning. These difficulties are avoided with multigrid preconditioning, which can be extended naturally to many processors so that the preconditioner remains global and effective. The coarse grid correction that defines the multigrid preconditioner is outlined, and its parallel implementation with the distributed finite element data structure is presented, along with some examples of its use as a parallel preconditioner. (C) 1998 Academic Press.
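For orientation, the role a preconditioner plays inside conjugate gradients can be sketched as follows. This is a minimal serial sketch, assuming a small dense matrix stored as a list of lists. The multigrid and incomplete Cholesky preconditioners of the paper are not reproduced; a diagonal (Jacobi) preconditioner is swapped in as a simple stand-in, and the names `pcg` and `jacobi_preconditioner` are hypothetical.

```python
def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                          # residual b - A x for the zero initial guess
    z = apply_preconditioner(r)          # z approximates A^{-1} r
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:       # stop once the residual norm is small
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def jacobi_preconditioner(A):
    """Diagonal scaling, used here only as a stand-in preconditioner."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg(A, b, jacobi_preconditioner(A))
```

Only `apply_preconditioner` would change if a different preconditioner were used, which is why the choice can be studied separately from the Krylov iteration itself.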
A new general approach for numerically computing flow fields on parallel computing environments is presented, discussed and analysed. The hierarchy presented here is based on a parallel split of operators. A portion of the theory is presented together with its application to two- and three-dimensional flows. This strategy is applied to a two-dimensional problem for which a specific parallel split, called a parabolized split, is given. The parallel algorithm that results from this split is analysed, leading to reasonably good parallel efficiency, which is close to 50%. Actual experiments lead to similar conclusions. This parallel strategy can also be used together with other parallel computing algorithms, such as domain decomposition, to give an optimal-type parallel algorithm for the Navier–Stokes equations.
This paper deals with a new class of parallel asynchronous iterative algorithms for the solution of nonlinear systems of equations. The main feature of the new class of methods presented here is the possibility of flexible communication between processors. In particular, partial updates can be exchanged. Approximation of the associated fixed point mapping is also considered. A detailed convergence study is presented. A connection with the Schwarz alternating method is made for the solution of nonlinear boundary value problems. Computational results on a shared memory multiprocessor IBM 3090 are briefly presented.
We consider multimessage multicasting over the n processor complete (or fully connected) static network (MM_C). First we present a linear time algorithm that constructs, for every degree d problem instance, a communication schedule with total communication time at most d^2, where d is the maximum number of messages that each processor may send or receive. Then we present degree d problem instances such that all their communication schedules have total communication time at least d^2. We observe that our lower bound applies when the fan-out (maximum number of processors receiving any given message) is huge, and thus the number of processors is also huge. Since this environment is not likely to arise in the near future, we turn our attention to the study of important subproblems that are likely to arise in practice. We show that when each message has fan-out k = 1, the MM_C problem corresponds to the makespan openshop preemptive scheduling problem, which can be solved in polynomial time, and that for k ≥ 2 our problem is NP-complete and remains NP-complete even when forwarding is allowed. We present an algorithm to generate a communication schedule with total communication time 2d-1 for any degree d problem instance with fan-out k = 2. Our main result is an O(q·d·e) time algorithm, where e ≤ nd (the input length), with an approximation bound of qd + k^(1/q)(d-1), for any integer q such that k > q ≥ 2. Our algorithms are centralized and require all the communication information ahead of time. Applications where all of this information is readily available include iterative algorithms for solving linear equations, and most dynamic programming procedures. The Meiko CS-2 machine and computer systems with processors communicating via dynamic permutation networks whose basic switches can act as data replicators (e.g., n by n Benes network with 2 by 2 switches that can also act as data replicators) will also b
Normalized explicit approximate inverse matrix techniques are presented for explicitly computing various families of normalized approximate inverses, based on normalized approximate factorization procedures, for solving sparse linear systems derived from finite difference and finite element discretizations of partial differential equations. Normalized explicit preconditioned conjugate gradient-type schemes, in conjunction with normalized approximate inverse matrix techniques, are presented for the efficient solution of linear and non-linear systems. Theoretical estimates of the rate of convergence and computational complexity of the normalized explicit preconditioned conjugate gradient method are also presented. Applications of the proposed methods to characteristic linear and non-linear problems are discussed and numerical results are given. (C) 2004 Elsevier Ltd. All rights reserved.
In the recent paper of Bai and Su, a class of parallel decomposition-type accelerated overrelaxation (PDAOR) methods suitable for SIMD systems is established, and convergence conditions are derived when the coefficient matrices of the linear systems are L-matrices, H-matrices or symmetric positive-definite matrices. In the case of H-matrices, we improve the convergence region for the relaxation parameters. (C) 2000 Elsevier Science Inc. All rights reserved.
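As background for the two relaxation parameters mentioned above, the classical serial accelerated overrelaxation (AOR) sweep can be sketched as below. This is not the PDAOR method of Bai and Su, whose parallel decomposition is not described in the abstract. The function name `aor` and the parameter names `omega` (relaxation) and `r` (acceleration) are assumptions, and the update follows the splitting A = D - L - U with (D - rL) x_new = [(1 - omega) D + (omega - r) L + omega U] x + omega b.

```python
def aor(A, b, omega, r, iterations=200):
    """Classical accelerated overrelaxation for A x = b.

    With r == omega the sweep reduces to SOR; with r == 0 and omega == 1
    it reduces to the Jacobi method.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = x[:]
        for i in range(n):
            # Entries j < i of x_new have already been updated in this sweep.
            lower_new = sum(A[i][j] * x_new[j] for j in range(i))
            lower_old = sum(A[i][j] * x[j] for j in range(i))
            upper_old = sum(A[i][j] * x[j] for j in range(i + 1, n))
            x_new[i] = (1.0 - omega) * x[i] + (
                omega * b[i]
                - r * lower_new
                - (omega - r) * lower_old
                - omega * upper_old
            ) / A[i][i]
        x = x_new
    return x


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = aor(A, b, omega=1.0, r=0.5)
```

The example matrix is strictly diagonally dominant, so the chosen parameters fall in a range where this sweep converges; which parameter pairs are admissible for wider matrix classes is the question the abstract addresses.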
A parallel iterative Galerkin method based on a domain decomposition technique with nonconforming quadrilateral finite elements will be analyzed for second-order elliptic equations subject to the Robin boundary condition. Optimal order error estimates are derived with respect to a broken H^1-norm and L^2-norm. Applications to time-dependent problems will be considered. Some numerical experiments supporting the theoretical results will be given. This paper extends the work in [J. Douglas Jr., J.E. Santos, D. Sheen, X. Ye, Nonconforming Galerkin methods based on quadrilateral elements for second order elliptic problems, Mathematical Modelling and Numerical Analysis, RAIRO, Model. Math. Anal. Numer. 33 (4) (1999) 747] to the non-self-adjoint case of second-order equations including the term b · ∇u. We suppose that uniform ellipticity holds. Hence the arguments in (loc. cit.) may be applied word for word, so some proofs will be omitted. (C) 2002 Elsevier Science Inc. All rights reserved.
A new class of inner-outer iterative procedures in conjunction with Picard-Newton methods, based on explicit preconditioning iterative methods for solving nonlinear systems, is presented. Explicit preconditioned iterative schemes, based on the explicit computation of a class of domain decomposition generalized approximate inverse matrix techniques, are presented for the efficient solution of nonlinear boundary value problems on multiprocessor systems. Applications of the new composite scheme to characteristic nonlinear boundary value problems are discussed and numerical results are given. (C) 2003 Elsevier Science Ltd. All rights reserved.
In this paper a regular bidirectional linear systolic array (RBLSA) for computing all-pairs shortest paths of a given directed graph is designed. The obtained array is optimal with respect to the number of processing elements (PEs) for a given problem size. The execution time of the array has been minimized. To obtain an RBLSA with the optimal number of PEs, the inner computation space of the systolic algorithm is accommodated to the projection direction vector. Finally, FPGA-based reprogrammable systems are revolutionizing certain types of computation and digital logic, since as logic emulation systems they offer some orders of magnitude speedup over software simulation; herein, an FPGA realization of the RBLSA is investigated and the performance evaluation results are discussed.
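The problem the array solves can be stated with a short sequential reference. The abstract does not say which all-pairs shortest path algorithm the systolic design is derived from; the sketch below assumes the Floyd-Warshall recurrence, and the names `floyd_warshall` and `INF` are chosen here for illustration.

```python
INF = float("inf")


def floyd_warshall(weights):
    """All-pairs shortest path lengths for a directed graph.

    weights[i][j] is the length of edge i -> j, INF if the edge is absent,
    and 0 on the diagonal. Negative cycles are assumed not to occur.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for k in range(n):                   # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


graph = [
    [0, 3, 10],
    [INF, 0, 1],
    [2, INF, 0],
]
dist = floyd_warshall(graph)
```

The triple loop is the three-dimensional computation space that a systolic design would have to project onto a linear array of processing elements.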