We consider the spline collocation code COLSYS and its successor COLNEW for solving boundary value problems (BVPs) in ordinary differential equations (ODEs), paying particular attention to the cost of solving the resulting almost block diagonal systems (ABDS) on scalar and vector computers. Our costings include analyses for the extreme cases of large or high-order systems of ODEs. These are designed to provide insight into the asymptotic behaviour of the codes on vector processors. The paper closes with a discussion of parallelisation of the codes and of the conflicts between vectorisation and parallelisation.
The paper presents a massively parallel Poisson solver for a rectangular domain, together with parallel algorithms for computing the QR factorization of a dense matrix A by means of Householder reflections and Givens rotations. The computer model under consideration is a SIMD mesh-connected toroidal n x n processor array. The Dirichlet problem is replaced by its finite-difference analog on an M x N grid, where M + 1 and N are powers of two. The algorithm is composed of parallel fast sine transform and cyclic odd-even reduction blocks and runs in a fully parallel fashion. Its computational complexity is $O(MN \log L / n^2)$, where L = max(M + 1, N). The parallel Householder scheme zeros all subdiagonal elements in each column and updates all elements of the given submatrix in parallel. For the second method, based on Givens rotations, the parallel scheme of Sameh and Kuck was chosen, in which disjoint rotations can be computed simultaneously. The algorithms were coded in the MPF and MPL parallel programming languages, and results of computational experiments on the MasPar MP-1 system are also presented.
The dynamics of relativistic atomic wave functions evolving under the influence of intense laser pulses is used as an example of a general class of applications employing the alternating direction implicit method. The method requires the solution of many tridiagonal systems of linear equations. A range of parallel algorithms for this setting is analyzed with respect to scalability on large parallel machines. © 1999 Elsevier Science B.V. All rights reserved.
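For orientation, the serial kernel that parallel tridiagonal solvers are usually compared against is the Thomas algorithm (forward elimination followed by back substitution). The following is a minimal sketch of that baseline only, not of any algorithm analyzed in the paper; the function name `thomas_solve` and its argument layout are assumptions made for illustration, and the sketch assumes a system that needs no pivoting (for example, a diagonally dominant one).

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the (serial) Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is ignored),
    upper[i] multiplies x[i+1] in row i (upper[-1] is ignored).
    Assumes no pivoting is needed, e.g. diagonal dominance.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Usage: the 4x4 system with diagonal 2 and off-diagonals -1.
solution = thomas_solve(
    lower=[0.0, -1.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, -1.0, 0.0],
    rhs=[0.0, 0.0, 0.0, 5.0],
)
```

Each step of both loops depends on the previous one, which is why the method does not parallelize directly and why an alternating direction implicit code typically either distributes the many independent systems across processors or switches to a different solver.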
In this paper we describe a new parallel iterative technique to solve a set of linear equations. The technique can be applied to any serial iterative scheme and involves pipelining successive iterations. We give an example of this technique by modifying the classical successive overrelaxation method (SOR). The algorithm is implemented on a Sequent Symmetry multiprocessor machine and the experimental results are presented.
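As a point of reference, the classical serial SOR sweep that the paper takes as its starting point can be sketched as follows. This shows only the standard serial method, not the pipelined parallel variant the paper describes; the function name `sor_solve`, the default relaxation factor, and the stopping rule are illustrative assumptions.

```python
def sor_solve(A, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Classical serial successive overrelaxation for A x = b.

    A is a list of rows. Convergence is guaranteed, for example, when A is
    symmetric positive definite and 0 < omega < 2.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Uses already-updated x[j] for j < i (Gauss-Seidel ordering).
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            new_value = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            break
    return x


# Usage: a small symmetric positive definite system.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = sor_solve(A, b)
```

Within a sweep, the update of component i uses the freshly updated components before it, so a single iteration is inherently sequential. Overlapping successive iterations, as the abstract describes, is one way to expose parallelism without changing the underlying scheme.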
We discuss a parallel library of efficient algorithms for model reduction of large-scale systems with state-space dimension up to $O(10^4)$. We survey the numerical algorithms underlying the implementation of the chosen model reduction methods. The approach considered here is based on state-space truncation of the system matrices and includes absolute and relative error methods for both stable and unstable systems. In contrast to serial implementations of these methods, we employ Newton-type iterative algorithms for the solution of the major computational tasks. Experimental results report the numerical accuracy and the parallel performance of our approach on a cluster of Intel Pentium II processors. © 2003 Published by Elsevier B.V.
We consider here parallel variants of the Nyström and Fast Galerkin methods for the solution of Fredholm integral equations of the second kind. Numerical examples, and timings for an Ada implementation on a multiprocessor Sequent Balance, are given for both smooth and non-smooth kernels.
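To make the Nyström idea concrete: for a second-kind equation u(x) - ∫₀¹ K(x,t) u(t) dt = f(x), the integral is replaced by a quadrature rule, which gives a linear system for the values of u at the quadrature nodes. The sketch below is a serial, pure-Python illustration under stated assumptions (a fixed 3-point Gauss-Legendre rule on [0, 1] and hypothetical helper names `gauss_solve` and `nystrom_solve`); it is not the parallel Ada implementation the abstract refers to.

```python
import math


def gauss_solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def nystrom_solve(kernel, f):
    """Nystrom discretization of u(x) - int_0^1 K(x,t) u(t) dt = f(x).

    Uses 3-point Gauss-Legendre quadrature mapped to [0, 1] and returns
    the nodes together with the approximate values of u at those nodes.
    """
    s = math.sqrt(3.0 / 5.0)
    nodes = [(1.0 - s) / 2.0, 0.5, (1.0 + s) / 2.0]
    weights = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]
    n = len(nodes)
    # Row i of (I - K W), where W is the diagonal matrix of weights.
    M = [[(1.0 if i == j else 0.0) - weights[j] * kernel(nodes[i], nodes[j])
          for j in range(n)]
         for i in range(n)]
    rhs = [f(t) for t in nodes]
    return nodes, gauss_solve(M, rhs)


# Usage: K(x, t) = x t and f(x) = 2x/3 have the exact solution u(x) = x.
nodes, values = nystrom_solve(lambda x, t: x * t, lambda x: 2.0 * x / 3.0)
```

Assembling the matrix is the naturally parallel part, since each entry is an independent kernel evaluation. For non-smooth kernels a fixed low-order rule like this one would be expected to converge slowly, and a quadrature adapted to the singularity is normally needed.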
In a recent publication (1992), the authors showed how efficient a new level 3 BLAS algorithm for almost block diagonal systems could be using just one processor of a CRAY Y-MP. Here they compare the corresponding results for up to eight processors using standard CRAY Library parallel implementations of the level 3 BLAS.
In this paper, we present a new architecture to build grids that can execute parallel programs based on legacy code. This architecture is layer based and software component performances are validated with benchmarks. To illustrate the construction of a grid using the proposed architecture, we develop a case study that consists of a grid oriented to efficient execution of Java bytecode for which we validate and integrate legacy code of parallel linear algebra.
We present a tailored load balancing technique that addresses specific performance issues in the boundary data accumulation algorithm for non-overlapping domain decompositions. The technique is used to speed up a parallel conjugate gradient algorithm with an algebraic multigrid preconditioner to solve a potential problem on an unstructured tetrahedral finite element mesh. The optimized accumulation algorithm significantly improves the performance of the parallel solver, and we show up to 50% runtime improvements over the standard approach in benchmark runs with up to 48 MPI processes. The load balancing problem itself is a global optimization problem that is solved approximately by local optimization algorithms, which run in parallel and require no communication during the optimization process.
A class of preconditioning techniques for sparse matrices is considered, based on computing an approximation of the Schur complement of a (suitably ordered) matrix. The techniques generalize the reduced system methodology for 2-cyclic matrices to non-2-cyclic matrices and, in addition, are well suited to parallel architectures. Their effectiveness is demonstrated with numerical experiments on a nine-point finite-difference operator, and an analysis is presented showing that they can be implemented efficiently on multiprocessors.