This paper presents performance improvements introduced in ADINA-F for the IBM 3090 vector multiprocessor to run large-model fluid-flow analyses. These improvements were achieved by means of a new skyline solver, the vectorization and parallelization of the assembly, and the optimization of I/O through data sets allocated in virtual storage instead of on disk. The skyline solver is based on the blocked approach proposed by the Level 3 BLAS (Basic Linear Algebra Subprograms). It achieves high performance through vectorization, parallelization, and optimized access to data stored in high-speed memories; the maximum speed-up measured in this phase is about 25. The assembly has been vectorized and parallelized, with a local speed-up of about 11. Optimizing the I/O proves crucial to achieving a satisfactory overall reduction in computational time: the virtual input/output facility yields a local speed-up of up to 18. Exploiting these features of the IBM 3090 makes it possible to run complex industrial problems very efficiently. The time scale drops from several hours to a few minutes, with a global job speed-up of up to 19.
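To make the skyline (profile) storage idea concrete, here is a minimal sketch of a column-oriented skyline Cholesky factorization and solve. It assumes, for simplicity, a symmetric positive definite matrix. The sketch is unblocked and serial, so it shows only the storage scheme and the fact that all fill-in stays inside the profile. It does not reproduce the blocked Level 3 BLAS solver described above, and the names `to_skyline`, `skyline_cholesky`, and `skyline_solve` are hypothetical.

```python
import math


def to_skyline(A):
    """Pack the upper triangle of symmetric A (list of lists) into skyline form.

    first[j] is the row index of the first nonzero in column j (at most j);
    cols[j] stores A[first[j]..j][j], i.e. column j from its top to the diagonal.
    """
    first, cols = [], []
    for j in range(len(A)):
        f = next(i for i in range(j + 1) if A[i][j] != 0 or i == j)
        first.append(f)
        cols.append([float(A[i][j]) for i in range(f, j + 1)])
    return first, cols


def skyline_cholesky(first, cols):
    """Overwrite cols with U such that A = U^T U (U upper triangular, same profile)."""
    for j in range(len(cols)):
        fj = first[j]
        for i in range(fj, j + 1):
            fi = first[i]
            # Inner product restricted to the overlap of the two column profiles.
            k0 = max(fi, fj)
            s = sum(cols[i][k - fi] * cols[j][k - fj] for k in range(k0, i))
            if i < j:
                cols[j][i - fj] = (cols[j][i - fj] - s) / cols[i][i - fi]
            else:
                d = cols[j][j - fj] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                cols[j][j - fj] = math.sqrt(d)
    return first, cols


def skyline_solve(first, cols, b):
    """Solve U^T U x = b using the factored skyline columns."""
    n = len(cols)
    x = [float(v) for v in b]
    # Forward substitution with U^T (column j of U is row j of U^T).
    for j in range(n):
        fj = first[j]
        s = sum(cols[j][k - fj] * x[k] for k in range(fj, j))
        x[j] = (x[j] - s) / cols[j][j - fj]
    # Back substitution with U, processed column by column.
    for j in range(n - 1, -1, -1):
        fj = first[j]
        x[j] /= cols[j][j - fj]
        for k in range(fj, j):
            x[k] -= cols[j][k - fj] * x[j]
    return x


if __name__ == "__main__":
    A = [[4, 2, 0], [2, 5, 1], [0, 1, 3]]
    first, cols = skyline_cholesky(*to_skyline(A))
    print(skyline_solve(first, cols, [8, 15, 11]))  # approximately [1.0, 2.0, 3.0]
```

Storing each column only from its first nonzero downward is what makes the skyline format attractive: Cholesky creates no fill above a column's first nonzero, so the factor fits in exactly the same storage as the matrix.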
Many engineering and scientific problems involve the solution of large sparse linear systems. To determine an optimal solution strategy for such systems, it is essential to understand the large- and small-scale properties of the associated sparse matrices. We present a graphic tool for analyzing the sparsity pattern and the numeric structure of these matrices. Through examples drawn from our practical experience, we demonstrate the effectiveness and the interactive features of the tool. These features include zooming, scrolling in different directions, sorting of rows and/or columns, and selective plotting according to the values of the matrix coefficients.
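Two of the features listed, selective plotting by coefficient magnitude and sorting of rows, can be sketched as a text-mode "spy" plot. This is only an illustration under our own conventions. The function names, the `threshold` parameter, the nonzero-count sort key, and the choice of symbols are all hypothetical and do not come from the tool itself.

```python
def spy(A, threshold=0.0, row_order=None):
    """Text sparsity plot of a dense matrix A (list of lists).

    '#' marks |a_ij| > threshold, '.' marks a nonzero with |a_ij| <= threshold,
    and ' ' marks a zero entry.
    """
    order = range(len(A)) if row_order is None else row_order
    lines = []
    for i in order:
        chars = []
        for a in A[i]:
            if a == 0:
                chars.append(" ")
            elif abs(a) > threshold:
                chars.append("#")
            else:
                chars.append(".")
        lines.append("".join(chars))
    return lines


def rows_by_nnz(A):
    """Row permutation listing the rows with the most nonzeros first (stable)."""
    return sorted(range(len(A)), key=lambda i: -sum(1 for a in A[i] if a != 0))


if __name__ == "__main__":
    A = [[4.0, 0.0, 1e-3], [0.0, 5.0, 0.0], [2.0, 3.0, 1.0]]
    print("\n".join(spy(A, threshold=1e-2, row_order=rows_by_nnz(A))))
```

Raising `threshold` separates the large entries from the small ones in the same picture. This is one simple way to see the numeric structure of a matrix on top of its sparsity pattern.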
Codes based on implicit schemes for Euler and Navier-Stokes computations of transonic flows are very demanding in terms of computing time and storage, because at each time step the flowfield must be obtained as the solution of a large system of nonlinear equations. Linearizing an implicit time-stepping scheme for the flow equations produces a system of linear equations whose coefficient matrix, hereafter referred to as the 'implicit-step operator,' has a sparse structure. In alternating direction implicit (ADI) schemes, this operator is approximately factored as a product of block-tridiagonal matrices in order to keep the cost of a direct solution procedure within acceptable limits. However, the ADI factorization error prevents the use of large time steps to eliminate the transient quickly in steady-state computations, and it may be the main source of inaccuracy in the simulation of unsteady flows. This paper illustrates the use of the conjugate gradient squared (CGS) iterative algorithm to solve the implicit-step system in unfactored form. Here the flow equations are linearized at each time step following the Beam-Warming approach, and the ADI splitting of the implicit-step operator is used to build an efficient preconditioner for the unfactored system.
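The sketch below is a standard preconditioned CGS iteration, in the form commonly stated for Sonneveld's algorithm (e.g. in the "Templates" book by Barrett et al.), written for small dense matrices. It is not the authors' code. In the paper the preconditioner is the ADI splitting of the implicit-step operator, whereas here a simple diagonal (Jacobi) preconditioner stands in for it. The names `cgs` and `precond` and the default tolerance are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgs(A, b, precond, tol=1e-10, maxiter=200):
    """Preconditioned CGS for a general (nonsymmetric) dense matrix A.

    precond(v) must return M^{-1} v for the chosen preconditioner M.
    """
    n = len(b)
    x = [0.0] * n
    r = [float(bi) for bi in b]      # r0 = b - A x0 with x0 = 0
    r_tilde = list(r)                # shadow residual, fixed for the whole run
    bnorm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * bnorm:
        return x, 0
    rho_old, p, q = 1.0, [0.0] * n, [0.0] * n
    for it in range(1, maxiter + 1):
        rho = dot(r_tilde, r)
        if rho == 0.0:
            raise RuntimeError("CGS breakdown (rho = 0)")
        if it == 1:
            u = list(r)
            p = list(u)
        else:
            beta = rho / rho_old
            u = [ri + beta * qi for ri, qi in zip(r, q)]
            p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        v_hat = matvec(A, precond(p))
        alpha = rho / dot(r_tilde, v_hat)
        q = [ui - alpha * vi for ui, vi in zip(u, v_hat)]
        u_hat = precond([ui + qi for ui, qi in zip(u, q)])
        x = [xi + alpha * uh for xi, uh in zip(x, u_hat)]
        r = [ri - alpha * qh for ri, qh in zip(r, matvec(A, u_hat))]
        rho_old = rho
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, it
    raise RuntimeError("CGS did not converge")


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]

    def jacobi(v):
        # Diagonal preconditioner: a stand-in for the ADI-factored operator.
        return [vi / A[i][i] for i, vi in enumerate(v)]

    x, its = cgs(A, matvec(A, [1.0, 2.0, 3.0]), jacobi)
    print(x, its)
```

CGS needs only products with A and applications of M^{-1}, never a transpose. That is why a factored ADI operator, which is cheap to invert, can serve directly as the preconditioner of the unfactored system.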
In view of the importance of CA (cellular automata) as prototypes of massively parallel architectures (for which locality is an essential ingredient), it is of interest to understand the impact of the nonlocal part of the dynamics on the overall performance of the LFLG (Lévy-flight lattice gas) algorithm. All computations presented refer to an implementation on an IBM RISC-6000 model 530 workstation. This machine belongs to the family of superscalar architectures, in which parallelism is implemented at the instruction level. Single-workstation performance is reported.
A simple model is described that can be used to estimate the execution times of vectorized FORTRAN loops on the IBM 3090/VF. Although the model makes a number of simplifying assumptions (especially in modeling the cache), the resulting estimates are roughly correct in many cases. Where the model is oversimplified or unrealistic, it can still serve as a starting point for more sophisticated performance predictions.
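The abstract does not give the model's formulas. As a hedged illustration of the general idea, the sketch below uses a generic strip-mining cost model. A loop of length n is processed in sections of at most `section_size` elements, each section pays a fixed start-up cost, and each element costs a fixed number of cycles. The function names and all parameter values are hypothetical placeholders rather than measured 3090/VF figures, and, like the model in the text, the sketch ignores cache effects.

```python
import math


def vector_loop_cycles(n, section_size=128, startup=30.0, per_element=1.0):
    """Estimated cycles for a vectorized loop of length n.

    The loop is strip-mined into ceil(n / section_size) sections; each section
    pays a fixed start-up cost and each element a fixed per-element cost.
    All default values are illustrative placeholders, not 3090/VF data.
    """
    if n <= 0:
        return 0.0
    sections = math.ceil(n / section_size)
    return sections * startup + n * per_element


def cycles_per_element(n, **params):
    """Average cost per element; approaches per_element + startup / section_size."""
    return vector_loop_cycles(n, **params) / n


if __name__ == "__main__":
    for n in (1, 64, 128, 129, 256, 10_000):
        print(n, vector_loop_cycles(n), round(cycles_per_element(n), 3))
```

With these placeholder values, a loop of length 129 pays the same start-up overhead (two sections) as one of length 256. Exposing that kind of step effect is what such a model is for.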
A pseudorandom number generator based on a linear-feedback shift-register sequence is presented. The very long period of the generator, $2^{1279} - 1$, makes it useful in modern statistical simulations. The proposed generator overcomes the limitations of multiplicative-congruential generators with modulus $2^{31} - 1$. The properties of linear-feedback shift-register sequences are reviewed, and a sequence of order $p = 1279$ is proposed as a source of pseudorandom numbers. Results of the vectorization of the shift-register algorithm and of statistical tests are presented.
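Below is a minimal sketch of a generalized feedback shift-register (GFSR) generator of order $p = 1279$. Every bit position of a 32-bit word follows the same linear recurrence $x_n = x_{n-1279} \oplus x_{n-418}$. The second tap, $q = 418$, is our assumption: $x^{1279} + x^{418} + 1$ is a known primitive trinomial, but the abstract does not state which taps the paper uses. With a primitive trinomial, each bit column has period $2^{1279} - 1$ as long as its initial contents are not all zero. The seeding through Python's `random` module (a Mersenne Twister) is only a placeholder; a careful GFSR initialization also ensures the bit columns are linearly independent. The class name `GFSR1279` and its methods are hypothetical.

```python
import random


class GFSR1279:
    """GFSR generator x[n] = x[n-P] XOR x[n-Q] on 32-bit words.

    P = 1279 matches the order in the text; Q = 418 is an assumed tap
    (x^1279 + x^418 + 1 is a primitive trinomial).
    """

    P, Q = 1279, 418

    def __init__(self, seed=12345):
        # Placeholder seeding: P random 32-bit words from Python's Mersenne Twister.
        init = random.Random(seed)
        self.buf = [init.getrandbits(32) for _ in range(self.P)]
        self.n = self.P  # index of the next term x[n]

    def next_word(self):
        i = self.n % self.P             # slot currently holding x[n-P]
        j = (self.n - self.Q) % self.P  # slot currently holding x[n-Q]
        word = self.buf[i] ^ self.buf[j]
        self.buf[i] = word              # x[n] overwrites x[n-P]
        self.n += 1
        return word

    def next_block(self, m):
        """Generate m <= Q words at once.

        Each of the m new terms depends only on terms generated before this
        call, so all m XORs are independent of one another.
        """
        if not 0 < m <= self.Q:
            raise ValueError("block length must be between 1 and Q")
        slots = [(self.n + k) % self.P for k in range(m)]
        words = [self.buf[s] ^ self.buf[(self.n + k - self.Q) % self.P]
                 for k, s in enumerate(slots)]
        for s, w in zip(slots, words):
            self.buf[s] = w
        self.n += m
        return words

    def random(self):
        """Uniform float in [0, 1)."""
        return self.next_word() / 2**32
```

Because $x_n$ depends only on terms at least 418 positions back, up to 418 consecutive words can be computed independently. `next_block` exploits exactly this property, and it is what makes shift-register generators well suited to a vector unit.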
The determination of permeability is a fluid-dynamics problem of wide interdisciplinary interest. Many authors have approached this subject by developing numerical models of flows through porous media at...
ISBN (print): 9780897914123
A pseudo-random number generator based on a linear-feedback shift-register sequence is presented. The very long period of the generator, $2^{1279} - 1$, makes it useful in modern statistical simulations, where the shorter period of other generators could either be exhausted during a single run or cause departures from uniform randomness when generating positions in a multidimensional space. In particular, the proposed generator overcomes the limitations of multiplicative-congruential generators with modulus $2^{31} - 1$, which are widely used on computers with a 32-bit integer word size, such as the IBM S/370 family of computers.
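For comparison, here is a sketch of the kind of 32-bit multiplicative-congruential generator mentioned above, with modulus $2^{31} - 1$. The multiplier 16807 (the Park-Miller "minimal standard") is our choice for illustration; the abstract does not say which multiplier it has in mind. Schrage's decomposition keeps every intermediate value within the signed 32-bit range. That matters on machines with a 32-bit integer word, although Python integers never overflow, so here the code only demonstrates the arithmetic. Because 16807 is a primitive root modulo $2^{31} - 1$, the period is $2^{31} - 2$, a cycle that a long simulation can exhaust. The function names are hypothetical.

```python
M = 2**31 - 1        # modulus (a Mersenne prime)
A = 16807            # multiplier 7**5, a primitive root mod M (our choice)
Q, R = divmod(M, A)  # Schrage's decomposition: M = A*Q + R with R < Q


def lehmer_next(z):
    """Return A*z mod M for 1 <= z < M without forming the 46-bit product A*z."""
    hi, lo = divmod(z, Q)
    t = A * lo - R * hi          # both terms lie in [0, M), so |t| < M
    return t if t > 0 else t + M


def lehmer_uniform(z):
    """Advance the state and return (new_state, float in (0, 1))."""
    z = lehmer_next(z)
    return z, z / M
```

Swapping in the shift-register generator changes the period from about $2^{31}$ to $2^{1279} - 1$, while the state update stays a cheap integer operation.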
A new method is developed to solve two-dimensional, time-dependent, incompressible viscous flows in an arbitrary geometry. The Navier-Stokes equations are solved in primitive variables using a pseudospectral formulation. The geometry is introduced by directly forcing the velocity field near the boundaries. The method extends a similar computational technique introduced by Basdevant and Sadourny [1]. Some examples are shown to evaluate its efficiency and accuracy.
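The pseudospectral idea can be illustrated in one periodic dimension. Derivatives are taken in Fourier space, by multiplying each mode by $ik$, while nonlinear products such as $u\,\partial u/\partial x$ are formed pointwise in physical space. To stay self-contained, the sketch uses a naive $O(N^2)$ discrete Fourier transform from the standard library. A real solver would typically use an FFT and dealiasing, and would add the boundary-forcing term described above; none of that is attempted here. The function names are hypothetical.

```python
import cmath
import math


def dft(f):
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(F):
    n = len(F)
    return [sum(F[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def spectral_derivative(f):
    """d/dx of periodic samples f on the grid x_j = 2*pi*j/n."""
    n = len(f)
    dF = []
    for k, Fk in enumerate(dft(f)):
        kk = k if k <= n // 2 else k - n  # signed wavenumber
        if n % 2 == 0 and k == n // 2:
            kk = 0                        # drop the Nyquist mode for an odd derivative
        dF.append(1j * kk * Fk)
    return [v.real for v in idft(dF)]


def advection_term(u):
    """u * du/dx: derivative in spectral space, product in physical space."""
    du = spectral_derivative(u)
    return [a * b for a, b in zip(u, du)]


if __name__ == "__main__":
    n = 16
    x = [2 * math.pi * j / n for j in range(n)]
    print(max(abs(d - 3 * math.cos(3 * xj))
              for d, xj in zip(spectral_derivative([math.sin(3 * xj) for xj in x]), x)))
```

For smooth periodic data the spectral derivative is exact up to round-off for all resolved modes. This accuracy is the main attraction of the approach, and the reason the geometry has to be imposed by forcing rather than through a body-fitted grid.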
This paper describes a set of optimized subroutines for solving sparse, symmetric, positive definite linear systems of equations with iterative algorithms. The set has been included in the Engineering and Scientific Subroutine Library (ESSL) for the IBM 3090 Vector Facility (VF). The subroutines are based on the conjugate-gradient method, preconditioned either by the diagonal or by an incomplete factorization. They use storage representations of sparse matrices that are optimal for vector implementation. The ESSL vector subroutines are up to six times faster than a scalar implementation of the same algorithm.
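Below is a minimal sketch of the diagonally preconditioned conjugate-gradient method. It uses an ELLPACK-style sparse layout, in which every row has the same number of slots and shorter rows are padded with zeros. We chose that layout as one example of a vector-friendly representation; we do not claim it matches ESSL's internal storage modes or calling sequences. All function names are hypothetical.

```python
import math


def ell_matvec(vals, cols, x):
    """y = A x for A in ELLPACK-style storage: every row has the same number
    of slots, each contributing vals[i][s] * x[cols[i][s]]; padding slots hold 0."""
    return [sum(v * x[c] for v, c in zip(vr, cr)) for vr, cr in zip(vals, cols)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_diag(vals, cols, diag, b, tol=1e-10, maxiter=1000):
    """Conjugate gradient preconditioned by diag(A); A must be SPD."""
    x = [0.0] * len(b)
    r = [float(bi) for bi in b]               # r = b - A*0
    bnorm = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * bnorm:
        return x, 0
    z = [ri / di for ri, di in zip(r, diag)]  # z = M^{-1} r with M = diag(A)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, maxiter + 1):
        ap = ell_matvec(vals, cols, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    raise RuntimeError("PCG did not converge")


def laplacian_1d_ell(n):
    """Tridiagonal [-1, 2, -1] matrix in ELLPACK form with 3 slots per row."""
    vals, cols = [], []
    for i in range(n):
        rv, rc = [2.0], [i]
        for j in (i - 1, i + 1):
            inside = 0 <= j < n
            rv.append(-1.0 if inside else 0.0)  # padding slot: value 0, column i
            rc.append(j if inside else i)
        vals.append(rv)
        cols.append(rc)
    return vals, cols


if __name__ == "__main__":
    vals, cols = laplacian_1d_ell(5)
    b = ell_matvec(vals, cols, [1.0, 2.0, 3.0, 4.0, 5.0])
    print(pcg_diag(vals, cols, [2.0] * 5, b))
```

The fixed row length is what suits a vector unit. The matrix-vector product becomes a short loop over slots, each operating on a full vector of rows. The padding costs a few wasted multiplications by zero but removes all irregular branching.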