The demand for very high computing performance has become increasingly common in many scientific and engineering environments. In addition to vector processing, parallel computing is now considered a useful way to enhance performance. However, parallel computing tends to be unpopular among users because, with presently available technology and software, it requires explicit programmer intervention to exploit architectural parallelism. This intervention can be minor in some cases, but it often requires a nonnegligible amount of program restructuring, or even a reformulation of some of the algorithms used. In addition, it makes program debugging considerably more difficult. Tools for interprocedural program analysis, able to analyze large FORTRAN programs composed of many subroutines at a high level and to perform an automatic high-level parallel decomposition, are being developed at IBM. As an additional possibility, low-level parallelism could also be exploited automatically. This would greatly simplify the program analysis needed in a compiler in order to automatically insert parallel constructs in a program. However, this could only be efficient in an environment in which the cost of scheduling and synchronizing parallel activities is extremely small. Such a "microtasking" environment has been realized, at a prototype level, for IBM multiprocessors and is described in this paper. Its applicability and potential are illustrated with the help of some simple examples. These examples show that, with microtasking, good processor utilization and useful speed-ups are achieved even for fine-grain problems. The user interface of the prototype is described, both at the FORTRAN and at the Assembler level, and the possibility of incorporating this environment into a FORTRAN compiler is discussed.
Codes based on implicit schemes for Euler and Navier-Stokes computations of transonic flows are very demanding in terms of computing time and storage, because the flowfield must be expressed at each time step as the solution of a large system of nonlinear equations. The linearization of an implicit time-stepping scheme for solving the flow equations produces a system of linear equations whose coefficient matrix, hereafter referred to as the 'implicit-step operator,' has a sparse structure. In alternating direction implicit (ADI) schemes this matrix operator is approximately factored as the product of block-tridiagonal matrices, in order to keep the computational cost of a direct solution procedure within acceptable limits. However, the ADI factorization error prevents the use of large time steps as a method for rapidly eliminating the transient solution in steady-state computations, and it may be the main source of inaccuracy in the simulation of unsteady flows. This paper illustrates the use of the conjugate gradient squared (CGS) iterative algorithm for solving the linear system with the implicit-step operator in unfactored form. In the present work the flow equations are linearized at each time step, following the Beam-Warming approach, and the ADI splitting of the implicit-step operator is used to build an efficient preconditioner of the unfactored system.
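As a rough illustration of the CGS iteration named in this abstract, the following is a minimal sketch, not the authors' code. It assumes a small dense nonsymmetric matrix stored as a list of rows, omits the ADI-based preconditioner described in the paper, and uses hypothetical names (`cgs`, `matvec`, `dot`).

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgs(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned conjugate gradient squared for A x = b.

    A may be nonsymmetric. The shadow residual is fixed to the
    initial residual, which is a common (assumed) choice.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual for the zero initial guess
    r_shadow = list(r)
    p = [0.0] * n
    q = [0.0] * n
    rho_old = 1.0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        rho = dot(r_shadow, r)
        if rho == 0.0:
            break            # breakdown: the method cannot continue
        beta = 0.0 if k == 0 else rho / rho_old
        u = [ri + beta * qi for ri, qi in zip(r, q)]
        p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
        v = matvec(A, p)
        sigma = dot(r_shadow, v)
        if sigma == 0.0:
            break            # breakdown
        alpha = rho / sigma
        q = [ui - alpha * vi for ui, vi in zip(u, v)]
        w = [ui + qi for ui, qi in zip(u, q)]
        Aw = matvec(A, w)
        x = [xi + alpha * wi for xi, wi in zip(x, w)]
        r = [ri - alpha * awi for ri, awi in zip(r, Aw)]
        rho_old = rho
    return x
```

Unlike the biconjugate gradient method, CGS needs only products with the matrix itself and none with its transpose, which is convenient when the operator is available only as a routine that applies it.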
This paper describes a set of optimized subroutines for use in solving sparse, symmetric, positive definite linear systems of equations using iterative algorithms. The set has been included in the Engineering and Scientific Subroutine Library (ESSL) for the IBM 3090 Vector Facility (VF). The subroutines are based on the conjugate-gradient method, preconditioned by the diagonal or by an incomplete factorization. They make use of storage representations of sparse matrices that are optimal for vector implementation. The ESSL vector subroutines are up to six times faster than a scalar implementation of the same algorithm.
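The diagonally preconditioned conjugate-gradient method mentioned above can be sketched as follows. This is only an illustration under simplifying assumptions: the matrix is held as a dense list of rows rather than in the vector-oriented sparse storage the paper describes, and the names (`jacobi_pcg`, `matvec`, `dot`) are hypothetical and are not ESSL routines.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Conjugate gradient with diagonal (Jacobi) preconditioning.

    A must be symmetric positive definite, so its diagonal is positive.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Every operation in the loop is a matrix-vector product, a dot product, or an elementwise vector update, which is why the choice of sparse storage format largely determines how well the method vectorizes.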
A simple model is described, which can be used to estimate execution times of vectorized FORTRAN loops on the IBM 3090/VF. Although a number of simplifying assumptions are made in the model (especially concerning the modeling of the cache), the resulting estimates are roughly correct in many cases. In situations where this model is oversimplified or unrealistic, it can still be used as a starting point for more sophisticated performance predictions.
One of the most challenging problems in fluid dynamics is understanding the properties of turbulent flows. The advent of large supercomputers permits the investigation of turbulence with great accuracy in two dimensions, but full three-dimensional problems are physically more complex and their study is currently limited to the case of simple flows. It is shown that the availability of a continuous time-dependent representation of the dynamics of fluid flows can quickly lead to a more complete understanding of the many concurrent physical mechanisms governing turbulence. Some significant examples show how an analog videotape, obtained from direct computer simulations of fluid flows, suggests physical results that can later be obtained through a mathematical analysis of the numerical simulations.
By high-resolution numerical integration of the two-dimensional Navier-Stokes equations we show that the turbulent flow at high Reynolds number is dominated by a simple and weakly unstable Hamiltonian system of pointlike vortices. The large instabilities, typical of the turbulent flow, are found only outside vortices, in the wide dissipative region, which turns out to be only a small perturbation of the vortex system. Moreover, the statistical distribution of vortex sizes determines the slope of the energy spectrum, which is steeper than that predicted by phenomenological theories.
The basic features concerning the implementation of the Hardy-Pomeau-De Pazzis (HPP) cellular automaton on a general purpose computer are illustrated, with explicit reference to the IBM 3090 vector processor. Special attention is paid to the choice of a proper data organization that takes full advantage of the high processing rates offered by a high-speed vector processor such as the IBM 3090.
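For readers unfamiliar with the HPP rule, one update step can be sketched as below. This is a plain scalar illustration on a small periodic grid, not the vector-oriented data organization the paper discusses. The bit encoding (`E`, `N`, `W`, `S`), the row-0-at-top convention, and the function names are assumptions made for this example.

```python
# One bit per lattice direction; each cell is a 4-bit integer.
E, N, W, S = 1, 2, 4, 8


def collide(state):
    """HPP collision: an isolated head-on pair rotates by 90 degrees."""
    if state == E | W:
        return N | S
    if state == N | S:
        return E | W
    return state  # every other configuration passes through unchanged


def step(grid):
    """Apply collision, then move each particle one site (periodic)."""
    rows, cols = len(grid), len(grid[0])
    collided = [[collide(c) for c in row] for row in grid]
    new = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            c = collided[y][x]
            if c & E:
                new[y][(x + 1) % cols] |= E
            if c & W:
                new[y][(x - 1) % cols] |= W
            if c & N:
                new[(y - 1) % rows][x] |= N   # row 0 is the top row
            if c & S:
                new[(y + 1) % rows][x] |= S
    return new
```

Because the rule involves only bitwise tests and shifts, an implementation aimed at a vector machine would typically pack many sites into each word and update them together, which is where the data organization becomes decisive.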
PVM (Parallel Virtual Machine) is currently a de facto standard in the world of distributed computing based on message passing [10]. PVMe is a proprietary realization of the PVM programming model that maintains full compliance with the original programming interface, but is specially tuned to obtain higher performance on the SP2, the IBM parallel system based on the POWER2 architecture and the AIX operating system. During the design and development of PVMe we had a unique chance to identify which aspects of the run-time system support are most critical to achieving good results when running applications with PVM. Here we report on our experience and the (hopefully) useful lessons we learned.
A low-resolution ($64^3$) large-eddy model of forced homogeneous turbulence is numerically simulated using Kraichnan's eddy viscosity. The introduction of a reliable statistical estimate of the $\zeta_p$ exponents allows one to perform a detailed statistical analysis of the velocity field and shows that the probability distribution functions, the structure functions and the power-law exponents $\zeta_p$ agree with previous numerical and experimental results obtained at much higher effective resolution. This result shows how a simple modelling of the energy transfer produces self-similar dynamics extending to the small scales and reproduces the correct statistical properties of the velocity field.
ISBN: 3540593934 (print)
We present PVMe, IBM's AIX implementation of the widely used PVM message passing programming model. The focus is on the version for the IBM 9076 SP2. The basic PVMe design is described along with the results obtained running some significant applications on the platform. A perspective on possible future developments of PVMe concludes the work.