A simple model is described that can be used to estimate the execution times of vectorized FORTRAN loops on the IBM 3090/VF. Although the model makes a number of simplifying assumptions (especially in its treatment of the cache), the resulting estimates are roughly correct in many cases. Where the model is oversimplified or unrealistic, it can still serve as a starting point for more sophisticated performance predictions.
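The abstract does not give the form of the model, so the following is only a generic sketch of how such an estimate is often structured: a fixed loop start-up cost, a per-section overhead (vector hardware processes long loops in fixed-size sections), and a per-element cost. The function name and every numeric default are hypothetical placeholders, not values from the paper, and cache effects are ignored entirely.

```python
import math

def estimate_loop_time(n, cycles_per_element, section_size=128,
                       section_overhead=20.0, loop_startup=50.0,
                       cycle_ns=18.5):
    """Estimate the run time (seconds) of a vectorized loop of n iterations.

    All defaults are illustrative assumptions, not measured machine values.
    """
    # The loop is strip-mined into sections of at most section_size elements.
    sections = math.ceil(n / section_size)
    # Total cycles: one-time start-up, overhead per section, work per element.
    cycles = loop_startup + sections * section_overhead + n * cycles_per_element
    return cycles * cycle_ns * 1e-9

# Example: a 1000-iteration loop costing 2 cycles per element.
print(estimate_loop_time(1000, 2.0))
```

A model of this shape is linear in the loop length, with small steps at each section boundary; a more careful prediction would add terms for cache misses and memory stride.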
By high-resolution numerical integration of the two-dimensional Navier-Stokes equations, we show that turbulent flow at high Reynolds number is dominated by a simple and weakly unstable Hamiltonian system of pointlike vortices. The large instabilities typical of turbulent flow are found only outside the vortices, in the wide dissipative region, which turns out to be only a small perturbation of the vortex system. Moreover, the statistical distribution of vortex sizes determines the slope of the energy spectrum, which is steeper than that predicted by phenomenological theories.
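For reference, the classical Hamiltonian system of point vortices can be written as follows. This is the standard textbook form for N vortices in the unbounded plane, given here as an assumption about the kind of system meant; the abstract does not state the equations, and a periodic simulation domain would use a modified interaction kernel. The symbols (circulations Γ_i, positions (x_i, y_i), separations r_ij) are introduced here for illustration.

```latex
% Hamiltonian of N point vortices with circulations \Gamma_i
H = -\frac{1}{4\pi} \sum_{i \neq j} \Gamma_i \Gamma_j \ln r_{ij},
\qquad r_{ij}^2 = (x_i - x_j)^2 + (y_i - y_j)^2 .

% Canonical equations of motion
\Gamma_i \frac{dx_i}{dt} = \frac{\partial H}{\partial y_i},
\qquad
\Gamma_i \frac{dy_i}{dt} = -\frac{\partial H}{\partial x_i} .

% Written out explicitly
\frac{dx_i}{dt} = -\frac{1}{2\pi} \sum_{j \neq i} \Gamma_j \frac{y_i - y_j}{r_{ij}^2},
\qquad
\frac{dy_i}{dt} = \frac{1}{2\pi} \sum_{j \neq i} \Gamma_j \frac{x_i - x_j}{r_{ij}^2} .
```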
One of the most challenging problems in fluid dynamics is understanding the properties of turbulent flows. The advent of large supercomputers permits the investigation of turbulence with great accuracy in two dimensions, but full three-dimensional problems are physically more complex, and their study is currently limited to simple flows. It is shown that a continuous time-dependent representation of the dynamics of fluid flows can quickly lead to a more complete understanding of the many concurrent physical mechanisms governing turbulence. Some significant examples show how an analog videotape, obtained from direct computer simulations of fluid flows, suggests physical results that can later be confirmed through a mathematical analysis of the numerical simulations.
The basic features of implementing the Hardy-Pomeau-De Pazzis (HPP) cellular automaton on a general-purpose computer are illustrated, with explicit reference to the IBM 3090 vector processor. Special attention is paid to choosing a data organization that takes full advantage of the high processing rates offered by a high-speed vector processor such as the IBM 3090.
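To make the role of data organization concrete, here is a minimal sketch of one HPP time step using a bit-plane layout: each of the four directions (east, north, west, south) is stored as a list of rows, and each row is one integer whose bit x marks a particle at column x. Bitwise operations then update a whole row at once, which is the same idea that lets long vector operations update many sites per instruction. The layout, the periodic boundaries, and all function names are assumptions made for illustration; the abstract does not describe the paper's actual data structures.

```python
def rotl(row, width):
    """Rotate a width-bit row left by one bit (move particles to x + 1)."""
    mask = (1 << width) - 1
    return ((row << 1) | (row >> (width - 1))) & mask

def rotr(row, width):
    """Rotate a width-bit row right by one bit (move particles to x - 1)."""
    mask = (1 << width) - 1
    return ((row >> 1) | (row << (width - 1))) & mask

def hpp_step(e, n, w, s, width):
    """One HPP step (collision, then propagation) with periodic boundaries."""
    height = len(e)
    ce, cn, cw, cs = [], [], [], []
    for y in range(height):
        # Head-on pairs with the perpendicular directions empty rotate by 90 degrees.
        head_on_ew = e[y] & w[y] & ~n[y] & ~s[y]
        head_on_ns = n[y] & s[y] & ~e[y] & ~w[y]
        flip = head_on_ew | head_on_ns
        ce.append(e[y] ^ flip)
        cn.append(n[y] ^ flip)
        cw.append(w[y] ^ flip)
        cs.append(s[y] ^ flip)
    # Propagation: east/west shift bits within a row, north/south shift whole rows.
    new_e = [rotl(r, width) for r in ce]
    new_w = [rotr(r, width) for r in cw]
    new_n = [cn[(y - 1) % height] for y in range(height)]
    new_s = [cs[(y + 1) % height] for y in range(height)]
    return new_e, new_n, new_w, new_s

def count_particles(*planes):
    """Total number of particles over all direction planes."""
    return sum(bin(row).count("1") for plane in planes for row in plane)

# Two particles meet head-on at column 1, row 1 on a 4 x 4 lattice.
e = [0, 0b0010, 0, 0]
w = [0, 0b0010, 0, 0]
n = [0, 0, 0, 0]
s = [0, 0, 0, 0]
e, n, w, s = hpp_step(e, n, w, s, width=4)
print(n, s, count_particles(e, n, w, s))
```

In this layout the collision rule needs only a handful of AND, NOT, and XOR operations per row, with no branching, and the particle count is conserved by construction.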
This paper presents some performance improvements introduced in ADINA-F for the IBM 3090 vector multiprocessor in order to run large-model fluid-flow analyses. These improvements have been achieved by means of a new skyline solver, the vectorization and parallelization of the assembly, and the optimization of the I/O through the use of data sets allocated in virtual storage instead of on disk. The skyline solver is based on the blocked approach proposed by the level 3 BLAS (Basic Linear Algebra Subprograms). This solver achieves high performance through vectorization, parallelization, and optimized access to data stored in high-speed memories. The maximum speed-up measured in this phase is about 25. The assembly has been vectorized and parallelized with a local speed-up of about 11. Optimizing the I/O proves crucial to achieving a satisfactory overall reduction in computational time. The use of the virtual input/output facility leads to a local speed-up of up to 18. Exploiting some of the features of the IBM 3090 allows complex industrial problems to be run very efficiently. The time scale drops from several hours to a few minutes, with a global job speed-up of up to 19.
A low-resolution (64^3) large-eddy model of forced homogeneous turbulence is numerically simulated using Kraichnan's eddy viscosity. The introduction of a reliable statistical estimate of the exponents ζ_p allows a detailed statistical analysis of the velocity field and shows that the probability distribution functions, the structure functions, and the power-law exponents ζ_p agree with previous numerical and experimental results obtained at much higher effective resolution. This result shows how a simple model of the energy transfer produces self-similar dynamics extending to the small scales and yields the correct statistical properties of the velocity field.
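The abstract uses the scaling exponents without defining them. In the usual convention, which is assumed here rather than taken from the paper, they are the power-law exponents of the velocity structure functions in the inertial range:

```latex
% Structure function of order p for the velocity increment over a separation r
S_p(r) = \left\langle \left| u(x + r) - u(x) \right|^p \right\rangle
\;\propto\; r^{\zeta_p} .

% Kolmogorov's 1941 dimensional prediction, for comparison
\zeta_p = \frac{p}{3} .
```

Departures of the measured exponents from p/3 at large p are the standard signature of intermittency.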
ISBN: (print) 3540593934
We present PVMe, IBM's AIX implementation of the widely used PVM message-passing programming model. The focus is on the version for the IBM 9076 SP2. The basic PVMe design is described, along with the results obtained by running some significant applications on the platform. A perspective on possible future developments of PVMe concludes the work.
Author: BRISCOLINI, M
IBM ECSEC, European Center for Scientific and Engineering Computing, P.le Giulio Pastore 6, 00144 Roma, Italy
The paper reports on two parallel message-passing implementations of a 3-D pseudospectral code, built on top of PVMe [1] and the MPL Parallel Environment [2], both parallel interfaces tuned for IBM's scalable parallel system. Specifically, the discussion covers the details of high-performance real-to-complex and complex-to-real parallel 3-D FFTs, the two computational kernels of the 3-D pseudospectral integration method, on the basis of parallel performance measured on a 16-node SP1 and on two SP2 systems, one with 16 thin nodes and one with 16 wide nodes. Finally, some considerations are offered on the capability of IBM's distributed-memory parallel architecture to perform highly intensive numerical simulations of 3-D homogeneous turbulent flows.
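The structure of a distributed real-to-complex 3-D transform can be sketched as follows, assuming a slab decomposition, which is one common choice and not necessarily the one used in the paper. Each node first transforms its own z-planes in x and y, keeping only the non-negative x wavenumbers (the rest follow from Hermitian symmetry of a real field); a global transpose, which would be an all-to-all message exchange under PVMe or MPL, then brings full z-columns together for the final 1-D transforms. The code runs serially with a naive DFT so that it stays self-contained; the function names and the cyclic plane distribution are hypothetical.

```python
import cmath

def dft(seq):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft3d_slabs(u, nodes):
    """Real-to-complex 3-D DFT of u[z][y][x] organized as slab-wise stages."""
    nz, ny, nx = len(u), len(u[0]), len(u[0][0])
    nkx = nx // 2 + 1  # Hermitian symmetry: keep kx = 0 .. nx/2 only
    planes = [None] * nz
    # Stage 1: every node transforms the z-planes it owns, first in x, then in y.
    for node in range(nodes):
        for z in range(node, nz, nodes):  # assumed cyclic plane distribution
            rows = [dft(u[z][y])[:nkx] for y in range(ny)]
            planes[z] = [dft([rows[y][kx] for y in range(ny)])
                         for kx in range(nkx)]  # indexed as planes[z][kx][ky]
    # Stage 2: after the transpose (communication in a real code), transform in z.
    out = [[[0j] * nkx for _ in range(ny)] for _ in range(nz)]
    for kx in range(nkx):
        for ky in range(ny):
            pencil = dft([planes[z][kx][ky] for z in range(nz)])
            for kz in range(nz):
                out[kz][ky][kx] = pencil[kz]
    return out

# Small example: a 4 x 4 x 4 real field split over 2 hypothetical nodes.
field = [[[float((x + 2 * y + 3 * z) % 5) for x in range(4)]
          for y in range(4)] for z in range(4)]
spectrum = fft3d_slabs(field, nodes=2)
print(len(spectrum), len(spectrum[0]), len(spectrum[0][0]))
```

Stage 1 needs no communication at all, so the cost of the transpose in stage 2 largely decides how well the transform scales with the number of nodes.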
A new method is developed to solve two-dimensional, time-dependent, incompressible viscous flows in an arbitrary geometry. The Navier-Stokes equations are solved in primitive variables using a pseudospectral formulation. The geometry is introduced by directly forcing the velocity field near the boundaries. The method extends a similar computational technique introduced by Basdevant and Sadourny [1]. Some examples are shown to evaluate its efficiency and accuracy.
The demand for very high computing performance has become increasingly common in many scientific and engineering environments. In addition to vector processing, parallel computing is now considered a useful way to enhance performance. However, parallel computing tends to be unpopular among users because, with presently available technology and software, it requires explicit programmer intervention to exploit architectural parallelism. This intervention can be minor in some cases, but it often requires a nonnegligible amount of program restructuring, or even a reformulation of some of the algorithms used. In addition, it makes program debugging considerably more difficult. Tools for interprocedural program analysis, able to analyze at a high level large FORTRAN programs composed of many subroutines and to perform an automatic high-level parallel decomposition, are being developed at IBM. As an additional possibility, low-level parallelism could also be exploited automatically. This would greatly simplify the program analysis needed in a compiler in order to automatically insert parallel constructs in a program. However, this could only be efficient in an environment in which the cost of scheduling and synchronizing parallel activities is extremely small. Such a “microtasking” environment has been realized, at a prototype level, for IBM multiprocessors and is described in this paper. Its applicability and potential are illustrated with the help of some simple examples. These examples show that, with microtasking, good processor utilization and useful speed-ups are achieved even for fine-grain problems. The user interface of the prototype is described, both at the FORTRAN and at the Assembler level, and the possibility of incorporating this environment into a FORTRAN compiler is discussed.
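The idea of dispatching small pieces of a loop to several processors can be illustrated with a self-scheduling sketch: workers repeatedly claim the next chunk of iterations from a shared counter until the loop is exhausted. This is only an analogy written with Python threads, not the prototype's interface; the function name, the chunk size, and the use of a lock are assumptions. A real microtasking environment would keep its workers alive between loops and synchronize far more cheaply, since low scheduling cost is the whole point.

```python
import threading

def microtask_loop(body, n, workers=4, chunk=8):
    """Run body(i) for i in range(n), with workers claiming chunks dynamically."""
    next_index = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_index
        while True:
            # Claim the next chunk of iterations; the lock keeps claims disjoint.
            with lock:
                start = next_index
                next_index += chunk
            if start >= n:
                return
            for i in range(start, min(start + chunk, n)):
                body(i)

    # Threads are created per call here for brevity only.
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Example: a fine-grain loop y[i] = 2 * x[i] + 1 with independent iterations.
x = list(range(100))
y = [0] * 100

def body(i):
    y[i] = 2 * x[i] + 1

microtask_loop(body, len(x))
print(y[:5])
```

Smaller chunks balance the load better but increase the number of scheduling operations, which is why the per-dispatch cost determines how fine a grain is still profitable.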