We implemented our gauge-including atomic orbital (GIAO) NMR chemical shielding program on a workstation cluster, using the Parallel Virtual Machine (PVM) message-passing system. On a modest number of nodes, we achieved close to linear speedup. The program has several novel features. It uses the new integral program of Wolinski, which calculates integrals in vectorized batches, increasing efficiency and simplifying parallelization. The self-consistent field (SCF) step includes a multi-Fock algorithm, i.e., the simultaneous calculation of several Fock matrices with the same integral set, which increases the efficiency of the direct SCF procedure. The SCF diagonalization step, which is difficult to parallelize, has been replaced by pseudo-diagonalization. The latter, widely used in semiempirical programs, becomes important in ab initio calculations above a certain size, because the ultimate scaling of the diagonalization step is steeper than that of integral computation. Examples of the calculation of NMR shieldings in large systems at the SCF level are shown. Parallelization of the density functional code is underway. (C) 1997 by John Wiley & Sons, Inc.
The solution of large-scale chemical process simulation and optimization problems using parallel computation requires algorithms that can take advantage of multiprocessing when solving the large, sparse matrices that arise. Parallel algorithms require that the matrices be partitioned in order to distribute computational work across processors. One way to accomplish this is to reorder the matrix into a bordered block-diagonal form. Since this structure is not always obtained from the equation generation routine, an algorithm to reorder the rows and columns of the coefficient matrix is needed. We describe here a simple graph partitioning algorithm that creates a bordered block-diagonal form suitable for use with parallel algorithms for the solution of the highly asymmetric sparse matrices arising in process engineering applications. The method aims to create a number of similarly sized diagonal blocks while keeping the size of the interface matrix, which may represent a bottleneck in the parallel computation, reasonably small. Results on a wide range of test problems indicate that the reordering algorithm finds such a structure in most cases and requires much less reordering time than previously used graph partitioning methods. (C) 1999 Elsevier Science Ltd. All rights reserved.
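To make the bordered block-diagonal idea concrete, here is a minimal Python sketch. It is not the graph partitioning algorithm of the abstract: it assumes a column-to-block assignment is already given (the hypothetical `col_block` mapping) and only classifies each row as belonging to one diagonal block or to the border, i.e. the interface rows that couple several blocks. The function name `classify_rows` and the small test pattern are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def classify_rows(
    nonzeros: List[Tuple[int, int]],
    col_block: Dict[int, int],
) -> Tuple[Dict[int, List[int]], List[int]]:
    """Split rows into diagonal-block rows and border (interface) rows.

    nonzeros  : (row, col) positions of the nonzero entries.
    col_block : assumed, pre-computed assignment of each column to a block.
    A row whose nonzeros all lie in one column block joins that block;
    a row touching several blocks is placed in the border.
    """
    blocks_touched: Dict[int, Set[int]] = {}
    for row, col in nonzeros:
        blocks_touched.setdefault(row, set()).add(col_block[col])

    block_rows: Dict[int, List[int]] = {}
    border: List[int] = []
    for row in sorted(blocks_touched):
        touched = blocks_touched[row]
        if len(touched) == 1:
            block_rows.setdefault(next(iter(touched)), []).append(row)
        else:
            border.append(row)
    return block_rows, border


# Hypothetical 5x4 sparsity pattern: row 4 couples both column blocks.
pattern = [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3), (3, 3), (4, 0), (4, 3)]
assignment = {0: 0, 1: 0, 2: 1, 3: 1}
block_rows, border = classify_rows(pattern, assignment)
print(block_rows, border)
```

In this sketch the length of `border` plays the role of the interface size that the abstract wants to keep small, and the lengths of the lists in `block_rows` indicate how evenly the work is spread across blocks.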
In this paper, Compute Unified Device Architecture (CUDA) programming and Open Multi-Processing (OpenMP) are used for the parallel computation of material damage on the graphics processing unit (GPU) and the central processing unit (CPU). The material damage is evaluated by a multilevel finite element analysis within material domains reconstructed from a high-resolution micro-focus X-ray computed tomography system. An effective computational method is investigated for solving the linear equations of the finite element analysis. Numerical results show an encouraging trend in reducing the computation cost of the digital diagnosis of material damage.
A numerical calculation scheme for the multicenter problem in large molecules and clusters is presented by applying the message-passing interface (MPI) on a massively parallel computer that uses the density functional method. The multicenter problem associated with the Coulomb singularity of an atom is treated efficiently by the parallel processors by allocating several atoms to each processor element (PE). The order N^2/P tuning is obtained for the Coulomb energy calculation by using MPI, which transfers the Coulomb potential field between PEs. This method is applied to estimate the total energy of the reconstructed Al/Pd bimetallic surface. The energy estimate based on the charge density of a superposition of isolated atomic charge fragments predicts a stabilization caused by the reconstruction, consistent with a self-consistent-field (SCF) cluster calculation of the bimetallic surface.
This paper reports a parallel implementation of vortex methods for unsteady viscous flows in a distributed computing environment using the Parallel Virtual Machine (PVM). We test the recently developed diffusion schemes of vortex methods and directly compare the particle strength exchange method with the vorticity distribution method in terms of accuracy and computational efficiency. The two viscous models are compared for impulsively started flow past a circular cylinder at a Reynolds number of 60. We also compare the parallel computation efficiency and speed-up ratio of the two methods.
Shared memory models of parallel computation (e.g., parallel RAMs) that allow simultaneous read/write access are very natural and already widely used for parallel algorithm design. The various models differ from each other in the mechanism by which they resolve write conflicts. To understand the effect of these communication primitives on the power of parallelism, we extensively study the relationship between four such models that appear in the literature, and prove nontrivial separations and simulation results among them.
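As an illustration of what "resolving write conflicts" means, the following Python sketch simulates one concurrent-write step under three rules commonly described for concurrent-write PRAMs: COMMON, ARBITRARY, and PRIORITY. The abstract does not name its four models, so treating these three as examples is an assumption, and the function `resolve_writes` is a hypothetical helper rather than anything taken from the paper.

```python
from typing import Dict, List, Tuple

Write = Tuple[int, int, int]  # (processor_id, address, value)


def resolve_writes(writes: List[Write], rule: str) -> Dict[int, int]:
    """Apply one simultaneous write step and return the cells written."""
    by_addr: Dict[int, List[Tuple[int, int]]] = {}
    for pid, addr, value in writes:
        by_addr.setdefault(addr, []).append((pid, value))

    memory: Dict[int, int] = {}
    for addr, requests in by_addr.items():
        if rule == "common":
            # All writers to a cell must agree on the value.
            values = {value for _, value in requests}
            if len(values) != 1:
                raise ValueError(f"conflicting values at address {addr}")
            memory[addr] = values.pop()
        elif rule == "arbitrary":
            # Any writer may win; the first request is picked here
            # only to keep the simulation deterministic.
            memory[addr] = requests[0][1]
        elif rule == "priority":
            # The writer with the lowest processor id wins.
            memory[addr] = min(requests)[1]
        else:
            raise ValueError(f"unknown rule: {rule}")
    return memory


# Processors 2 and 1 both write to address 0; processor 3 writes to address 1.
step = [(2, 0, 7), (1, 0, 9), (3, 1, 4)]
print(resolve_writes(step, "priority"))
print(resolve_writes(step, "arbitrary"))
```

Under the COMMON rule the same step raises an error, because the two writers to address 0 disagree. An algorithm written for that model must guarantee agreement itself, which is one way the choice of rule can change what a model computes easily.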
The topology of interconnection networks plays a key role in the performance of parallel computing systems. A new interconnection network called the exchanged crossed cube (ECQ) is proposed and analyzed in this paper. We prove that ECQ has better properties than other variations of the basic hypercube: a smaller diameter, fewer links, and a lower cost factor, which indicate reduced communication overhead, lower hardware cost, and a better balance between performance and cost. It also retains several attractive advantages, including a recursive structure, high partitionability, and strong connectivity. Furthermore, optimal routing and broadcasting algorithms are proposed for this new network topology.
This paper presents the development and validation of a parallel unstructured-grid fluid-structure interaction (FSI) solver for the simulation of unsteady incompressible viscous flow with long elastic moving and compliant boundaries. The Navier-Stokes solver on an unstructured moving grid, using the arbitrary Lagrangian-Eulerian formulation, is based on the artificial compressibility approach and a high-order characteristics-based finite-volume scheme. Both unsteady flow and FSI are calculated with a matrix-free implicit dual time-stepping scheme. A membrane model has been formulated to study fluid flow in a channel with an elastic membrane wall and their interactions. This model can be employed to calculate arbitrary wall movement and variable tension along the membrane, together with a dynamic mesh method for large deformation of the flow field. The parallelization of the fluid-structure solver is achieved using the single program multiple data programming paradigm and the message passing interface for communication of data. The parallel solver is used to simulate fluid flow in a two-dimensional channel with and without a moving membrane for validation and performance evaluation purposes. The speedups and parallel efficiencies obtained by this method are excellent, using up to 16 processors on an SGI Origin 2000 parallel computer. A maximum speedup of 23.14 could be achieved on 16 processors by taking advantage of an improved handling of the membrane solver. The parallel results are compared with those from the serial code and are found to be identical. Copyright (c) 2005 John Wiley & Sons, Ltd.
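The relation between the quoted speedup and parallel efficiency can be checked with a short Python sketch. It uses the standard definition, efficiency = speedup / number of processors; the helper name `parallel_efficiency` is an illustrative assumption, and the only figures used are the 23.14 speedup and 16 processors stated in the abstract.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 is ideal linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures quoted in the abstract: speedup of 23.14 on 16 processors.
eff = parallel_efficiency(23.14, 16)
print(f"efficiency = {eff:.3f}")
```

An efficiency above 1.0 means the reported speedup is superlinear, which the abstract attributes to an improved handling of the membrane solver.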
Flood risk assessment is the foundation of flood risk management and an important basis for decisions on flood mitigation. It is an attractive but difficult problem, with growing demands on convenience, effectiveness, and timeliness. Specifically, the uncertain and nonlinear relation between assessment indices and evaluation levels is difficult to reveal, and it is not easy to calculate the weights of assessment indices from subjective judgment and objective properties. Moreover, reducing the total computational time for rapid flood risk mapping has rarely been studied. On the basis of the cloud model (CM), game theory (GT), and parallel computation technology (PC), a new model named P-CM-GT is presented for fast comprehensive flood risk assessment. It has three advantages: first, it describes the fuzziness and randomness of the membership degree via CM; second, a combination weight that integrates different weights is obtained via GT; third, the computation of CM and GT is combined with PC to reduce the running time. In a case study on fast comprehensive flood risk assessment of Hubei Province in China, the flood risk grades were obtained in less time, and the results were reasonably consistent with the actual situation; future flood control should focus on establishing a sound and effective emergency plan. The proposed model is feasible, effective, fast, and applicable, and thus offers a new approach to fast flood risk management.
The Open Accelerator (OpenACC) application programming interface is a relatively new parallel computing standard. In this paper, particle-based flow field simulations are examined as a case study of OpenACC parallel computation. The process of converting a serial model to parallel form with the OpenACC standard is explained, and the performance of the parallel flow field model is analysed using different directive configurations and grid schemes. With careful implementation and optimisation of the data transfer in the parallel algorithm, a speedup factor of 18.26x is possible. In contrast, a speedup factor of just 11.77x was achieved with the conventional Open Multi-Processing (OpenMP) parallel mode on a 20-core computer. These results demonstrate that optimised feature settings greatly influence the degree of speedup, and that models involving larger numbers of calculations exhibit greater efficiency and higher speedup factors. In addition, the OpenACC parallel mode is found to have good portability, making it easy to implement parallel computation from the original serial model.