Gaussian elimination with partial pivoting achieved by adding the pivot row to the kth row at step k, was introduced by Onaga and Takechi in 1986 as a means for reducing communications in parallel implementations, In ...
详细信息
Gaussian elimination with partial pivoting achieved by adding the pivot row to the kth row at step k, was introduced by Onaga and Takechi in 1986 as a means for reducing communications in parallel implementations, In this paper it is shown that the growth factor of this partial pivoting algorithm is bounded above by mu (n) < 3(n-1), as compared to 2(n-1) for the standard partial pivoting. This bound (n), close to 3(n-2), is attainable for a class of near-singular matrices. Moreover, for the same matrices the growth factor is small under partial pivoting.
This article documents the first implementation of a parallel algorithm for solving the governing equations of the hydrodynamic formulation of quantum mechanics. The algorithm employs a quantum trajectory method (QTM)...
详细信息
This article documents the first implementation of a parallel algorithm for solving the governing equations of the hydrodynamic formulation of quantum mechanics. The algorithm employs a quantum trajectory method (QTM) based on the serial algorithm introduced by Wyatt and Lopreore. The OpenMP API is employed to parallelize the code across the quantum trajectories in a shared memory environment. An outline of the parallel algorithm is provided;the analytical solution for a moving free particle is used to verify the solution obtained by the parallel algorithm. Further validation against several of the results obtained by Wyatt and Lopreore is also provided. The parallel speedups and runtimes are presented, and several performance issues are noted. Finally, the results of a preliminary accuracy study examining a moving free particle are provided. (C) 2001 John Wiley & Sons, Inc.
This article documents the first implementation of a parallel algorithm for solving the governing equations of the hydrodynamic formulation of quantum mechanics. The algorithm employs a quantum trajectory method (QTM)...
详细信息
This article documents the first implementation of a parallel algorithm for solving the governing equations of the hydrodynamic formulation of quantum mechanics. The algorithm employs a quantum trajectory method (QTM) based on the serial algorithm introduced by Wyatt and Lopreore. The OpenMP API is employed to parallelize the code across the quantum trajectories in a shared memory environment. An outline of the parallel algorithm is provided;the analytical solution for a moving free particle is used to verify the solution obtained by the parallel algorithm. Further validation against several of the results obtained by Wyatt and Lopreore is also provided. The parallel speedups and runtimes are presented, and several performance issues are noted. Finally, the results of a preliminary accuracy study examining a moving free particle are provided. (C) 2001 John Wiley & Sons, Inc.
In this paper, we study the paxallelization of the one-sided Jacobi method for computing the eigenvalues and the eigenvectors of a real and symmetric matrix. We use a technique to overlap the communications by the com...
详细信息
ISBN:
(纸本)3540422935
In this paper, we study the paxallelization of the one-sided Jacobi method for computing the eigenvalues and the eigenvectors of a real and symmetric matrix. We use a technique to overlap the communications by the computations in order to decrease the global communication time. We also extend the obtained results to the block version for using the level-3 BLAS.
Mesh-connected computers (MCCs) are a class of important parallel architectures due to their simple and regular interconnections. However, their performances are restricted by their large diameters. Various augmenting...
详细信息
Mesh-connected computers (MCCs) are a class of important parallel architectures due to their simple and regular interconnections. However, their performances are restricted by their large diameters. Various augmenting mechanisms have been proposed to enhance the communication efficiency of MCCs. One major approach is to add nonconfigurable buses for improved broadcasting. A typical example is the mesh-connected computer with multiple buses (MMB). We propose a new class of generalized MMBs, the improved generalized MMBs (IMMBs). We compare IMMBs with MMBs and a class of previously proposed generalized MMBs (GMMBs). We show the power of IMMBs by considering semigroup and prefix computations. Specifically, as our main result we show that for any constant 0 < < 1, one can construct an N-1/2 x N-1/2 square IMMB using which semigroup and prefix computations on N operands can be carried out in O(N-) time, while maintaining 0(1) broadcasting time. Compared with the previous best complexities O(N-1/6) and O(N-1/16) achieved on a rectangular MMB and GMMB, respectively, for the same computations, our results show that IMMBs are more powerful than MMBs and GMMBs.
Air pollution models can efficiently be used in different environmental studies. The atmosphere is the most dynamic component of the environment, where the pollutants can be transported over very long distances. There...
详细信息
ISBN:
(纸本)3540430431
Air pollution models can efficiently be used in different environmental studies. The atmosphere is the most dynamic component of the environment, where the pollutants can be transported over very long distances. Therefore the models must be defined on a large space domain. Moreover, all relevant physical and chemical processes must be adequately described. This leads to huge computational tasks. That is why it is difficult to handle numerically such models even on the most powerful up-to-date supercomputers. The particular model used in this study is the Danish Eulerian Model. The numerical methods used in the advection-diffusion part of this model consist of finite elements (for discretizing the spatial derivatives) followed by predictor-corrector schemes with several different correctors (in the numerical treatment of the resulting systems of ordinary differential equations). Implicit methods for the solution of stiff systems of ordinary differential equations are used in the chemistry part. This implies the use of Newton-like iterative methods. A special sparse matrix technique is applied in order to increase the efficiency. The model is constantly updated with new faster and more accurate numerical methods. The three-dimensional version of the Danish Eulerian Model is presented in this work. The model is defined on a space domain of 4800 km x 4800 km that covers the whole of Europe together with parts of Asia, Africa and the Atlantic Ocean. A chemical scheme with 35 species is used in this version. Two parallel implementations are discussed;the first one for shared memory parallel computers, the second one - the newly developed version for distributed memory computers. Standard tools are used to achieve parallelism: OpenMP for shared memory computers and MPI for distributed memory computers. Results from many experiments, which were carried out on a SUN SMP cluster and on a CRAY T3E at the Edinburgh parallel Computer Centre (EPCC), are presented and analyzed.
This paper introduces a new approach based on the geographic nearest neighbors for constructing the Delaunay triangulation (a dual of the Voronoi diagram) of a aet of n sites in the plane under the L-1 metric. In gene...
详细信息
This paper introduces a new approach based on the geographic nearest neighbors for constructing the Delaunay triangulation (a dual of the Voronoi diagram) of a aet of n sites in the plane under the L-1 metric. In general, there is no inclusion relationship between the Delaunay triangulation and the octant neighbor graph. We however find that under the L1 metric the octant neighbor graph contains at least one edge of each triangle in the Delaunay triangulation. By using this observation and employing a range tree scheme, we design an algorithm for constructing the Delaunay triangulation (thus the Voronoi diagram) in the L1 metric. This algorithm takes O(n log n) sequential time for constructing the Delaunay triangulation in the L1 metric. This algorithm can easily be parallelized, and takes O(log n) time with O(n) processors on a CREW-PRAM.
Consider an input sequence of 2(n) data and an n-dimensional hypercube, H-n, with n - 1 faulty nodes. This short paper presents an efficient fault-tolerant algorithm for Fast Fourier Transform (FFT) with these 2(n) da...
详细信息
Consider an input sequence of 2(n) data and an n-dimensional hypercube, H-n, with n - 1 faulty nodes. This short paper presents an efficient fault-tolerant algorithm for Fast Fourier Transform (FFT) with these 2(n) data on the faulty H-n in 9n - 15 communication steps and O(n) computation steps. To the best of our knowledge, this is the first time that such a fault-tolerant algorithm for FFT on hypercubes is proposed in the literature. (C) 2001 Elsevier Science B.V. All rights reserved.
Consider M unsorted elements and an n-dimensional hypercube H-n with [3n/2] - 1 faulty nodes, where M much greater than N = 2(n). Employing a newly proposed partition strategy and the light-occupied dimension concept,...
详细信息
Consider M unsorted elements and an n-dimensional hypercube H-n with [3n/2] - 1 faulty nodes, where M much greater than N = 2(n). Employing a newly proposed partition strategy and the light-occupied dimension concept, this paper improves Sheu et al.'s algorithm [Sheu, Chen, Chang, J. parallel Distributed Comput, 16 (1992) 185] for sorting these M unsorted elements on the faulty H-n. With the same time bound O((M/N)log(M/N) + (M/N)log(2) N) as [Sheu et al., 1992], the proposed algorithm can tolerate [n/2] more faulty nodes than Sheu et al.'s algorithm which can tolerate at most n -1 faulty nodes. (C) 2001 Elsevier Science B.V. All rights reserved.
This paper studies the parallelization of a recursive algorithm for triangular matrix inversion (TMI), using the "divide and conquer" paradigm. For a (large scale) matrix of size n = m2(k) (m,k greater than ...
详细信息
This paper studies the parallelization of a recursive algorithm for triangular matrix inversion (TMI), using the "divide and conquer" paradigm. For a (large scale) matrix of size n = m2(k) (m,k greater than or equal to 1) and p = 2(q) (less than or equal to n/2) available processors, we first construct an adequate 2-phases task segmentation and inducing a balanced layered task graph. Then. we design a greedy scheduling leading to a cost optimal parallel algorithm, i.e. whose efficiency is equal to 1 for large n. The practical interest of the contribution is proven through an experimental study of two versions of the original algorithm on an IBM SP1 distributed memory multiprocessor. (C) 2001 Published by Elsevier Science B.V.
暂无评论