A block parallel partitioning method for computing the eigenvalues of a symmetric tridiagonal matrix is presented. The algorithm is based on partitioning, in a way that ensures load balance during computation. This method is applicable to both shared-memory and distributed-memory MIMD systems. Compared with other parallel tridiagonal eigenvalue algorithms in the literature, the proposed algorithm achieves a higher speedup, O(p) (that is, linear) on a parallel computer with p-fold parallelism, and requires less data communication between processors than other methods. The results were tested and evaluated on an MIMD machine, and were within 62% to 98% of the predicted performance.
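The abstract does not spell out the partitioning scheme, so the following is only a minimal sketch of one well-known way to split a tridiagonal eigenvalue computation into balanced, independent pieces: Sturm-sequence bisection, with the eigenvalue indices divided into p near-equal chunks. The technique and every name here (`sturm_count`, `kth_eigenvalue`, `eigenvalues_partitioned`) are assumptions for illustration, not the paper's method. The chunks are processed in a serial loop, but each one touches only its own indices and could be handed to a separate processor.

```python
def sturm_count(d, e, x):
    """Count eigenvalues strictly below x for the symmetric tridiagonal
    matrix with diagonal d and off-diagonal e (len(e) == len(d) - 1)."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 if i > 0 else 0.0
        q = d[i] - x - off / q
        if q == 0.0:
            q = 1e-30  # nudge away from an exact zero pivot
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(d, e, k, lo, hi, tol):
    """Bisection for the k-th smallest eigenvalue (0-based) in [lo, hi]."""
    for _ in range(200):  # hard cap so the loop always terminates
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) > k:
            hi = mid  # at least k + 1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def eigenvalues_partitioned(d, e, p, tol=1e-12):
    """All eigenvalues in ascending order, computed in p index chunks."""
    n = len(d)
    # Gershgorin discs give an interval that contains the whole spectrum.
    radius = [
        (abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    lo = min(d[i] - radius[i] for i in range(n))
    hi = max(d[i] + radius[i] for i in range(n))
    # Hypothetical load-balancing rule: near-equal numbers of indices per chunk.
    chunks = [range(w * n // p, (w + 1) * n // p) for w in range(p)]
    result = []
    for chunk in chunks:  # each chunk is independent of the others
        result.extend(kth_eigenvalue(d, e, k, lo, hi, tol) for k in chunk)
    return result
```

For example, `eigenvalues_partitioned([2.0, 2.0, 2.0], [-1.0, -1.0], p=2)` approximates the eigenvalues 2 - √2, 2, and 2 + √2 of the 3 x 3 second-difference matrix.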
Application of a method for performing direct simulation Monte Carlo calculations using parallel processing to several hypersonic, rarefied flow problems is presented. The performance and efficiency of the parallel method are discussed in terms of some simple benchmark problems. The applications described are the flow in a channel and the flow about a flat plate at incidence. The benchmark results show significant advantages of parallel processing over conventional scalar processing and demonstrate the scalability of the method to large problems. The applications to hypersonic rarefied flows demonstrate the capabilities of the method, and the results demonstrate the need to adapt the parallel decomposition to the flowfield in order to achieve good load balancing.
This paper considers four parallel Cholesky factorization algorithms, including SPOTRF from the February 1992 release of LAPACK, each of which calls parallel Level 2 or 3 BLAS, or both. A fifth parallel Cholesky algorithm that calls serial Level 3 BLAS is also described. The efficiency of these five algorithms on the CRAY-2, CRAY Y-MP/832, Hitachi Data Systems EX 80, and IBM 3090-600J is evaluated and compared with a vendor-optimized parallel Cholesky factorization algorithm. The fifth parallel Cholesky algorithm that calls serial Level 3 BLAS provided the best performance of all algorithms that called BLAS routines. In fact, this algorithm outperformed the Cray-optimized libsci routine (SPOTRF) by 13-44%, depending on the problem size and the number of processors used.
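To make the structure of a blocked Cholesky factorization concrete, here is a minimal pure-Python sketch of the right-looking variant. It is not any of the five algorithms evaluated in the paper; the function name `cholesky_blocked` and the block-size parameter `nb` are assumptions for illustration. The three commented steps correspond roughly to the work that a library implementation would delegate to an unblocked factorization, a triangular solve, and a symmetric rank-k update.

```python
import math


def cholesky_blocked(a, nb=2):
    """Return lower-triangular L with L * L^T == a, nb columns at a time.

    `a` is a symmetric positive definite matrix given as a list of rows.
    """
    n = len(a)
    L = [list(map(float, row)) for row in a]  # work on a copy
    for k in range(0, n, nb):
        kend = min(k + nb, n)
        # Step 1: unblocked Cholesky of the diagonal block.
        for j in range(k, kend):
            s = L[j][j] - sum(L[j][t] ** 2 for t in range(k, j))
            if s <= 0.0:
                raise ValueError("matrix is not positive definite")
            L[j][j] = math.sqrt(s)
            for i in range(j + 1, kend):
                dot = sum(L[i][t] * L[j][t] for t in range(k, j))
                L[i][j] = (L[i][j] - dot) / L[j][j]
        # Step 2: triangular solve for the panel below the diagonal block.
        for i in range(kend, n):
            for j in range(k, kend):
                dot = sum(L[i][t] * L[j][t] for t in range(k, j))
                L[i][j] = (L[i][j] - dot) / L[j][j]
        # Step 3: symmetric update of the trailing submatrix (lower half).
        for i in range(kend, n):
            for j in range(kend, i + 1):
                L[i][j] -= sum(L[i][t] * L[j][t] for t in range(k, kend))
    # Clear the strictly upper triangle, which still holds entries of `a`.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

In this sketch the trailing update in step 3 dominates the arithmetic for large matrices, which is why the choice between parallel and serial Level 3 BLAS for that kind of step can matter so much in practice.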
In this correspondence, we propose a one-pass parallel thinning algorithm based on a number of criteria including connectivity, unit-width convergence, medial axis approximation, noise immunity, and efficiency. A pipeline processing model is assumed for the development. Precise analysis of the thinning process is presented to show its properties, and proofs of skeletal connectivity and convergence are provided. The proposed algorithm is further extended to the derived-grid to attain an isotropic medial axis representation. A set of measures based on the desired properties of thinning is used for quantitative evaluation of various algorithms. Image reconstruction from connected skeletons is also discussed. Evaluation shows that our procedures compare favorably to others.
In this paper, we present a parallel algorithm for finding the smallest enclosing rectangle for a set of n points. The parallel algorithm can be generally implemented on mesh-connected and cube-connected SIMD computers with time complexity O(√n) and O(log2 n), respectively.
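As a serial point of reference for the problem itself, the sketch below computes the area of a smallest enclosing rectangle, assuming that "smallest" means minimum area and that the rectangle may be rotated. It relies on the standard fact that a minimum-area enclosing rectangle has one side collinear with an edge of the convex hull. This is not the paper's SIMD algorithm, and the names `convex_hull` and `min_area_rectangle` are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rectangle(points):
    """Area of a minimum-area enclosing rectangle (any orientation)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # fewer than three distinct non-collinear points
    best = math.inf
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length
        # Project every hull vertex onto the edge direction and its normal.
        along = [x * ux + y * uy for x, y in hull]
        across = [-x * uy + y * ux for x, y in hull]
        area = (max(along) - min(along)) * (max(across) - min(across))
        best = min(best, area)
    return best
```

For the diamond with vertices (1, 0), (0, 1), (-1, 0), (0, -1), the axis-aligned bounding box has area 4, while the rotated rectangle found here has area 2. Each hull edge is evaluated independently in the loop, which is the part a parallel machine could distribute.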
We present in this paper a parallel scheduling based on the critical path heuristic for the 2-steps graph with constant task cost. This graph occurs in the parallelization of the solution of triangular linear systems. For a problem of size n and p processors (2 ≤ p ≤ n-1), we show the optimality of this scheduling.
A pair of parallel sequences {u(k)} --> u, {v(k)} --> 0 (1.3) is generated to solve the n x n linear system Au = (I(n) - B)u = f. Convergence depends only on the geometry or shape of sigma(B), the set of eigenvalues of B. The parallel method is applied to the singularly perturbed convection-diffusion equation (6.1), when the Reynolds number in the direction of flow is large. Numerical comparisons with known results are given. Our theory also applies to the class of possibly nonsymmetric A with real spectrum (cf. Theorem 5.1) and to several other classes of systems as well. Computations to generate the sequences are relatively straightforward, as is indicated in our main result, Theorem 4.1. In fact, the parameters of the embracing ellipse for sigma(B)2 (4.6) completely determine (i) the coefficients for the parallel sequences {u(k)} --> u and {v(k)} --> 0 and (ii) the spectral radius (4.4), which characterizes their asymptotic convergence rate (2.4). Figure 5.1 illustrates some geometries for sigma(B) that are accommodated by our theory and Figure 7.1 shows the eigenvalue bowtie region arising from the convection-diffusion equation with large Reynolds number.
We continue our investigation of stochastic lattice gases as a highly parallel means of simulating given PDEs, in this case Burgers' equation in one dimension. The lattice dynamics consists of stochastic unidirectional particle displacement, and our attention is turned toward the reliability of the model, i.e., its ability to reproduce the unique physical solution of Burgers' equation. Lattice gas results are discussed and compared against finite-difference calculations and exact solutions in examples which include shocks and rarefaction waves.
A distributed-memory parallel architecture known as a pipelined hypercube is introduced. A parallel sorting algorithm on this model is presented. Using p processors, n data items can be sorted with time complexity Θ([GRAPHICS]), including both communication and computation costs. The algorithm achieves linear speedup for 1 ≤ p ≲ [GRAPHICS].
Author: DAGUM, L
NASA AMES RES CTR, MOFFETT FIELD, CA 94035 USA
This paper presents the algorithms necessary for an efficient data parallel implementation of a three-dimensional particle simulation. A general master/slave algorithm and a fast sorting algorithm are described, and the use of these algorithms in a particle simulation is outlined. A particle simulation using these algorithms has been implemented on a 32768-processor Connection Machine that is capable of simulating over 30 million particles at an average rate of 2.4 μs/particle/step. Results are presented from the simulation of flow over an aeroassisted flight experiment (AFE) geometry at 100 km altitude.