Three parallel algorithms, namely the parallel partition LU (PPT) algorithm, the parallel partition hybrid (PPH) algorithm, and the parallel diagonal dominant (PDD) algorithm, are proposed for solving tridiagonal linear systems on multicomputers. These algorithms are based on the divide-and-conquer parallel computation model. The PPT and PPH algorithms support both pivoting and nonpivoting. The PPT algorithm performs well when the number of processors is small; otherwise, the PPH algorithm is better. When the system is diagonally dominant, the PDD algorithm is highly parallel and provides an approximate solution that equals the exact solution to within machine accuracy. Both the computation and communication complexities of the three algorithms are presented. All three methods proposed in this paper have been implemented on a 64-node nCUBE-1 multicomputer. The analytic results closely match the results measured on the nCUBE-1 machine.
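For orientation, the serial problem that these parallel algorithms partition can be sketched with the standard nonpivoting tridiagonal elimination (often called the Thomas algorithm). This is not the PPT, PPH, or PDD algorithm from the abstract; the function name `thomas_solve` and its argument layout are assumptions made for illustration only.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system serially, without pivoting.

    Assumed layout (hypothetical, for illustration): all four lists have
    length n; lower[0] and upper[n - 1] are unused padding entries.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal after forward elimination
    d = [0.0] * n  # modified right-hand side after forward elimination

    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Diagonally dominant example: tridiag(-1, 4, -1) with solution [1, 2, 3, 4].
    print(thomas_solve([0.0, -1.0, -1.0, -1.0],
                       [4.0, 4.0, 4.0, 4.0],
                       [-1.0, -1.0, -1.0, 0.0],
                       [2.0, 4.0, 6.0, 13.0]))
```

The forward sweep carries a dependency from each row to the next, which is why a divide-and-conquer partitioning across processors is needed to extract parallelism.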
A flexible parallel deterministic solver of the Boltzmann-Poisson system for 2D semiconductor device simulation on computer clusters is presented. The simulator is obtained by parallelizing a previously proposed numerical scheme based on high-order finite difference weighted essentially non-oscillatory (WENO) schemes. Although the underlying numerical scheme offers important advantages over direct simulation Monte Carlo methods, it places very high demands on computing power. For this reason, the different calculation phases of the numerical scheme have been parallelized. The data subdomain that accounts for most of the computational workload has been suitably distributed among the processors, and several parallel design decisions have been made in order to achieve good performance. Moreover, the resulting parallel application can be easily adjusted to simulate a wide range of devices and can be used by engineers without a mathematical background in the underlying numerical scheme. The parallel algorithm has been implemented in C++ augmented with calls to MPI functions and to functions of optimized linear algebra libraries. Several experiments have been performed by simulating particular MOSFET and DG-MOSFET devices on an SMP cluster in order to show its efficiency. (C) 2008 Elsevier B.V. All rights reserved.
In this paper we discuss numerical methods and algorithms for the solution of NLTE stellar atmosphere problems involving expanding atmospheres, e.g., those found in novae, supernovae and stellar winds. We show how a scheme of nested iterations can be used to reduce the high dimension of the problem to a number of problems with smaller dimensions. As examples of these sub-problems, we discuss the numerical solution of the radiative transfer equation for relativistically expanding media with spherical symmetry, the solution of the multi-level non-LTE statistical equilibrium problem for extremely large model atoms, and our temperature correction procedure. Although modern iteration schemes are very efficient, parallel algorithms are essential in making large-scale calculations feasible; we therefore discuss some parallelization schemes that we have developed. (C) 1999 Elsevier Science B.V. All rights reserved.
We investigate several iterative numerical schemes for nonlinear variational image smoothing and segmentation implemented in parallel. A general iterative framework subsuming these schemes is suggested, for which global convergence irrespective of the starting point can be shown. We characterize various edge-preserving regularization methods from the recent image processing literature involving auxiliary variables as special cases of this general framework. As a by-product, global convergence can be proven under conditions slightly weaker than those stated in the literature. Efficient Krylov subspace solvers for the linear parts of these schemes have been implemented on a multi-processor machine. The performance of these parallel implementations has been assessed, and empirical results concerning convergence rates and speed-up factors are reported.
We present a new parallel implementation of the Gauss-Seidel iteration for solving systems of linear equations, improving the results presented in two recent papers.
In this note we improve results presented in the paper: N.M. Missirlis, Scheduling parallel iterative methods on multiprocessor systems, Parallel Computing 5 (1987) 295–302.
Dimensional analysis reduces a complicated ten-parameter formula for the execution time of the Linpack benchmark to a simpler two-parameter formula. These two parameters are ratios of software forces and hardware forces that determine a self-similarity surface. Machines move along paths on this surface as the problem size and the number of processors change. Two machines scale the same way, that is, they move along the same path, if they have the same hardware forces. To design efficient algorithms, the programmer must produce software forces large enough to overcome the hardware forces. Modern machines have larger hardware forces than older machines and are harder to program. (C) 2008 Elsevier Inc. All rights reserved.
We describe and test a software approach to fault detection in common numerical algorithms. Such result checking or algorithm-based fault tolerance (ABFT) methods may be used, for example, to overcome single-event upsets in computational hardware or to detect errors in complex, high-efficiency implementations of the algorithms. Following earlier work, we use checksum methods to validate results returned by a numerical subroutine operating subject to unpredictable errors in data. We consider common matrix and Fourier algorithms which return results satisfying a necessary condition having a linear form; the checksum tests compliance with this condition. We discuss the theory and practice of setting numerical tolerances to separate errors caused by a fault from those inherent in finite-precision floating-point calculations. We concentrate on comprehensively defining and evaluating tests having various accuracy/computational burden tradeoffs, and we emphasize average-case algorithm behavior rather than using worst-case upper bounds on error.
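A minimal sketch of the checksum idea for one case, matrix multiplication, may help make the "necessary condition having a linear form" concrete: if C = AB, then the column sums of C must equal the column sums of A multiplied by B. The function names and the relative tolerance `rel_tol` below are assumptions chosen for illustration; they are not the tests or tolerances defined in the paper.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def checksum_ok(a, b, c, rel_tol=1e-9):
    """Check the linear necessary condition e^T C == (e^T A) B.

    rel_tol is a hypothetical tolerance separating floating-point
    rounding from a genuine fault; choosing it well is the hard part.
    """
    col_sums_a = [sum(row[k] for row in a) for k in range(len(a[0]))]
    expected = [sum(col_sums_a[k] * b[k][j] for k in range(len(b)))
                for j in range(len(b[0]))]
    actual = [sum(row[j] for row in c) for j in range(len(c[0]))]
    scale = max(1.0, max(abs(v) for v in expected))
    return all(abs(e - s) <= rel_tol * scale for e, s in zip(expected, actual))


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = matmul(a, b)
    print(checksum_ok(a, b, c))   # fault-free product
    c[0][1] += 1.0                # simulate a single corrupted entry
    print(checksum_ok(a, b, c))   # corrupted product
```

The check costs one extra matrix-vector product, far less than recomputing the full product, but it is only a necessary condition: faults whose effects cancel in a column sum would go undetected.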
The paper describes the implementation of the Successive Overrelaxation (SOR) method on an asynchronous multiprocessor computer for solving large linear systems. The parallel algorithm is derived by dividing the serial SOR method into noninterfering tasks, which are then combined with an optimal schedule for a feasible number of processors. The important features of the algorithm are that it (i) achieves a speedup Sp ≈ O(N/3) and an efficiency Ep ≈ 2/3 using p = [N/2] processors, where N is the number of equations, (ii) contains a high level of inherent parallelism, while the convergence theory of the parallel SOR method is the same as that of its sequential counterpart, and (iii) may be modified to use block methods in order to minimise the overhead due to communication and synchronisation of the processors.
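The serial SOR sweep that such a parallel algorithm decomposes into tasks can be sketched as follows. This is the textbook sequential iteration, not the scheduled parallel version described in the abstract; the function name `sor_solve`, the relaxation factor `omega=1.25`, and the stopping rule are assumptions made for illustration.

```python
def sor_solve(a, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Serial SOR iteration for a dense system a x = b.

    Returns (solution, number_of_sweeps). omega, tol and max_iter are
    hypothetical defaults; omega = 1 reduces the sweep to Gauss-Seidel.
    """
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Uses already-updated x[j] for j < i: the dependency that a
            # parallel schedule has to respect.
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / a[i][i]
            new_value = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, sweep
    raise RuntimeError("SOR did not converge within max_iter sweeps")


if __name__ == "__main__":
    a = [[4.0, -1.0, 0.0, 0.0],
         [-1.0, 4.0, -1.0, 0.0],
         [0.0, -1.0, 4.0, -1.0],
         [0.0, 0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 6.0, 13.0]
    solution, sweeps = sor_solve(a, b)
    print(solution, sweeps)
```

Since efficiency is speedup divided by processor count, the figures quoted in the abstract are mutually consistent: (N/3) / (N/2) = 2/3.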
The SIMPLE program is a commonly used benchmark for testing new architectures designed for high speed scientific computation. As the name implies, the code is a simple example of a Lagrangian hydrodynamics application. In this paper we describe the SIMPLE benchmark in detail and discuss the way in which parallelism can be used to speed up execution. The focus of the work is a mapping of the algorithms to a configurable highly parallel (CHiP) computer being designed at the University of Washington.