A flexible parallel deterministic solver of the Boltzmann-Poisson system for 2D semiconductor device simulation on computer clusters is presented. The simulator is obtained by parallelizing a previously proposed numerical scheme based on high order finite difference weighted essentially non-oscillatory (WENO) schemes. Although the underlying numerical scheme presents important advantages over direct simulation Monte Carlo methods, it places very high demands on computing power. For this reason, the different calculation phases of the numerical scheme have been parallelized. The data subdomain which accounts for most of the computational workload has been suitably distributed among the processors, and several parallel design decisions have been taken in order to achieve good performance. Moreover, the resulting parallel application can be easily adjusted to simulate a wide range of devices and could be easily used by engineers without a mathematical background in the underlying numerical scheme. The parallel algorithm has been implemented in C++ augmented with calls to MPI functions and functions of optimized linear algebra libraries. Several experiments have been performed by simulating particular MOSFET and DG-MOSFET devices on an SMP cluster in order to show its efficiency. (C) 2008 Elsevier B.V. All rights reserved.
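The abstract above does not state which WENO variant or order the solver uses. As an illustration only, the following is a minimal serial sketch of the widely used fifth-order reconstruction (Jiang-Shu smoothness indicators) at the interfaces i+1/2; the function name `weno5_left`, the choice of fifth order, and the value of `eps` are assumptions made for this sketch and are not taken from the paper. In a finite difference setting the same formulas would be applied to (split) flux values rather than to the solution itself.

```python
import numpy as np


def weno5_left(f, eps=1e-6):
    """Left-biased fifth-order WENO value at interfaces i+1/2.

    f   : 1D array of values f[0..n-1], n >= 5
    eps : small constant that avoids division by zero (assumed value)
    Returns an array of length n-4, one value per interior index i = 2..n-3.
    """
    f = np.asarray(f, dtype=float)
    fm2, fm1, f0, fp1, fp2 = f[:-4], f[1:-3], f[2:-2], f[3:-1], f[4:]

    # Third-order candidate reconstructions on the three sub-stencils.
    q0 = (2.0 * fm2 - 7.0 * fm1 + 11.0 * f0) / 6.0
    q1 = (-fm1 + 5.0 * f0 + 2.0 * fp1) / 6.0
    q2 = (2.0 * f0 + 5.0 * fp1 - fp2) / 6.0

    # Smoothness indicators: large where a sub-stencil crosses a steep gradient.
    b0 = 13.0 / 12.0 * (fm2 - 2.0 * fm1 + f0) ** 2 + 0.25 * (fm2 - 4.0 * fm1 + 3.0 * f0) ** 2
    b1 = 13.0 / 12.0 * (fm1 - 2.0 * f0 + fp1) ** 2 + 0.25 * (fm1 - fp1) ** 2
    b2 = 13.0 / 12.0 * (f0 - 2.0 * fp1 + fp2) ** 2 + 0.25 * (3.0 * f0 - 4.0 * fp1 + fp2) ** 2

    # Nonlinear weights built from the linear weights 1/10, 6/10, 3/10.
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2

    return (a0 * q0 + a1 * q1 + a2 * q2) / (a0 + a1 + a2)
```

On smooth data the nonlinear weights stay close to the linear ones, which is what gives fifth-order accuracy; near a discontinuity the weight of any sub-stencil that crosses it becomes very small, which suppresses spurious oscillations.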
With the emergence of new massively parallel systems in the high performance computing area allowing scientific simulations to run on thousands of processors, the mean time between failures of large machines is decreasing from several weeks to a few minutes. The ability of hardware and software components to handle these singular events, called process failures, is therefore becoming increasingly important. In order for a scientific code to continue despite a process failure, the application must be able to retrieve the lost data items. The recovery procedure after failures might be fairly straightforward for elliptic and linear hyperbolic problems. However, the reversibility in time for parabolic problems appears to be the most challenging part because it is an ill-posed problem. This paper focuses on new fault-tolerant numerical schemes for the time integration of parabolic problems. The new algorithm allows the application to recover from process failures and to reconstruct numerically the lost data of the failed process(es), avoiding the expensive roll-back operation required in most checkpoint/restart schemes. As a fault tolerant communication library, we use the fault tolerant message passing interface developed by the Innovative Computing Laboratory at the University of Tennessee. Experimental results show promising performance. Indeed, the three-dimensional parabolic benchmark code is able to recover and to keep on running after failures, adding only a very small penalty to the overall time of execution. (C) 2007 Elsevier Inc. All rights reserved.
Dimensional analysis reduces a complicated ten-parameter formula for the execution time of the Linpack benchmark to a simpler two-parameter formula. These two parameters are ratios of software forces and hardware forces that determine a self-similarity surface. Machines move along paths on this surface as the problem size and the number of processors change. Two machines scale the same way, that is, they move along the same path, if they have the same hardware forces. To design efficient algorithms, the programmer must produce software forces large enough to overcome the hardware forces. Modern machines have larger hardware forces than older machines and are harder to program. (C) 2008 Elsevier Inc. All rights reserved.
ISBN: (print) 9783540681052
We apply dimensional analysis to a formula for execution time for a QR algorithm from a paper by Henry and van de Geijn. We define a single efficiency surface that reduces performance analysis for this algorithm to an exercise in differential geometry. As the problem size and the number of processors change, different machines move along different paths on the surface determined by two computational forces specific to each machine. We show that computational force, also called computational intensity, is a unifying concept for understanding the performance of parallel numerical algorithms.
This paper describes an efficient and robust hybrid parallel solver, "the SPIKE algorithm", for narrow-banded linear systems. Two versions of SPIKE with their built-in options are described in detail: the Recursive SPIKE version for handling non-diagonally dominant systems and the Truncated SPIKE version for diagonally dominant ones. These SPIKE schemes can be used either as direct solvers or as preconditioners for outer iterative schemes. Both versions are faster than the direct solvers in ScaLAPACK on parallel computing platforms, and quite competitive in terms of achieved accuracy for handling systems that are dense within the band. (c) 2005 Elsevier B.V. All rights reserved.
This paper deals with some aspects of performance of the symmetric successive over-relaxation preconditioner in a distributed environment. The details of distributed formulation of the preconditioner are presented. Some performance metrics are compared and discussed for the message passing interface implementation of the algorithm. The properties of the solver are estimated for concurrent three-dimensional formulation of the finite-element time-domain method. The analyzed benchmark models are approximated by tetrahedral first order Whitney elements.
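The abstract above does not give the distributed formulation itself. For orientation only, the following is a serial, dense sketch of how a symmetric successive over-relaxation (SSOR) preconditioner is commonly applied to a residual: with the splitting A = D + L + U, the preconditioner is M = (D + ωL) D⁻¹ (D + ωU) / (ω(2 − ω)), so applying M⁻¹ amounts to one forward sweep, a diagonal scaling, and one backward sweep. The function name `ssor_apply` and the dense NumPy storage are assumptions made for this sketch; the paper's MPI implementation and its data distribution are not reproduced here.

```python
import numpy as np


def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for the SSOR preconditioner of a dense matrix A.

    A     : (n, n) array with nonzero diagonal (hypothetical dense storage)
    r     : residual vector of length n
    omega : relaxation parameter, 0 < omega < 2
    """
    A = np.asarray(A, dtype=float)
    r = np.asarray(r, dtype=float)
    n = A.shape[0]
    d = np.diag(A)

    # Forward sweep: solve (D + omega * L) y = r.
    y = np.zeros(n)
    for i in range(n):
        y[i] = (r[i] - omega * (A[i, :i] @ y[:i])) / d[i]

    # Diagonal scaling: w = D y.
    w = d * y

    # Backward sweep: solve (D + omega * U) z = w.
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):
        z[i] = (w[i] - omega * (A[i, i + 1:] @ z[i + 1:])) / d[i]

    # Scalar factor from the definition of M.
    return omega * (2.0 - omega) * z
```

The two triangular sweeps are inherently sequential along the unknowns, which is the main reason a distributed version needs a dedicated formulation rather than a direct port of this loop.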
Large-scale computational problems are encountered when one attempts to realize a high degree of detail and realism in the simulation of quantum transport in nanodevices. These problems can be addressed using novel parallel algorithms that are ideally suited for high-end computing platforms. This article has two objectives: (i) the description of the transport model and the associated computational challenges within the multidimensional finite element simulator NESSIE, and (ii) the presentation of a new strategy for handling the transport problem and solving the banded linear systems that arise from the Green (or wave) function approach.
ISBN: (print) 0972842284
The development of new simulation tools is critical for the exploration of quantum transport in nanoscale devices. Such simulation is commonly performed by solving the transport problem self-consistently, using the Non-Equilibrium Green's Functions (NEGF) formalism together with Poisson's equation to account for space charge effects. The quest for ever higher levels of detail and realism in such simulations, such as the modeling of multidimensional devices with detailed band structure calculations with (or without) the inclusion of scattering effects, requires huge computational effort. Hence the need for an active research effort in developing novel numerical techniques and parallel algorithms that are ideally suited for high-end computing platforms. In this article, we identify the challenging numerical problems which arise from the NEGF/Poisson procedure and present new efficient parallel schemes for solving the problem.
The relaxed Burnett system, recently introduced as a hydrodynamical approximation of the Boltzmann equation, is numerically solved. Due to the stiffness of this system and the severe CFL condition for large Mach numbers, a fully implicit Runge-Kutta method has been used. In order to reduce computing time, we apply a parallel stiff ODE solver based on 4-stage Radau IIA IRK. The ODE solver is combined with suitable first order upwind and second order MUSCL relaxation schemes for the spatial derivatives. Speedup results and comparisons to DSMC and Navier-Stokes approximations are reported for a 1D shock profile.