ISBN: 9783540681052 (print)
We apply dimensional analysis to a formula for execution time for a QR algorithm from a paper by Henry and van de Geijn. We define a single efficiency surface that reduces performance analysis for this algorithm to an exercise in differential geometry. As the problem size and the number of processors change, different machines move along different paths on the surface determined by two computational forces specific to each machine. We show that computational force, also called computational intensity, is a unifying concept for understanding the performance of parallel numerical algorithms.
ISBN: 0972842284 (print)
The development of new simulation tools is critical for the exploration of quantum transport in nanoscale devices. Such simulation is commonly performed by self-consistently solving the transport problem using the Non-Equilibrium Green's Functions (NEGF) formalism together with Poisson's equation to account for space charge effects. The quest for ever higher levels of detail and realism in such simulations, such as the modeling of multidimensional devices with detailed band structure calculations with (or without) the inclusion of scattering effects, requires huge computational effort. Hence the need for an active research effort in developing novel numerical techniques and parallel algorithms that are ideally suited for high-end computing platforms. In this article, we identify the challenging numerical problems which arise from the NEGF/Poisson procedure and present new efficient parallel schemes for solving them.
ISBN: 9781538653302 (print)
In this article, we consider diagonal-implicitly iterated Runge-Kutta (DIIRK) methods for the numerical solution of stiff ordinary differential equations (ODEs) and investigate their performance behavior on a modern cluster system using MPI. DIIRK methods are implicit methods and require the solution of non-linear equation systems in each iteration step. In particular, we are interested in the parallel execution behavior when using different basis Newton methods for solving the resulting non-linear equation systems of different versions of the DIIRK method. We explore the use of direct solution methods based on LU factorization for the resulting linear equation systems as well as the use of Krylov subspace methods and investigate the resulting performance and accuracy.
ISBN: 9781509057078 (print)
In this paper, we explore how numerical calculations can be accelerated by implementing several numerical methods for fractional-order systems using parallel computing techniques. We investigate the feasibility of parallel computing algorithms and their efficiency in reducing the computational costs over a large time interval. In particular, we present the case of the Adams-Bashforth-Moulton predictor-corrector method and measure the speedup of two parallel approaches using GPU and HPC cluster implementations.
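For readers unfamiliar with the predictor-corrector idea, the following is a minimal serial sketch of the classical (integer-order) two-step Adams-Bashforth-Moulton scheme. It is not the fractional-order variant or the GPU/cluster implementation from the paper; the function name `abm2`, the Heun start-up step, and the test equation are assumptions chosen for illustration only.

```python
import math


def abm2(f, t0, y0, h, n_steps):
    """Integrate y' = f(t, y) with a two-step predict-evaluate-correct scheme.

    Illustrative sketch: classical integer-order method, not the
    fractional-order scheme discussed in the abstract.
    """
    ts = [t0]
    ys = [y0]

    # Start-up: a two-step method needs two points, so take one Heun (RK2) step.
    k1 = f(t0, y0)
    k2 = f(t0 + h, y0 + h * k1)
    ts.append(t0 + h)
    ys.append(y0 + 0.5 * h * (k1 + k2))

    for n in range(1, n_steps):
        t_n, y_n = ts[n], ys[n]
        f_n = f(t_n, y_n)
        f_prev = f(ts[n - 1], ys[n - 1])

        # Predictor: explicit two-step Adams-Bashforth formula.
        y_pred = y_n + h * (1.5 * f_n - 0.5 * f_prev)

        # Corrector: one-step Adams-Moulton (trapezoidal) formula,
        # evaluated at the predicted value instead of solving implicitly.
        t_next = t_n + h
        y_corr = y_n + 0.5 * h * (f_n + f(t_next, y_pred))

        ts.append(t_next)
        ys.append(y_corr)

    return ts, ys


if __name__ == "__main__":
    # Hypothetical test problem: y' = -y, y(0) = 1, exact solution exp(-t).
    ts, ys = abm2(lambda t, y: -y, 0.0, 1.0, 0.01, 100)
    print(ts[-1], ys[-1], math.exp(-1.0))
```

In the fractional-order setting each step depends on the whole solution history rather than on the last two points, which is what makes the method expensive over long time intervals and a candidate for parallelization.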
The relaxed Burnett system, recently introduced as a hydrodynamical approximation of the Boltzmann equation, is solved numerically. Due to the stiffness of this system and the severe CFL condition for large Mach numbers, a fully implicit Runge-Kutta method is used. In order to reduce computing time, we apply a parallel stiff ODE solver based on the 4-stage Radau IIA IRK method. The ODE solver is combined with suitable first-order upwind and second-order MUSCL relaxation schemes for the spatial derivatives. Speedup results and comparisons to DSMC and Navier-Stokes approximations are reported for a 1D shock profile.
In this paper we study the implementation, on a shared-memory MIMD computer, of a variant of the classic Gauss-Jordan (GJ) method that was recently introduced by Huard [8]. Two parallel versions are derived by dividing the sequential Huard method into noninterfering tasks. Taking into consideration the computation as well as the communication complexity, we present a parallel scheduling algorithm for each task graph. Next, in an attempt to reduce the communication cost, we introduce block versions and follow a similar approach for their study.
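As background, here is a minimal serial sketch of the classic Gauss-Jordan method with partial pivoting. It is not Huard's variant and contains no scheduling logic; the function name `gauss_jordan_solve` and the singularity tolerance are assumptions made for this illustration. It does show why the method decomposes naturally into tasks: once a pivot row is normalized, the updates of all other rows are independent of one another.

```python
def gauss_jordan_solve(a, b):
    """Solve a x = b by classic Gauss-Jordan elimination with partial pivoting.

    Illustrative sketch only: this is the textbook method, not Huard's variant.
    `a` is a list of n rows of length n, `b` a list of length n.
    """
    n = len(a)
    # Build the augmented matrix [a | b] as floats, leaving the inputs untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]

        # Normalize the pivot row so the pivot entry becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]

        # Eliminate the column from every other row. Each row update reads
        # only the pivot row and writes only its own row, so these updates
        # do not interfere with each other.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]

    # The left block is now the identity; the last column holds the solution.
    return [row[n] for row in m]


if __name__ == "__main__":
    print(gauss_jordan_solve([[2, 1], [1, 3]], [3, 5]))
```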
Mathematical models with fractional-order differential operators are computationally expensive due to the non-local nature of these operators. In this work, we construct and investigate parallel solvers for problems described by fractional powers of elliptic operators, such as fractional diffusion. Three state-of-the-art approaches are used to transform the non-local fractional-order differential problem into local partial differential equation problems formulated in a space of higher dimension. Numerical schemes and parallel algorithms are developed for all three approaches. The resulting parallel algorithms have very different properties. We investigate the weak and strong scalability of the developed parallel algorithms and compare their parallel performance.
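The weak and strong scalability mentioned above are usually summarized with the standard speedup and efficiency ratios. The sketch below computes them from wall-clock timings; the function names and the timing values are hypothetical and are not taken from the paper.

```python
def strong_scaling(t1, tp, p):
    """Return (speedup, efficiency) for a fixed total problem size.

    t1: time on one process, tp: time on p processes.
    """
    speedup = t1 / tp
    efficiency = speedup / p
    return speedup, efficiency


def weak_scaling_efficiency(t1, tp):
    """Return efficiency when the problem size grows in proportion to p.

    t1: time for one unit of work on one process,
    tp: time for p units of work on p processes (ideal: tp == t1).
    """
    return t1 / tp


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    print(strong_scaling(100.0, 30.0, 4))
    print(weak_scaling_efficiency(10.0, 12.5))
```

Strong scaling asks how much faster a fixed problem runs as processes are added, while weak scaling asks whether the run time stays flat when the work per process is held constant.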
In this paper, we develop and investigate parallel numerical algorithms for three different state-of-the-art numerical methods for solving non-local problems described by fractional powers of elliptic operators. These methods transform the non-local problem into local differential problems of elliptic or parabolic type. To deal with the increase in computational complexity, a two-level parallelization approach is applied to construct efficient parallel algorithms using the domain decomposition and master-slave methods. We show and compare the serial and parallel solution times that are required to achieve similar accuracy of the solution using the different algorithms. Results of extensive convergence tests are presented for a three-dimensional test problem in which the convergence rate of the solution is known to decrease depending on the fractional power coefficient. We analyze and discuss the non-trivial question of which parallel algorithm is recommended to achieve a certain accuracy for a given fractional power coefficient.
An algorithm is described based on Newton's method which simultaneously approximates all zeros of a polynomial with only real zeros. The algorithm, which is conceptually suitable for parallel computation, determin...