Search results - Inner Mongolia University Library

Fault tolerant algorithms for heat transfer problems

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2008, vol. 68, no. 5, pp. 663-677

Authors: Ltaief, Hatem; Gabriel, Edgar; Garbey, Marc (Univ Houston, Dept Comp Sci, Houston, TX 77204, USA)

With the emergence of new massively parallel systems in the high performance computing area allowing scientific simulations to run on thousands of processors, the mean time between failures of large machines is decreasing from several weeks to a few minutes. The ability of hardware and software components to handle these singular events called process failures is therefore getting increasingly important. In order for a scientific code to continue despite a process failure, the application must be able to retrieve the lost data items. The recovery procedure after failures might be fairly straightforward for elliptic and linear hyperbolic problems. However, the reversibility in time for parabolic problems appears to be the most challenging part because it is an ill-posed problem. This paper focuses on new fault-tolerant numerical schemes for the time integration of parabolic problems. The new algorithm allows the application to recover from process failures and to reconstruct numerically the lost data of the failed process(es), avoiding the expensive roll-back operation required in most checkpoint/restart schemes. As a fault tolerant communication library, we use the fault tolerant message passing interface developed by the Innovative Computing Laboratory at the University of Tennessee. Experimental results show promising performance. Indeed, the three-dimensional parabolic benchmark code is able to recover and to keep on running after failures, adding only a very small penalty to the overall time of execution. (C) 2007 Elsevier Inc. All rights reserved.

Keywords: parallel numerical algorithms; process fault tolerance; parabolic problems

Algorithm-based fault location and recovery for matrix computations on multiprocessor systems

IEEE TRANSACTIONS ON COMPUTERS, 1996, vol. 45, no. 11, pp. 1239-1247

Authors: RoyChowdhury, A; Banerjee, P (NORTHWESTERN UNIV, CTR PARALLEL & DISTRIBUTED COMP, EVANSTON, IL 60208; NORTHWESTERN UNIV, DEPT ELECT & COMP ENGN, EVANSTON, IL 60208)

Algorithm-based fault-tolerance (ABFT) is an inexpensive method of incorporating fault-tolerance into existing applications. Applications are modified to operate on encoded data and produce encoded results which may then be checked for correctness. An attractive feature of the scheme is that it requires little or no modification to the underlying hardware or system software. Previous algorithm-based methods for developing reliable versions of numerical programs for general-purpose multicomputers have mostly concerned themselves with error detection. A truly fault-tolerant algorithm, however, needs to locate errors and recover from them once they are located. In a parallel processing environment, this corresponds to locating the faulty processors and recovering the data corrupted by the faulty processors. In this paper, we first present a general scheme for performing fault-location and recovery under the ABFT framework. Our fault model assumes that a faulty processor can corrupt all the data it possesses. The fault-location scheme is an application of system-level diagnosis theory to the ABFT framework, while the fault-recovery scheme uses ideas from coding theory to maintain redundant data and uses this to recover corrupted data in the event of processor failures. Results are presented on implementations of three numerical algorithms on a 16-processor Intel iPSC/2 hypercube multicomputer, which demonstrate acceptably low overheads for the single and double fault location and recovery cases.

Keywords: algorithm-based fault-tolerance; parallel numerical algorithms; fault location; fault recovery; system level diagnosis; coding theory
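
To make the idea of "operating on encoded data" in the abstract above concrete, here is a minimal sketch of the classic checksum encoding for matrix multiplication. It is a textbook ABFT illustration, not the fault-location and recovery scheme of this paper, and it assumes a single corrupted entry rather than a whole faulty processor. All function names are hypothetical.

```python
def add_checksums(a, b):
    """Append a column-sum row to A and a row-sum column to B."""
    a_c = [row[:] for row in a] + [[sum(col) for col in zip(*a)]]
    b_r = [row[:] + [sum(row)] for row in b]
    return a_c, b_r


def matmul(x, y):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def locate_and_correct(c_full, tol=1e-9):
    """Locate and repair one corrupted data entry of a full-checksum matrix.

    The last row and last column of c_full hold the checksums.
    Returns the (row, col) that was repaired, or None if all checks pass.
    """
    n = len(c_full) - 1       # number of data rows
    m = len(c_full[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("this sketch only handles one corrupted entry")
    i, j = bad_rows[0], bad_cols[0]
    # Rebuild the entry from the row checksum and the other entries in the row.
    c_full[i][j] = c_full[i][m] - sum(c_full[i][k] for k in range(m) if k != j)
    return i, j


a_c, b_r = add_checksums([[1, 2], [3, 4]], [[5, 6], [7, 8]])
c_full = matmul(a_c, b_r)     # product carries its own row and column checksums
c_full[0][1] = 99             # simulate a fault in one result entry
repaired = locate_and_correct(c_full)
```

The failing row check and the failing column check intersect at the corrupted entry, which is what turns plain error detection into location and recovery.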

APPLICATION AND ACCURACY OF THE PARALLEL DIAGONAL DOMINANT ALGORITHM

PARALLEL COMPUTING, 1995, vol. 21, no. 8, pp. 1241-1267

Authors: SUN, XH (Department of Computer Science, Louisiana State University, Baton Rouge, LA 70803-4020, USA)

The Parallel Diagonal Dominant (PDD) algorithm is an efficient tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is extended to solve periodic tridiagonal systems and its scalability is studied. Then the reduced PDD algorithm, which has a smaller operation count than that of the conventional sequential algorithm for many applications, is proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric and skew-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the PDD and reduced PDD algorithms are good candidates for emerging massively parallel machines.

Keywords: PARALLEL PROCESSING; PARALLEL NUMERICAL ALGORITHMS; SCALABLE COMPUTING; TRIDIAGONAL SYSTEMS; TOEPLITZ SYSTEMS
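
For reference, the sequential baseline that tridiagonal solvers are usually measured against can be sketched as below. This assumes that the "conventional sequential algorithm" in the abstract above means the Thomas algorithm (Gaussian elimination specialized to tridiagonal matrices, without pivoting). It is not the PDD algorithm itself, and the function and argument names are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) operations.

    lower: sub-diagonal (length n-1), diag: main diagonal (length n),
    upper: super-diagonal (length n-1), rhs: right-hand side (length n).
    No pivoting is done, so the matrix should be diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side
    if n > 1:
        c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination of the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# 3x3 system with diagonal 4 and off-diagonals 1; exact solution is [1, 2, 3].
solution = thomas_solve([1.0, 1.0], [4.0, 4.0, 4.0], [1.0, 1.0], [6.0, 12.0, 14.0])
```

Both loops carry a dependency from one row to the next, which is why parallel tridiagonal solvers need a different decomposition of the problem.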

A Multi-GPU Aggregation-Based AMG Preconditioner for Iterative Linear Solvers

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, vol. 34, no. 8, pp. 2365-2376

Authors: Bernaschi, Massimo; Celestini, Alessandro; Vella, Flavio; D'Ambra, Pasqua (Inst Appl Comp IAC CNR, I-00185 Rome, Italy; Univ Trento, I-38122 Trento, Italy)

We present and release in open source format a sparse linear solver which efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large and sparse linear systems on modern parallel computers made of hybrid nodes hosting Nvidia Graphics Processing Unit (GPU) accelerators. The work extends previous efforts of some of the authors in the exploitation of a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. Our design for the hybrid implementation has been driven by the best practices for minimizing data communication overhead when multiple GPUs are employed, yet preserving the efficiency of the GPU kernels. Strong and weak scalability results of the new version of the library on well-known benchmark test cases are discussed. Comparisons with the Nvidia AmgX solution show a speedup, in the solve phase, up to 2.0x.

Keywords: GPU accelerators; heterogeneous computing; iterative sparse linear solvers; parallel numerical algorithms; scalability

GENERALIZED NONLINEAR DIAGONAL DOMINANCE AND APPLICATIONS TO ASYNCHRONOUS ITERATIVE METHODS

JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1991, vol. 38, no. 1-3, pp. 105-124

Authors: FROMMER, A (UNIV KARLSRUHE, INST APPL MATH, W-7500 KARLSRUHE, GERMANY)

We introduce a concept of generalized diagonal dominance for nonlinear functions. As in the linear case, this brings together several, apparently different classes of nonlinear functions such as strictly diagonally dominant functions and certain M-functions. With our concept we easily obtain a quite far-reaching result on the global convergence of asynchronous iterative methods for finding zeros of nonlinear functions. Special cases include some known and several new convergence results for special iterative methods such as the nonlinear JOR-, SOR- and SSOR-method.

Keywords: DIAGONAL DOMINANCE; SYSTEMS OF NONLINEAR EQUATIONS; M-FUNCTIONS; ASYNCHRONOUS METHODS; PARALLEL NUMERICAL ALGORITHMS

Block row projection method based on M-matrix splitting

JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2018, vol. 340, pp. 731-744

Authors: Zhang, Zhengyi; Sameh, Ahmed H. (Purdue Univ, Dept Comp Sci, 305 N Univ St, W Lafayette, IN 47907, USA)

We propose a hybrid sparse linear system solver based on M-matrix splitting and block-row projection (BRP). We split the sparse coefficient matrix A into two (nonsingular) M-matrices, and construct an augmented larger linear system which we solve using a BRP method. The robustness of BRP is compared with those of ILUT-preconditioned GMRES, and the sparse direct solver Pardiso. We also demonstrate the parallel scalability of BRP on a cluster of multicore nodes. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: numerical linear algebra; Krylov subspace methods; Preconditioners; Block row projection; M-matrix splitting; parallel numerical algorithms

A parallel sparse linear system solver based on Hermitian/skew-Hermitian splitting

COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2016, vol. 72, no. 8, pp. 2000-2007

Authors: Zhang, Zhengyi; Sameh, Ahmed H. (Purdue Univ, Dept Comp Sci, 305 N Univ St, W Lafayette, IN 47907, USA)

In this paper we describe a parallel algorithm for solving large sparse nonsingular linear systems Ax = f, of order n, using the Hermitian/skew-Hermitian splitting approach for handling the augmented linear system, of order 2n, that arises from the linear least squares problem of minimizing the 2-norm of (f-Ax). We use the restarted GMRES as the outer iteration with the Hermitian/skew-Hermitian splitting (HSS) preconditioner. In solving systems involving this preconditioner, the most time consuming part deals with handling shifted skew-symmetric systems. We solve such systems using successive overrelaxation (SOR). Theoretical analysis shows that our solver always converges to the unique solution of Ax = f. We present several numerical experiments that demonstrate the robustness of our solver compared to other schemes, and show its parallel scalability on a single multicore node. (C) 2016 Elsevier Ltd. All rights reserved.

Keywords: numerical linear algebra; Krylov subspace methods; Preconditioners; Hermitian/Skew-Hermitian splitting; parallel numerical algorithms
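
The splitting named in the title above can be illustrated on a small dense matrix: any square matrix A is the sum of a Hermitian part H = (A + A^*)/2 and a skew-Hermitian part S = (A - A^*)/2. The sketch below only computes that decomposition. It does not reproduce the paper's augmented system, preconditioner, or SOR inner solver, and the function name is hypothetical.

```python
def hss_split(a):
    """Return (H, S) with A = H + S, H Hermitian and S skew-Hermitian.

    `a` is a square matrix given as a list of lists of int, float,
    or complex entries.
    """
    n = len(a)
    h = [[(a[i][j] + a[j][i].conjugate()) / 2 for j in range(n)]
         for i in range(n)]
    s = [[(a[i][j] - a[j][i].conjugate()) / 2 for j in range(n)]
         for i in range(n)]
    return h, s


h_part, s_part = hss_split([[4, 1], [3, 2]])
# h_part is symmetric, s_part is skew-symmetric, and they sum to the input.
```

For a real matrix this reduces to the symmetric and skew-symmetric parts, which is why the abstract speaks of shifted skew-symmetric systems.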

Efficient deterministic parallel simulation of 2D semiconductor devices based on WENO-Boltzmann schemes

COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2009, vol. 198, no. 5-8, pp. 693-704

Authors: Mantas, Jose M.; Caceres, Maria J. (Univ Granada, Dept Lenguajes & Sistemas Informat, ETS Ing Informat & Telecomunicac, E-18071 Granada, Spain; Univ Granada, Fac Ciencias, Dept Matemat Aplicada, Granada 18002, Spain)

A flexible parallel deterministic solver of the Boltzmann-Poisson system for 2D semiconductor device simulation on computer clusters is presented. The simulator is obtained by parallelizing a previously proposed numerical scheme based on high order finite difference weighted essentially non-oscillatory (WENO) schemes. Although the underlying numerical scheme presents important advantages over direct simulation Monte Carlo methods, this scheme imposes very high demands of computing power. Due to this, the parallelization of the different calculation phases in the numerical scheme has been tackled. The data subdomain which demands most of the computational workload has been suitably distributed among the processors and several parallel design decisions have been taken in order to achieve good performance. Moreover, the resultant parallel application can be easily adjusted to simulate a wide range of devices and could be easily used by engineers without mathematical background about the underlying numerical scheme. The parallel algorithm has been implemented in C++ augmented with calls to MPI functions and functions of optimized linear algebra libraries. Several experiments have been performed by simulating particular MOSFET and DG-MOSFET devices on an SMP cluster in order to show its efficiency. (C) 2008 Elsevier B.V. All rights reserved.

Keywords: Semiconductor simulation; parallel numerical algorithms; Finite difference weighted essentially non-oscillatory schemes; High performance cluster computing

SCALABLE HETEROGENEOUS CPU-GPU COMPUTATIONS FOR UNSTRUCTURED TETRAHEDRAL MESHES

IEEE MICRO, 2015, vol. 35, no. 4, pp. 6-15

Authors: Langguth, Johannes; Sourouri, Mohammed; Lines, Glenn Terje; Baden, Scott B.; Cai, Xing (Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103, USA)

Multicore CPUs can be combined with GPUs to perform computations over 3D unstructured meshes on heterogeneous CPU-GPU clusters. The authors explain how to unlock the CPUs' computing power without slowing down other tasks related to data movement. By solving the representative diffusion equation using the cell-centered finite volume method, the authors demonstrate that combining the computing capacity of CPUs and GPUs delivers a performance advantage over the GPU-only approach.

Keywords: Graphics Processing Units; Parallel Processing; Power Aware Computing; Scalable Heterogeneous CPU GPU Computations; Unstructured Tetrahedral Meshes; High Performance Computing Environments; Energy Efficient Hardware Accelerators; Xeon Phi Coprocessors; Predictable Data Access Patterns; Instruction Sets; Particle Separators; Performance Evaluation; Mathematical Model; Computer Programs; Three Dimensional Displays; Sparse Linear Algebra; Irregular Meshes; parallel numerical algorithms; Multicore Processors; Performance Optimization; Emerging Technologies; GPUs

Numerical solution of the expanding stellar atmosphere problem

JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1999, vol. 109, no. 1-2, pp. 41-63

Authors: Hauschildt, PH; Baron, E (Univ Georgia, Dept Phys & Astron, Athens, GA 30602, USA; Univ Georgia, Ctr Simulat Phys, Athens, GA 30602, USA; Univ Oklahoma, Dept Phys & Astron, Norman, OK 73019, USA)

In this paper we discuss numerical methods and algorithms for the solution of NLTE stellar atmosphere problems involving expanding atmospheres, e.g., found in novae, supernovae and stellar winds. We show how a scheme of nested iterations can be used to reduce the high dimension of the problem to a number of problems with smaller dimensions. As examples of these sub-problems, we discuss the numerical solution of the radiative transfer equation for relativistically expanding media with spherical symmetry, the solution of the multi-level nonLTE statistical equilibrium problem for extremely large model atoms, and our temperature correction procedure. Although modern iteration schemes are very efficient, parallel algorithms are essential in making large-scale calculations feasible, therefore we discuss some parallelization schemes that we have developed. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: stellar atmospheres; radiative transfer; parallel numerical algorithms; NLTE
