We present, and release as open source, a sparse linear solver that efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large, sparse linear systems on modern parallel computers built from hybrid nodes hosting Nvidia Graphics Processing Unit (GPU) accelerators. The work extends previous efforts by some of the authors to exploit a single GPU accelerator. It proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. The design of the hybrid implementation follows best practices for minimizing data communication overhead when multiple GPUs are employed, while preserving the efficiency of the GPU kernels. Strong and weak scalability results of the new version of the library on well-known benchmark test cases are discussed. Comparisons with the Nvidia AmgX solution show a speedup of up to 2.0x in the solve phase.
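The preconditioned Krylov iteration at the heart of such a solver fits in a few lines. The sketch below is plain Python, not BootCMatchG or AmgX code. It runs preconditioned conjugate gradient on a matrix stored in CSR format and, as a deliberate simplification, uses a diagonal (Jacobi) preconditioner where an AMG solver would apply a multigrid V-cycle. The helper names (`csr_matvec`, `pcg_jacobi`, `build_tridiag`) and the test matrix are hypothetical.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """Return y = A x for a matrix A stored in CSR format."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite CSR matrix.
    Jacobi scaling M = diag(A) stands in for an AMG V-cycle (assumption)."""
    n = len(b)
    diag = [1.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
    x = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    r = list(b)                                   # r = b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * aq for ri, aq in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zk + (rz_new / rz) * pk for zk, pk in zip(z, p)]
        rz = rz_new
    return x, max_iter


def build_tridiag(n):
    """Hypothetical SPD test matrix: diagonal 2 + i, off-diagonals -1."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0 + i), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data
```

In a distributed MPI-CUDA version, the matrix-vector product needs halo exchanges between neighbouring ranks and every dot product needs a global reduction. These are the communication points that a multi-GPU design tries to keep cheap.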
We propose a hybrid sparse linear system solver based on M-matrix splitting and block-row projection (BRP). We split the sparse coefficient matrix A into two (nonsingular) M-matrices and construct a larger, augmented linear system, which we solve using a BRP method. The robustness of BRP is compared with that of ILUT-preconditioned GMRES and of the sparse direct solver Pardiso. We also demonstrate the parallel scalability of BRP on a cluster of multicore nodes. (C) 2017 Elsevier B.V. All rights reserved.
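Block-row projection generalizes the classical Kaczmarz (row-projection) iteration. The following is a minimal sketch that assumes blocks of a single row and no Krylov acceleration, both simplifications relative to the BRP solver described above. The function name `kaczmarz` and the test system are hypothetical.

```python
def kaczmarz(A, b, sweeps=200):
    """Cyclic Kaczmarz iteration for A x = b (A given as a list of rows).
    Each update projects x orthogonally onto the hyperplane a_i . x = b_i.
    A block-row projection would instead project onto the solution set of
    a whole block of rows at once."""
    x = [0.0] * len(A[0])
    row_norms = [sum(a * a for a in row) for row in A]
    for _ in range(sweeps):
        for row, bi, nrm in zip(A, b, row_norms):
            resid = bi - sum(a * xj for a, xj in zip(row, x))
            step = resid / nrm
            x = [xj + step * a for xj, a in zip(x, row)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    print(kaczmarz(A, [6.0, 10.0, 8.0]))  # exact solution is [1, 2, 3]
```

Because each projection touches only one row (or, in BRP, one block of rows), blocks can be assigned to different processes.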
ISBN (print): 9783319780245; 9783319780238
In this paper, we construct and investigate parallel solvers for three-dimensional problems described by fractional powers of elliptic operators. The main aim is a scalability analysis of parallel versions of several state-of-the-art solvers. The originality of this work is that we also consider the accuracy of the selected numerical algorithms. To compare accuracy, we use reference solutions obtained by solving the test problem with the Fourier algorithm. This analysis makes it possible to compare the efficiency of the proposed parallel algorithms as a function of the required solution accuracy and of the number of processes used in the computations.
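A Fourier (spectral) reference solution is easy to write down when the eigenpairs of the discrete operator are known. The sketch below assumes a 1D model problem (the paper treats 3D) discretized with the standard three-point Dirichlet Laplacian on (0, 1). Its eigenvalues are λ_k = (4/h²) sin²(kπh/2) and its eigenvectors are sin(kπx_j). The sketch solves A^α u = f by scaling each discrete sine coefficient of f by λ_k^(-α). The function name is hypothetical, and the O(n²) transform stands in for an FFT-based sine transform.

```python
import math


def fractional_solve_spectral(f, alpha):
    """Solve A^alpha u = f, where A = tridiag(-1, 2, -1)/h^2 on len(f)
    interior nodes of (0, 1), h = 1/(n+1), by a discrete sine expansion."""
    n = len(f)
    h = 1.0 / (n + 1)
    u = [0.0] * n
    for k in range(1, n + 1):
        lam = (4.0 / h**2) * math.sin(k * math.pi * h / 2) ** 2
        basis = [math.sin(k * math.pi * (j + 1) * h) for j in range(n)]
        # sum_j basis_k[j] * basis_m[j] = delta_km / (2h), hence the factor 2h.
        coeff = 2.0 * h * sum(fj * bj for fj, bj in zip(f, basis))
        weight = lam ** (-alpha) * coeff
        for j in range(n):
            u[j] += weight * basis[j]
    return u
```

Setting α = 1 recovers the ordinary Poisson solve, which gives a quick consistency check for the transform.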
In this paper, we develop and investigate parallel numerical algorithms for three different state-of-the-art numerical methods for solving non-local problems described by fractional powers of elliptic operators. These methods transform the non-local problem into local differential problems of elliptic or parabolic type. To cope with the increased computational complexity, a two-level parallelization approach, combining domain decomposition and master-slave methods, is applied to construct efficient parallel algorithms. We show and compare the serial and parallel solution times that the different algorithms require to achieve similar solution accuracy. Results of extensive convergence tests are presented for a three-dimensional test problem for which the convergence rate of the solution is known to decrease depending on the fractional power coefficient. We analyze and discuss the non-trivial question of which parallel algorithm is recommended to achieve a given accuracy for a given fractional power coefficient.
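The abstract does not name the three methods. The following therefore only illustrates the general idea of replacing a non-local operator by local elliptic solves. One widely used route is a quadrature of the Balakrishnan integral, in which every quadrature node requires one independent shifted elliptic solve, a natural unit of work for a master-slave distribution. The sketch assumes a 1D three-point Dirichlet Laplacian as the model operator and 0 < α < 1 not too close to 0 or 1, because the node count grows like 1/α and 1/(1 - α). The function names are hypothetical.

```python
import math


def thomas_shifted(shift, h, f):
    """Solve (shift*I + A) v = f, A = tridiag(-1, 2, -1)/h^2 with Dirichlet BCs,
    by the Thomas algorithm (the matrix is diagonally dominant for shift >= 0)."""
    n = len(f)
    off = -1.0 / h**2
    diag = shift + 2.0 / h**2
    c = [0.0] * n
    d = [0.0] * n
    c[0], d[0] = off / diag, f[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (f[i] - off * d[i - 1]) / denom
    v = [0.0] * n
    v[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        v[i] = d[i] - c[i] * v[i + 1]
    return v


def fractional_inverse_sinc(f, alpha, k=0.25):
    """Approximate u = A^(-alpha) f for 0 < alpha < 1 using
        A^(-alpha) = (2 sin(pi alpha) / pi) * Integral over y in R of
                     exp(2 (1 - alpha) y) (exp(2 y) I + A)^(-1) dy,
    discretized by the trapezoidal (sinc) rule with step k."""
    n = len(f)
    h = 1.0 / (n + 1)
    # Cut-offs chosen so the integrand at both ends is about exp(-pi^2 / (2 k)).
    n_plus = math.ceil(math.pi**2 / (4.0 * alpha * k**2))
    n_minus = math.ceil(math.pi**2 / (4.0 * (1.0 - alpha) * k**2))
    u = [0.0] * n
    for m in range(-n_minus, n_plus + 1):
        y = m * k
        weight = math.exp(2.0 * (1.0 - alpha) * y)
        v = thomas_shifted(math.exp(2.0 * y), h, f)  # independent local solve
        for j in range(n):
            u[j] += weight * v[j]
    scale = 2.0 * k * math.sin(math.pi * alpha) / math.pi
    return [scale * uj for uj in u]
```

The step k trades accuracy against the number of local solves. Since the solves are independent, they can be distributed without communication until the final summation.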
ISBN (print): 9781538653302
In this article, we consider diagonal-implicitly iterated Runge-Kutta (DIIRK) methods for the numerical solution of stiff ordinary differential equations (ODEs) and investigate their performance on a modern cluster system using MPI. DIIRK methods are implicit and require the solution of non-linear equation systems in each iteration step. In particular, we are interested in the parallel execution behavior when different underlying Newton methods are used to solve the non-linear equation systems arising in different versions of the DIIRK method. For the resulting linear equation systems, we explore direct solution methods based on LU factorization as well as Krylov subspace methods, and investigate the resulting performance and accuracy.
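The core of any implicit Runge-Kutta step is a Newton iteration whose linear systems are solved either directly (LU) or with a Krylov method. The sketch below is not a DIIRK method. As a simpler, clearly different stand-in, it uses the two-stage, second-order, L-stable SDIRK scheme with γ = 1 - √2/2, often attributed to Alexander. Each stage equation is solved by Newton's method, with a dense Gaussian-elimination (LU-type) solve of the Newton system. All names are hypothetical.

```python
import math

GAMMA = 1.0 - math.sqrt(2.0) / 2.0  # diagonal coefficient of the SDIRK tableau


def lu_solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(M, rhs)]  # augmented working copy
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(col + 1, n):
            m = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= m * a[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def newton_stage(f, jac, t, known, h, guess, tol=1e-12, max_iter=20):
    """Solve G(Y) = Y - known - h*GAMMA*f(t, Y) = 0 by Newton's method;
    each step solves (I - h*GAMMA*J) dY = -G(Y) with a direct solve."""
    Y = list(guess)
    n = len(Y)
    for _ in range(max_iter):
        fY = f(t, Y)
        G = [Y[i] - known[i] - h * GAMMA * fY[i] for i in range(n)]
        J = jac(t, Y)
        M = [[(1.0 if i == j else 0.0) - h * GAMMA * J[i][j] for j in range(n)]
             for i in range(n)]
        dY = lu_solve(M, [-g for g in G])
        Y = [Y[i] + dY[i] for i in range(n)]
        if max(abs(d) for d in dY) < tol:
            break
    return Y


def sdirk2_step(f, jac, t, y, h):
    """One step of the two-stage SDIRK method with
    A = [[GAMMA, 0], [1 - GAMMA, GAMMA]], b = [1 - GAMMA, GAMMA], c = [GAMMA, 1]."""
    Y1 = newton_stage(f, jac, t + GAMMA * h, y, h, y)
    F1 = f(t + GAMMA * h, Y1)
    known = [y[i] + h * (1.0 - GAMMA) * F1[i] for i in range(len(y))]
    return newton_stage(f, jac, t + h, known, h, Y1)  # stiffly accurate: y_new = Y2
```

Swapping `lu_solve` for a Krylov solver changes only the inner linear solve, which is exactly the design axis the article studies.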
This report focuses on the technology of supercomputer simulation of nonlinear processes in cores extracted from oil and gas production wells, in order to study the properties of hydrocarbon reservoirs. One modern approach to solving problems of this kind is to create a multiphysics mathematical model of the core and study it by computer methods. This approach minimizes the number of physical experiments and predicts the evolution of the properties of the layers. It also makes it possible to predict the oil and gas recovery of the layers over a long time period. However, implementing this technology, called "virtual core", requires the following: 1) to create a multiparametric model of the core that is as close to reality as possible; 2) to take into account the multicomponent and multiphase composition and the complex real geometry of the core; 3) to develop a computational framework for modeling the seepage of multicomponent liquid and gas mixtures through the core; 4) to carry out large-scale calibration calculations. In this paper, we attempt to create such a multifactor mathematical model, together with the computational foundations for its analysis on computers and supercomputers.
In this paper, we describe a parallel algorithm for solving large, sparse, nonsingular linear systems Ax = f of order n. It uses the Hermitian/skew-Hermitian splitting approach to handle the augmented linear system of order 2n that arises from the linear least-squares problem of minimizing the 2-norm of (f - Ax). We use restarted GMRES as the outer iteration, with the Hermitian/skew-Hermitian splitting (HSS) preconditioner. When solving systems involving this preconditioner, the most time-consuming part is handling shifted skew-symmetric systems, which we solve using successive overrelaxation (SOR). Theoretical analysis shows that our solver always converges to the unique solution of Ax = f. We present several numerical experiments that demonstrate the robustness of our solver compared to other schemes, and show its parallel scalability on a single multicore node. (C) 2016 Elsevier Ltd. All rights reserved.
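The HSS splitting itself is short to state. With H = (A + Aᵀ)/2 and S = (A - Aᵀ)/2, each sweep alternates a shifted symmetric solve with a shifted skew-symmetric solve. The sketch below simplifies the solver described above in three ways. It runs HSS as a stationary iteration directly on a small real system, rather than as a GMRES preconditioner on the order-2n augmented system. It also uses dense Gaussian elimination for both inner solves instead of SOR. For real A whose symmetric part H is positive definite, the iteration converges for any α > 0, and α = sqrt(λmin(H) λmax(H)) is the usual choice that minimizes the classical convergence bound. Function names are hypothetical.

```python
import math


def gauss_solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(col + 1, n):
            m = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= m * a[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def hss_solve(A, b, alpha, tol=1e-10, max_iter=500):
    """Stationary HSS iteration for a real system A x = b."""
    n = len(b)
    H = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    S = [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
    shift = lambda B: [[B[i][j] + (alpha if i == j else 0.0) for j in range(n)]
                       for i in range(n)]
    aH, aS = shift(H), shift(S)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # Half step: (alpha I + H) x_half = (alpha I - S) x + b
        rhs = [alpha * x[i] - sum(S[i][j] * x[j] for j in range(n)) + b[i]
               for i in range(n)]
        x_half = gauss_solve(aH, rhs)
        # Full step: (alpha I + S) x = (alpha I - H) x_half + b
        rhs = [alpha * x_half[i] - sum(H[i][j] * x_half[j] for j in range(n)) + b[i]
               for i in range(n)]
        x = gauss_solve(aS, rhs)
        res = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
        if res < tol:
            return x, it
    return x, max_iter
```

The second half step is the shifted skew-symmetric solve that the paper identifies as the most expensive part and handles with SOR.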
Mathematical models with fractional-order differential operators are computationally expensive because of the non-local nature of these operators. In this work, we construct and investigate parallel solvers for problems described by fractional powers of elliptic operators, such as fractional diffusion. Three state-of-the-art approaches are used to transform the non-local fractional-order differential problem into local partial differential equation problems formulated in a space of higher dimension. Numerical schemes and parallel algorithms are developed for all three approaches. The resulting parallel algorithms have very different properties. We investigate the weak and strong scalability of the developed parallel algorithms and compare their parallel performance.
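Strong and weak scalability are usually summarized with a few ratios. For a fixed global problem, the speedup is S(p) = T(p₀)/T(p) and the efficiency is E(p) = S(p)·p₀/p. For a fixed problem size per process, the weak efficiency is T(p₀)/T(p). The helpers below (hypothetical names) compute these ratios from wall-clock timings. The numbers in the usage lines are made up for illustration and are not results from the paper.

```python
def strong_scaling(times):
    """times: {processes: wall time} for a fixed global problem size.
    Returns {p: (speedup, efficiency)} relative to the smallest run p0."""
    p0 = min(times)
    t0 = times[p0]
    return {p: (t0 / t, (t0 / t) * p0 / p) for p, t in sorted(times.items())}


def weak_scaling(times):
    """times: {processes: wall time} with the work per process held fixed.
    Ideal weak scaling keeps the time constant, so efficiency = T(p0) / T(p)."""
    p0 = min(times)
    return {p: times[p0] / t for p, t in sorted(times.items())}


if __name__ == "__main__":
    print(strong_scaling({1: 100.0, 2: 50.0, 4: 30.0}))  # hypothetical timings
    print(weak_scaling({1: 10.0, 8: 12.5}))              # hypothetical timings
```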
ISBN (print): 9781509057078
In this paper, we explore how numerical calculations can be accelerated by implementing several numerical methods for fractional-order systems with parallel computing techniques. We investigate the feasibility of parallel computing algorithms and their efficiency in reducing computational costs over a large time interval. In particular, we present the case of the Adams-Bashforth-Moulton predictor-corrector method and measure the speedup of two parallel approaches, using GPU and HPC cluster implementations.
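The fractional Adams-Bashforth-Moulton method is commonly written in the predictor-corrector form of Diethelm, Ford, and Freed. This sketch assumes that form, applied to a Caputo problem D^α y = f(t, y) with 0 < α ≤ 1 and y(0) = y₀. Every step sums over the whole history, so the total cost is O(N²). Those history sums are the natural target for GPU or cluster parallelization. The function name is hypothetical.

```python
import math


def fractional_abm(f, alpha, y0, T, N):
    """Fractional Adams-Bashforth-Moulton (PECE) scheme for the Caputo problem
    D^alpha y = f(t, y), y(0) = y0, 0 < alpha <= 1, on [0, T] with N steps.
    Returns the approximations y_0, ..., y_N on the uniform grid."""
    h = T / N
    t = [j * h for j in range(N + 1)]
    y = [y0] + [0.0] * N
    fv = [f(t[0], y0)] + [0.0] * N
    c_pred = h**alpha / math.gamma(alpha + 1)
    c_corr = h**alpha / math.gamma(alpha + 2)
    for n in range(N):
        # Predictor: fractional rectangle rule over the full history.
        pred = sum(((n + 1 - j) ** alpha - (n - j) ** alpha) * fv[j]
                   for j in range(n + 1))
        yp = y0 + c_pred * pred
        # Corrector: fractional trapezoidal weights a_{j,n+1}.
        corr = (n ** (alpha + 1) - (n - alpha) * (n + 1) ** alpha) * fv[0]
        for j in range(1, n + 1):
            corr += ((n - j + 2) ** (alpha + 1) + (n - j) ** (alpha + 1)
                     - 2 * (n - j + 1) ** (alpha + 1)) * fv[j]
        y[n + 1] = y0 + c_corr * (f(t[n + 1], yp) + corr)
        fv[n + 1] = f(t[n + 1], y[n + 1])
    return y
```

For α = 1 the weights reduce to the ordinary trapezoidal PECE scheme. For D^α y = -y with y(0) = 1, the exact solution is the Mittag-Leffler function E_α(-t^α), which for α = 1/2 equals e^t erfc(√t).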
Multicore CPUs can be combined with GPUs to perform computations over 3D unstructured meshes on heterogeneous CPU-GPU clusters. The authors explain how to unlock the CPUs' computing power without slowing down other tasks related to data movement. Using the diffusion equation, discretized with the cell-centered finite volume method, as a representative problem, the authors demonstrate that combining the computing capacity of CPUs and GPUs delivers a performance advantage over the GPU-only approach.
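The representative kernel of a cell-centered finite volume diffusion solver is a face-based flux loop. The sketch below makes several assumptions that do not come from the paper: an explicit Euler time step, a mesh described only by cell volumes and a list of interior faces (owner, neighbour, transmissibility), and insulated boundaries. Because each face adds its flux to one cell and subtracts it from the other, the scheme conserves the total amount Σ V_i u_i exactly, which gives a convenient check. On a CPU-GPU cluster, one natural option is to partition the cells, and hence the faces, between devices. The function name and the test mesh are hypothetical.

```python
def fv_diffusion_step(u, volumes, faces, dt):
    """One explicit Euler step of du/dt = div(grad u) with a cell-centered
    finite volume scheme on an arbitrary (unstructured) mesh.
    faces: (owner, neighbour, transmissibility) tuples, where transmissibility
    is face_area * diffusivity / distance between the two cell centroids.
    Boundary faces are omitted, i.e. zero-flux (insulated) boundaries."""
    du = [0.0] * len(u)
    for owner, neighbour, trans in faces:
        flux = trans * (u[neighbour] - u[owner])  # flow from neighbour into owner
        du[owner] += flux
        du[neighbour] -= flux
    return [ui + dt * dui / vol for ui, dui, vol in zip(u, du, volumes)]
```

The time step must satisfy the usual explicit stability limit, roughly dt ≤ V_i / Σ(transmissibilities of cell i) for every cell.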