Multiplication of two sparse matrices (i.e., sparse matrix-matrix multiplication, abbreviated as SpGEMM) is a widely used kernel in many applications such as molecular dynamics simulations, graph operations, and linear programming. We identify parallel formulations of the SpGEMM operation in the form C = AB for distributed-memory architectures. Using these formulations, we propose parallel SpGEMM algorithms that have a multiplication phase and a communication phase: the multiplication phase consists of local SpGEMM computations without any communication, and the communication phase consists of transferring the required input/output matrices. For these algorithms, three hypergraph models are proposed. These models are used to partition the input and output matrices simultaneously. The input matrices A and B are partitioned in one dimension in all of these hypergraph models. The output matrix C is partitioned in two dimensions (nonzero-based) in the first hypergraph model, and in one dimension in the second and third models. In partitioning these hypergraph models, the constraint on vertex weights corresponds to computational load balancing among processors for the multiplication phase of the proposed SpGEMM algorithms, and the objective, which is minimizing the cutsize defined in terms of the costs of the cut hyperedges, corresponds to minimizing the communication volume due to transferring the required matrix entries in the communication phase of the SpGEMM algorithms. We also propose models for reducing the total number of messages while maintaining balance on the communication volumes handled by processors during the communication phase of the SpGEMM algorithms. An SpGEMM library for distributed-memory architectures is developed in order to verify the empirical validity of our models. The library uses MPI (Message Passing Interface) for performing communication in the parallel setting. The developed SpGEMM library is run on SpGEMM insta
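To make the local multiplication phase concrete, the following is a minimal sketch, not the authors' library. It assumes matrices stored as nested dictionaries `{row: {col: value}}` and uses a row-by-row accumulation scheme. The helper names `spgemm` and `split_rows`, and the round-robin row assignment, are hypothetical placeholders; the abstract's hypergraph models would choose the partition instead, and a real distributed run would transfer only the needed rows of B over MPI.

```python
from collections import defaultdict


def spgemm(A, B):
    """Compute C = A * B for sparse matrices stored as {row: {col: value}}."""
    C = {}
    for i, row_a in A.items():
        acc = defaultdict(float)  # accumulator for row i of C
        for k, a_ik in row_a.items():
            # Row k of B contributes a_ik * B[k][j] to C[i][j].
            for j, b_kj in B.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            C[i] = dict(acc)
    return C


def split_rows(A, num_parts):
    """Hypothetical 1D partition: row i is assigned to part i % num_parts."""
    parts = [{} for _ in range(num_parts)]
    for i, row in A.items():
        parts[i % num_parts][i] = row
    return parts


A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}, 2: {0: 4.0}}
B = {0: {1: 5.0}, 1: {0: 6.0}, 2: {1: 7.0}}

# Sequential reference result.
C = spgemm(A, B)

# Each part multiplies its own rows of A independently (here with all of B
# available locally); the row blocks of C are then simply merged.
C_parts = {}
for part in split_rows(A, 2):
    C_parts.update(spgemm(part, B))
```

In this row-wise formulation the per-part work is proportional to the number of scalar multiplications in its rows, which is the quantity a load-balancing constraint would need to equalize.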
Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly larger, more heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features necessary to express, in an elegant and compact way, scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense General Matrix Multiplication (GEMM), the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive with state-of-the-art libraries on up to 32,768 CPU cores and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it raises other issues that are outside the scope of this work.
We describe the new version 4.0 of the code HFBTHO that solves the nuclear Hartree-Fock-Bogoliubov problem by using the deformed harmonic oscillator basis in cylindrical coordinates. In the new version, we have implemented the restoration of rotational, particle number, and reflection symmetry for even-even nuclei. The restoration of rotational symmetry does not require using bases closed under rotation. Furthermore, we added the SeaLL1 functional and improved the calculation of the Coulomb potential. Finally, we refactored the code to facilitate maintenance and future *** version program summary
Program title: HFBTHO v4.0
CPC Library link to program files: https://doi.org/10.17632/c5g2f92by3.2
Code Ocean capsule: https://codeocean.com/capsule/5389629
Licensing provisions: GPLv3
Programming language: Fortran 2003
Journal reference of previous version: R.N. Perez, N. Schunck, R.-D. Lasseri, C. Zhang and J. Sarich, Comput. Phys. Commun. 220 (2017) 363
Does the new version supersede the previous version: Yes
Reasons for the new version: This version adds new capabilities to restore broken symmetries and determine corresponding quantum numbers of even-even nuclei
Summary of revisions:
1. Angular momentum projection for even-even nuclei in a deformed basis;
2. Particle number projection for even-even nuclei in the quasiparticle basis;
3. Implementation of the SeaLL1 functional;
4. Expansion of the Coulomb potential onto Gaussians;
5. MPI-parallelization of a single HFBTHO execution;
6. Code *** of problem: HFBTHO is a physics computer code that is used to model the structure of the nucleus. It is an implementation of the energy density functional (EDF) approach to atomic nuclei, where the energy of the nucleus is obtained by integration over space of some phenomenological energy density, which is itself a functional of the neutron and proton intrinsic densities. In the present version of HFBTHO, the energy density is derived either from the zero-rang
The main computing phases of numerical methods for solving partial differential equations are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. The matrix assembly consists of a loop over the elements, faces, edges or nodes of the MPI partitions to compute element matrices and vectors, followed by their assembly into the global system. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP, with different techniques to avoid the race condition, each of which has efficiency or implementation drawbacks. We propose an alternative based on task parallelism, using some extensions to the OpenMP programming model. In addition, dynamic load balancing is applied, which is especially efficient in the presence of hybrid meshes. This paper presents the proposed methodology, its implementation and its validation through the solution of large computational mechanics problems on up to 16k cores.
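The assembly loop and its race condition can be illustrated with a small serial sketch. It is not the paper's implementation: the function names `assemble` and `color_elements`, the 1D two-node mesh, and the 2x2 element matrix are assumptions made for illustration. Greedy element coloring is shown as one well-known way to avoid the race, since elements of the same color share no node and could be processed concurrently; the abstract does not say which techniques the authors compare against.

```python
def assemble(elements, element_matrix):
    """Scatter-add element matrices into a global sparse matrix {(i, j): value}."""
    K = {}
    for nodes in elements:
        ke = element_matrix(nodes)
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                # Two elements sharing node i update the same entry here;
                # in a parallel loop this read-modify-write is the race.
                K[(i, j)] = K.get((i, j), 0.0) + ke[a][b]
    return K


def color_elements(elements):
    """Greedy coloring: elements that share a node receive different colors."""
    colors = []
    node_colors = {}  # node -> set of colors already used by its elements
    for nodes in elements:
        used = set()
        for n in nodes:
            used |= node_colors.get(n, set())
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for n in nodes:
            node_colors.setdefault(n, set()).add(c)
    return colors


# Hypothetical 1D mesh with three two-node elements.
elements = [(0, 1), (1, 2), (2, 3)]


def stiffness(nodes):
    """Assumed unit stiffness matrix for a two-node element."""
    return [[1.0, -1.0], [-1.0, 1.0]]


K = assemble(elements, stiffness)
colors = color_elements(elements)
```

Coloring removes the race but tends to scatter memory accesses, which is one example of the efficiency drawbacks that motivate a task-based alternative.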