Search Results - Inner Mongolia University Library

Author: Kadir Akbudak, Bilkent University

Degree level: Doctorate

Multiplication of two sparse matrices (i.e., sparse matrix-matrix multiplication, which is abbreviated as SpGEMM) is a widely used kernel in many applications such as molecular dynamics simulations, graph operations, and linear programming. We identify parallel formulations of SpGEMM operation in the form of C = AB for distributed-memory architectures. Using these formulations, we propose parallel SpGEMM algorithms that have the multiplication and communication phases: The multiplication phase consists of local SpGEMM computations without any communication and the communication phase consists of transferring required input/output matrices. For these algorithms, three hypergraph models are proposed. These models are used to partition input and output matrices simultaneously. The input matrices A and B are partitioned in one dimension in all of these hypergraph models. The output matrix C is partitioned in two dimensions, which is nonzero-based in the first hypergraph model, and it is partitioned in one dimension in the second and third models. In partitioning of these hypergraph models, the constraint on vertex weights corresponds to computational load balancing among processors for the multiplication phase of the proposed SpGEMM algorithms, and the objective, which is minimizing cutsize defined in terms of costs of the cut hyperedges, corresponds to minimizing the communication volume due to transferring required matrix entries in the communication phase of the SpGEMM algorithms. We also propose models for reducing the total number of messages while maintaining balance on communication volumes handled by processors during the communication phase of the SpGEMM algorithms. An SpGEMM library for distributed memory architectures is developed in order to verify the empirical validity of our models. The library uses MPI (Message Passing Interface) for performing communication in the parallel setting. The developed SpGEMM library is run on SpGEMM insta

Keywords: sparse matrices; matrix partitioning; parallel computing; distributed memory parallelism; generalized matrix multiplication; GEMM; sparse matrix-matrix multiplication; SpGEMM; computational hypergraph model; hypergraph partitioning; BLAS (Basic Linear Algebra Subprograms) Level 3 operations; molecular dynamics simulations; graph operations; linear programming


Task-based Parallel Programming for Scalable Matrix Product Algorithms

ACM Transactions on Mathematical Software, 2023, Vol. 49, No. 2, pp. 15-15

Authors: Agullo, Emmanuel; Buttari, Alfredo; Guermouche, Abdou; Herrmann, Julien; Jego, Antoine. Affiliations: Inria, LaBRI, 200 Vieille Tour, F-33405 Talence, France; IRIT, ENSEEIHT, 2 Rue Charles Camichel, F-31071 Toulouse, France

Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly larger, more heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features that are necessary to express in an elegant and compact way scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense General Matrix Multiplication, the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive up to 32,768 CPU cores with state-of-the-art libraries and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it implies other issues which are out of the scope of this work.

Keywords: Parallel programming models; scalable linear algebra algorithms; runtime systems; distributed memory parallelism; sequential task flow


Axially-deformed solution of the Skyrme-Hartree-Fock-Bogoliubov equations using the transformed harmonic oscillator basis (IV) HFBTHO (v4.0): A new version of the program

Computer Physics Communications, 2022, Vol. 276

Authors: Marevic, P.; Schunck, N.; Ney, E. M.; Perez, R. Navarro; Verriere, M.; O'Neal, J. Affiliations: Lawrence Livermore Natl Lab, Nucl & Chem Sci Div, Livermore, CA 94551, USA; Univ Zagreb, Fac Sci, Dept Phys, HR-10000 Zagreb, Croatia; Univ N Carolina, Dept Phys & Astron, CB 3255, Chapel Hill, NC 27599, USA; San Diego State Univ, Dept Phys, 5500 Campanile Dr, San Diego, CA 92182, USA; Argonne Natl Lab, Math & Comp Sci Div, Lemont, IL 60439, USA

We describe the new version 4.0 of the code HFBTHO that solves the nuclear Hartree-Fock-Bogoliubov problem by using the deformed harmonic oscillator basis in cylindrical coordinates. In the new version, we have implemented the restoration of rotational, particle number, and reflection symmetry for even-even nuclei. The restoration of rotational symmetry does not require using bases closed under rotation. Furthermore, we added the SeaLL1 functional and improved the calculation of the Coulomb potential. Finally, we refactored the code to facilitate maintenance and future *** version program summary

Program title: HFBTHO v4.0
CPC Library link to program files: https://doi.org/10.17632/c5g2f92by3.2
Code Ocean capsule: https://codeocean.com/capsule/5389629
Licensing provisions: GPLv3
Programming language: Fortran 2003
Journal reference of previous version: R.N. Perez, N. Schunck, R.-D. Lasseri, C. Zhang and J. Sarich, Comput. Phys. Commun. 220 (2017) 363
Does the new version supersede the previous version: Yes
Reasons for the new version: This version adds new capabilities to restore broken symmetries and determine corresponding quantum numbers of even-even nuclei
Summary of revisions:
1. Angular momentum projection for even-even nuclei in a deformed basis;
2. Particle number projection for even-even nuclei in the quasiparticle basis;
3. Implementation of the SeaLL1 functional;
4. Expansion of the Coulomb potential onto Gaussians;
5. MPI-parallelization of a single HFBTHO execution;
6. Code *** of problem: HFBTHO is a physics computer code that is used to model the structure of the nucleus. It is an implementation of the energy density functional (EDF) approach to atomic nuclei, where the energy of the nucleus is obtained by integration over space of some phenomenological energy density, which is itself a functional of the neutron and proton intrinsic densities. In the present version of HFBTHO, the energy density is derived either from the zero-rang

Keywords: Energy density functional theory; Self-consistent mean field; Hartree-Fock-Bogoliubov theory; Harmonic oscillator; Restoration of symmetries; Angular momentum projection; Particle number projection; distributed memory parallelism


MPI plus X: task-based parallelisation and dynamic load balance of finite element assembly

International Journal of Computational Fluid Dynamics, 2019, Vol. 33, No. 3, pp. 115-136

Authors: Garcia-Gasulla, Marta; Houzeaux, Guillaume; Ferrer, Roger; Artigues, Antoni; Lopez, Victor; Labarta, Jesus; Vazquez, Mariano. Affiliation: Barcelona Supercomp Ctr, Barcelona, Spain

The main computing phases of numerical methods for solving partial differential equations are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. The matrix assembly consists of a loop over the elements, faces, edges or nodes of the MPI partitions to compute element matrices and vectors, followed by their assembly. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP, with different techniques to avoid race conditions, but presenting efficiency or implementation drawbacks. We propose an alternative, based on task parallelism using some extensions to the OpenMP programming model. In addition, dynamic load balance will be applied, which is especially efficient in the presence of hybrid meshes. This paper presents the proposed methodology, its implementation and its validation through the solution of large computational mechanics problems up to 16k cores.

Keywords: CFD; finite element; MPI plus X; MPI; OpenMP; dynamic load balance; shared-memory parallelism; distributed memory parallelism; hybrid parallelism


MPI+X: task-based parallelisation and dynamic load balance of finite element assembly

International Journal of Computational Fluid Dynamics, 2019, Vol. 33, No. 3, p. 115

Authors: Garcia-Gasulla, Marta; Houzeaux, Guillaume; Ferrer, Roger; Artigues, Antoni; López, Victor; Labarta, Jesús; Vázquez, Mariano. Affiliation: Barcelona Supercomputing Center, Barcelona, Spain

Keywords: CFD; finite element; MPI+X; MPI; OpenMP; dynamic load balance; shared-memory parallelism; distributed memory parallelism; hybrid parallelism

