In this paper we present a parallel implementation of Arnoldi's subspace method on the Connection Machine. With a 16K-processor CM2, we obtained performance of a few hundred Megaflops for matrix sizes of several thousand when computing a small number of eigenvalues and eigenvectors. The extrapolated performance on a 64K-processor CM2 indicates that the asymptotic speed will be greater than 1 Gigaflop for very large matrices. We show that it is possible to use the subspace method with a good throughput or speed-up on massively-parallel architectures like the CM2. We remark that other classical methods for linear algebra problems, such as backsubstitution and the QR method, cannot exploit all the potential power of massively-parallel machines. Next, we propose using the subspace method as a programming methodology for massively-parallel machines in order to obtain good performance when solving some large linear algebra problems, especially eigenproblems. This method is the one most frequently used for very large eigenproblems, and it is also well adapted to massively-parallel architectures. The choice of the subspace size is very important both for numerical stability and speed-up. We conclude with a discussion of the effects of the chosen subspace orders on performance.
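For reference, a minimal serial sketch of the m-step Arnoldi iteration that the abstract's parallel implementation maps onto the CM2 (the matrix A, starting vector v0, and subspace size m are illustrative inputs, not the paper's test cases); the eigenvalues of the small Hessenberg matrix H are the Ritz approximations to a few eigenvalues of A.

```python
import numpy as np

def arnoldi(A, v0, m):
    """m-step Arnoldi iteration: builds an orthonormal Krylov basis V and an
    (m+1) x m upper-Hessenberg H satisfying A @ V[:, :m] = V @ H."""
    n = len(v0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:               # happy breakdown: exact invariant subspace
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Ritz values: eigenvalues of the leading m x m block of H approximate
# a few extremal eigenvalues of A; the choice of m affects both stability
# and speed-up, as the abstract notes.
```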
We push the boundaries of electronic structure-based ab-initio molecular dynamics (AIMD) beyond 100 million atoms. This scale is otherwise barely reachable with classical force-field methods or novel neural network and machine learning potentials. We achieve this breakthrough by combining innovations in linear-scaling AIMD, efficient and approximate sparse linear algebra, low and mixed-precision floating-point computation on GPUs, and a compensation scheme for the errors introduced by numerical approximations. The core of our work is the non-orthogonalized local submatrix method (NOLSM), which scales very favorably to massively parallel computing systems and translates large sparse matrix operations into highly parallel, dense matrix operations that are ideally suited to hardware accelerators. We demonstrate that the NOLSM method, which is at the center point of each AIMD step, is able to achieve a sustained performance of 324 PFLOP/s in mixed FP16/FP32 precision corresponding to an efficiency of 67.7% when running on 1536 NVIDIA A100 GPUs.
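As a rough illustration of the submatrix idea underlying NOLSM (a serial NumPy/SciPy sketch, not the authors' mixed-precision GPU implementation), a matrix function of a large sparse matrix is approximated column by column: each column's dense local block is gathered from the sparsity pattern, the function is evaluated exactly on that small dense block, and only the corresponding column of the result is kept. The helper names submatrix_func and inv_sqrt are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csc_matrix

def submatrix_func(A_sparse, dense_func):
    """Approximate f(A) for a sparse symmetric A via dense local submatrices
    (illustrative sketch of the submatrix idea, not the NOLSM production code)."""
    A = csc_matrix(A_sparse)
    n = A.shape[0]
    FA = np.zeros((n, n))
    for i in range(n):
        idx = np.union1d(A[:, i].nonzero()[0], [i])  # rows coupled to column i
        block = A[:, idx][idx, :].toarray()          # dense local submatrix (GPU-friendly)
        fb = dense_func(block)                       # exact dense evaluation of f
        FA[idx, i] = fb[:, np.searchsorted(idx, i)]  # keep only column i of the result
    return FA

def inv_sqrt(M):
    """Dense inverse square root via eigendecomposition (symmetric M assumed)."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Example: FA = submatrix_func(A, inv_sqrt) approximates A^{-1/2}; each dense
# block evaluation is independent, which is what makes the method map so well
# onto many accelerators at once.
```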
This work presents strategies to massively parallelize recursive filters on inputs of one dimension (1D) or three dimensions (3D), complementing and improving on previous state-of-the-art algorithms for two dimensions (2D). Each strategy is reusable in different algorithms for parallel processing with feedback data dependencies, allowing the development of highly optimized algorithms for computing digital filters in general, with double-pass causal-anticausal feedback, in one or multiple dimensions. The algorithms are linear in time and memory, expose a high number of parallel tasks, and are implemented on graphics processing units (GPUs). One major barrier in this area is making such algorithms faster than generic counterparts in available libraries; another is making them easy to use. To overcome the latter, the implementation of the presented strategies is available as open source; to overcome the former, timing performance and comparison results are provided, covering a range of publicly available source codes and libraries, showing that this work outperforms the fastest prior algorithms. (C) 2021 Elsevier Inc. All rights reserved.
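The serial building block being parallelized is the double-pass causal-anticausal recursive filter; the sketch below (illustrative only, with a hypothetical first-order feedback coefficient a, not the paper's GPU kernels) makes the feedback data dependency explicit: each output sample depends on the previous one, which is what the paper's strategies restructure into many parallel tasks.

```python
import numpy as np

def causal_anticausal_filter(x, a):
    """Double-pass first-order recursive filter: y[i] = x[i] + a * y[i-1]
    applied causally (forward), then anticausally (backward)."""
    y = np.asarray(x, dtype=np.float64).copy()
    for i in range(1, len(y)):               # causal pass: forward feedback dependency
        y[i] += a * y[i - 1]
    for i in range(len(y) - 2, -1, -1):      # anticausal pass: backward feedback dependency
        y[i] += a * y[i + 1]
    return y

# A parallel version typically splits the input into blocks, computes per-block
# partial results independently, then fixes up the inter-block feedback terms,
# which is the kind of strategy the paper develops for 1D and 3D inputs.
```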