ISBN:
(Print) 9798350364613; 9798350364606
Preconditioned iterative methods based on the Krylov subspace technique are widely employed in various fields of scientific and technical computing. When utilizing large-scale parallel computing systems, the communication overhead tends to increase with the growth in the number of nodes, making its reduction a crucial challenge. In parallel finite element methods (FEM) and finite volume methods (FVM), overlapping of halo communication and computation (CC-overlapping) is commonly employed, often in conjunction with the dynamic loop scheduling feature of OpenMP. This approach has been applied primarily to sparse matrix-vector products (SpMV) and explicit solvers. Previous studies by the author have proposed reordering techniques for applying CC-overlapping to processes involving global data dependencies, such as the Conjugate Gradient method preconditioned by Incomplete Cholesky Factorization (ICCG). Successful implementations on massively parallel supercomputers demonstrated high parallel performance, but the application of CC-overlapping was limited to SpMV. In the present work, the author proposes a method to apply CC-overlapping to the forward and backward substitutions of the IC(0) smoother of the parallel Conjugate Gradient method preconditioned by Multigrid (MGCG). Using up to 4,096 nodes of Wisteria/BDEC-01 (Odyssey) with A64FX processors, a performance improvement of more than 40% was achieved compared to the original implementation, while an improvement of more than 20% was obtained on 1,024 nodes of the Oakbridge-CX system with Intel Xeon CPUs.
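The overlap idea described in this abstract can be illustrated with a short C sketch of the forward substitution, assuming the unknowns have been reordered into an interior block that references only locally owned data and a boundary block that may also read halo values, and assuming the non-blocking transfers of those halo values have already been posted by the caller. All identifiers (row_ptr, col_idx, low_val, diag_inv, n_int, ...) are hypothetical; this shows only the overlap pattern, not the reordering and communication schedule of the actual method.

    #include <mpi.h>

    /* Forward substitution of a local IC(0) factor with CC-overlapping.
     * Rows [0, n_int) touch only owned data; rows [n_int, n) may also read
     * halo entries of z, stored at indices >= n (z has length n + n_halo).
     * The caller is assumed to have posted the non-blocking transfers of
     * those halo values (reqs, n_req).  All identifiers are hypothetical. */
    void ic0_forward_overlap(int n, int n_int,
                             const int *row_ptr, const int *col_idx,
                             const double *low_val, const double *diag_inv,
                             const double *r, double *z,
                             int n_req, MPI_Request *reqs)
    {
        /* Interior block: no halo dependence, so it proceeds while the halo
         * messages are still in flight. */
        for (int i = 0; i < n_int; i++) {
            double s = r[i];
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                s -= low_val[k] * z[col_idx[k]];   /* strictly lower triangle, local */
            z[i] = s * diag_inv[i];
        }

        /* Boundary block: may read halo entries of z, so complete the
         * exchange first, then finish the substitution. */
        MPI_Waitall(n_req, reqs, MPI_STATUSES_IGNORE);
        for (int i = n_int; i < n; i++) {
            double s = r[i];
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                s -= low_val[k] * z[col_idx[k]];
            z[i] = s * diag_inv[i];
        }
    }

The backward substitution would follow the same pattern, traversing the strictly upper triangular part in reverse order with the halo exchange posted in the opposite direction.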
ISBN:
(Print) 9781538610442
Preconditioned parallel solvers based on the Krylov iterative method are widely used in scientific and engineering applications. Communication overhead is a critical issue when executing these solvers on large-scale massively parallel supercomputers. In this work, we introduced communication-computation (CC) overlapping with dynamic loop scheduling of OpenMP into the sparse matrix-vector multiplication (SpMV) process of a parallel iterative solver. We then used the solver to evaluate the performance of a parallel finite element application (GeoFEM/Cube) on multicore and manycore clusters. The dynamic loop scheduling of OpenMP improved the efficiency of CC overlapping in halo exchanges, and the developed method attained a significant performance improvement of 40-50% for parallel iterative solvers in strong scaling using up to 16,384 cores of a Fujitsu PRIMEHPC FX10 supercomputer and an Intel Xeon Phi (KNL) cluster. Finally, the developed method was applied to GeoFEM/Cube using a parallel BiCGSTAB solver with sparse approximate inverse (SAI) preconditioning, and a 15-20% performance improvement was obtained on 12,288 cores of the Fujitsu FX10 and the KNL cluster.
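The combination of non-blocking halo exchange, SpMV, and OpenMP dynamic loop scheduling described here can be sketched as follows in C. The matrix layout (local CSR with halo columns stored at indices >= n of x), the chunk size, and the assumption that the caller has already posted MPI_Isend/MPI_Irecv for the halo values are my own simplifications, not the authors' code.

    #include <mpi.h>
    #include <omp.h>

    /* Distributed CSR SpMV with CC-overlapping.  x has length n + n_halo;
     * halo entries live at indices >= n and are filled by the non-blocking
     * receives described by (reqs, n_req), posted by the caller. */
    void spmv_cc_overlap(int n, int n_int,
                         const int *row_ptr, const int *col_idx, const double *val,
                         double *x, double *y,
                         int n_req, MPI_Request *reqs)
    {
        #pragma omp parallel
        {
            /* "omp master" has no implied barrier, so the other threads start
             * on the interior rows at once while the master finishes the exchange. */
            #pragma omp master
            MPI_Waitall(n_req, reqs, MPI_STATUSES_IGNORE);

            /* Interior rows reference only owned entries of x.  Dynamic
             * scheduling lets the master pick up the remaining chunks after
             * MPI_Waitall returns. */
            #pragma omp for schedule(dynamic, 64)
            for (int i = 0; i < n_int; i++) {
                double s = 0.0;
                for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                    s += val[k] * x[col_idx[k]];
                y[i] = s;
            }

            /* The implicit barrier of the loop above also guarantees the master
             * has completed MPI_Waitall, so the halo entries of x are valid. */
            #pragma omp for schedule(dynamic, 64)
            for (int i = n_int; i < n; i++) {
                double s = 0.0;
                for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                    s += val[k] * x[col_idx[k]];
                y[i] = s;
            }
        }
    }

With static scheduling, the iterations assigned to the master thread would simply sit idle while it waits inside MPI_Waitall; dynamic scheduling avoids that load imbalance, which is the point of this construction.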
ISBN:
(Print) 9781467397957
Distributed memory systems (DMS), i.e., clusters, are one of the tools used by researchers to solve a wide spectrum of computationally intensive problems in a fraction of the time of a sequential approach. The nature of a DMS does not in itself enforce intense data sharing among computational nodes; such sharing arises when the problem under analysis is data-dependent in nature. The latency associated with dynamic data sharing in a DMS is well known to increase the total execution time. One of the possible techniques that can be used to reduce the negative effects associated with this latency is the overlapping of communication with computation. In this paper we show why a characterization of the overlapping capabilities of a cluster is important for justifying results.
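A characterization of this kind can be approximated with a small micro-benchmark. The sketch below is not the paper's code; it times a non-blocking exchange alone, a dummy compute kernel alone, and the two together, then reports how much of the communication time was hidden. Message size, kernel size, and the pairing of ranks are arbitrary choices.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBYTES (1 << 22)    /* 4 MiB message (arbitrary) */
    #define NWORK  (1 << 24)    /* size of the dummy compute kernel (arbitrary) */

    /* Dummy floating-point work standing in for application computation. */
    static double compute(const double *w, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += w[i] * 1.0000001 + 0.5;
        return s;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size % 2 != 0) {                 /* ranks are paired: 0-1, 2-3, ... */
            if (rank == 0) fprintf(stderr, "run with an even number of ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        int peer = rank ^ 1;

        char   *sbuf = calloc(NBYTES, 1), *rbuf = malloc(NBYTES);
        double *w    = malloc(NWORK * sizeof(double));
        for (int i = 0; i < NWORK; i++) w[i] = 1.0;
        MPI_Request req[2];
        volatile double sink = 0.0;          /* keeps the kernel from being optimized away */

        /* 1. Communication only. */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        MPI_Irecv(rbuf, NBYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sbuf, NBYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        double t_comm = MPI_Wtime() - t0;

        /* 2. Computation only. */
        t0 = MPI_Wtime();
        sink += compute(w, NWORK);
        double t_comp = MPI_Wtime() - t0;

        /* 3. Both together: the kernel runs while the messages are in flight. */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        MPI_Irecv(rbuf, NBYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sbuf, NBYTES, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req[1]);
        sink += compute(w, NWORK);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        double t_both = MPI_Wtime() - t0;

        if (rank == 0)
            printf("comm %.3f ms  comp %.3f ms  overlapped %.3f ms  hidden %.0f%%\n",
                   1e3 * t_comm, 1e3 * t_comp, 1e3 * t_both,
                   100.0 * (t_comm + t_comp - t_both) / t_comm);

        free(sbuf); free(rbuf); free(w);
        MPI_Finalize();
        return 0;
    }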
A GPU-accelerated Conjugate Gradient solver is tested on eight matrices with different structural and numerical characteristics. The first four matrices are obtained by discretizing the 3D Poisson's equation, which arises in many fields such as computational fluid dynamics, heat transfer, and so on. Their relatively low bandwidth and low condition numbers make them ideal targets for GPU acceleration. We chose the other four matrices from the opposite end of the spectrum: ill-conditioned and with very large bandwidth. This paper concentrates on the computational aspects of running the solver on multiple GPUs. We develop a fast distributed sparse matrix-vector multiplication routine using optimized data formats that allows the overlapping of communication with computation and, at the same time, the sharing of some of the work with the CPU. By a thorough analysis of the time spent in communication and computation, we show that the proposed overlapped implementation outperforms the non-overlapped one by a large margin and provides almost perfect strong scalability for large Poisson-type matrices. We then benchmark the performance of the entire solver, using both double precision and single precision combined with iterative refinement, and report up to 22x acceleration when using three GPUs as compared with one of the most powerful Intel Nehalem CPUs available today. Finally, we show that using GPUs as accelerators not only brings an order-of-magnitude speedup but also up to a 5x increase in power efficiency and over a 10x increase in cost effectiveness. Copyright (C) 2010 John Wiley & Sons, Ltd.
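The single-precision-plus-iterative-refinement strategy mentioned above can be illustrated with a minimal C sketch: the residual and the solution update are kept in double precision, while the inner solve is carried out in single precision. Here the inner solver is a plain Jacobi sweep on a small diagonally dominant test system chosen only for illustration; in the paper the inner solver is a GPU-accelerated CG.

    #include <stdio.h>
    #include <math.h>

    #define N 4

    /* Single-precision Jacobi sweeps, approximately solving A*e = r
     * (stand-in for the single-precision GPU inner solver). */
    static void inner_solve_sp(float A[N][N], const float r[N], float e[N], int sweeps)
    {
        for (int i = 0; i < N; i++) e[i] = 0.0f;
        for (int s = 0; s < sweeps; s++)
            for (int i = 0; i < N; i++) {
                float sum = r[i];
                for (int j = 0; j < N; j++)
                    if (j != i) sum -= A[i][j] * e[j];
                e[i] = sum / A[i][i];
            }
    }

    int main(void)
    {
        /* Small SPD, diagonally dominant test system (arbitrary example data). */
        const double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
        const double b[N]    = {1, 2, 3, 4};
        float As[N][N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                As[i][j] = (float)A[i][j];

        double x[N] = {0};
        for (int it = 0; it < 20; it++) {
            /* 1. Double-precision residual r = b - A*x. */
            double r[N], nrm = 0.0;
            for (int i = 0; i < N; i++) {
                r[i] = b[i];
                for (int j = 0; j < N; j++) r[i] -= A[i][j] * x[j];
                nrm += r[i] * r[i];
            }
            printf("iteration %2d  ||r|| = %.3e\n", it, sqrt(nrm));
            if (sqrt(nrm) < 1e-12) break;

            /* 2. Single-precision inner solve A*e ~ r. */
            float rf[N], ef[N];
            for (int i = 0; i < N; i++) rf[i] = (float)r[i];
            inner_solve_sp(As, rf, ef, 50);

            /* 3. Double-precision update of the solution. */
            for (int i = 0; i < N; i++) x[i] += (double)ef[i];
        }
        return 0;
    }

For a well-conditioned system, each refinement step shrinks the residual by roughly the accuracy of the single-precision inner solve, so only a handful of outer iterations are needed to reach double-precision accuracy.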