ISBN:
(digital) 9783642387180
ISBN:
(print) 9783642387180; 9783642387173
The multigrid method with the OpenMP/MPI hybrid parallel programming model is expected to play an important role in large-scale scientific computing on post-peta/exa-scale supercomputer systems. Because the multigrid method involves many parameter choices, selecting the optimum combination of them is a critical issue. In the present work, we focus on the selection of single-threading or multi-threading in the procedures of parallel multigrid solvers using OpenMP/MPI hybrid parallel programming models. We propose a simple empirical method for automatic tuning (AT) of the related parameters. The performance of the proposed method is evaluated on the T2K Open Supercomputer (T2K/Tokyo), the Cray XE6, and the Fujitsu FX10 using up to 8,192 cores. The proposed AT method is effective, and the automatically tuned code provides twice the performance of the original one.
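As a rough illustration of the kind of choice this tuning addresses, the sketch below shows a hybrid MPI/OpenMP smoother that switches between single-threaded and multi-threaded execution per multigrid level depending on the local problem size. The four-level hierarchy, the point-relaxation placeholder, and the threshold kThreadingThreshold are assumptions for illustration only, not the authors' code.

```cpp
// Minimal sketch (not the authors' code): per-level choice between
// single-threaded and multi-threaded execution in a hybrid MPI/OpenMP
// multigrid smoother. Halo exchange and the real smoother are omitted.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstddef>

struct Level {
    std::vector<double> u, f;   // unknowns and right-hand side on this level
    std::size_t n = 0;          // local number of unknowns
};

// Hypothetical threshold: below this local size, threading overhead
// outweighs the benefit, so the smoother runs single-threaded.
constexpr std::size_t kThreadingThreshold = 10000;

void smooth(Level& lv, int max_threads) {
    const int nt = (lv.n >= kThreadingThreshold) ? max_threads : 1;
    #pragma omp parallel for num_threads(nt) schedule(static)
    for (std::size_t i = 0; i < lv.n; ++i) {
        // placeholder point-relaxation update; the real smoother is problem specific
        lv.u[i] += 0.5 * (lv.f[i] - lv.u[i]);
    }
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    std::vector<Level> hierarchy(4);          // assumed 4-level hierarchy
    for (std::size_t l = 0; l < hierarchy.size(); ++l) {
        hierarchy[l].n = 100000 >> (2 * l);   // coarser levels shrink quickly
        hierarchy[l].u.assign(hierarchy[l].n, 0.0);
        hierarchy[l].f.assign(hierarchy[l].n, 1.0);
    }
    const int max_threads = omp_get_max_threads();
    for (auto& lv : hierarchy) smooth(lv, max_threads);  // down-sweep only
    MPI_Finalize();
    return 0;
}
```

An automatic tuner in this spirit would replace the fixed threshold with values measured empirically on the target machine, per level and per procedure.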
This paper reports on an investigation into large-scale parallel time-harmonic electromagnetic field analysis based on the finite element method. A parallel geometric multigrid preconditioned iterative solver for the resulting linear system was developed on a cluster of shared-memory parallel computers. We propose a hybrid parallel ordering method for the parallelization of a multiplicative Schwarz smoother, which is a key component of the multigrid solver for electromagnetic field analysis. The method uses domain decomposition ordering for multi-process parallelism and introduces block multi-color ordering for multi-thread parallel processing, attaining a high convergence rate with a small number of message passing interface communications and thread synchronizations. Numerical tests confirm that the proposed method attains solver performance more than twice as good as the conventional method based on multi-color ordering. Furthermore, a problem with approximately 800 million degrees of freedom is successfully solved on 256 quad-core processors. (c) 2012 Elsevier B.V. All rights reserved.
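The general idea of block multi-color ordering for a thread-parallel multiplicative (Gauss-Seidel-type) sweep within one MPI subdomain can be sketched as follows: blocks of the same color carry no mutual data dependencies, so threads process them concurrently, while colors are visited one after another. The CSR layout, the contiguous block ranges, and the dynamic schedule are illustrative assumptions, not the paper's implementation.

```cpp
// Minimal sketch (assumptions throughout): block multi-color ordering for a
// thread-parallel Gauss-Seidel sweep inside one MPI subdomain.
#include <omp.h>
#include <vector>
#include <utility>
#include <cstddef>

struct CSRMatrix {
    std::vector<int> row_ptr, col_idx;
    std::vector<double> val;
};

// Each block is a contiguous [begin, end) row range (assumed layout);
// color_blocks[c] lists the row blocks assigned to color c.
using Block = std::pair<int, int>;

void block_multicolor_gs(const CSRMatrix& A,
                         std::vector<double>& x,
                         const std::vector<double>& b,
                         const std::vector<std::vector<Block>>& color_blocks) {
    for (const auto& blocks : color_blocks) {          // sequential over colors
        #pragma omp parallel for schedule(dynamic)     // parallel over same-color blocks
        for (std::size_t k = 0; k < blocks.size(); ++k) {
            for (int i = blocks[k].first; i < blocks[k].second; ++i) {
                double sigma = 0.0, diag = 1.0;
                for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
                    if (A.col_idx[p] == i) diag = A.val[p];
                    else sigma += A.val[p] * x[A.col_idx[p]];
                }
                x[i] = (b[i] - sigma) / diag;
            }
        }
        // the implicit barrier at the end of the parallel for separates colors
    }
}
```

The trade-off the abstract highlights is visible here: fewer colors mean fewer thread synchronizations per sweep, while coarser blocks preserve more of the multiplicative coupling that drives convergence.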
A multi-GPU implementation of the multilevel fast multipole algorithm (MLFMA) based on the hybrid OpenMP-CUDA parallel programming model (OpenMP-CUDA-MLFMA) is presented for computing electromagnetic scattering from a three-dimensional conducting object. The proposed hierarchical parallelization strategy ensures high computational throughput for the GPU calculation. The resulting OpenMP-based multi-GPU implementation is capable of solving real-life problems with over one million unknowns with a remarkable speed-up. The radar cross sections of a few benchmark objects are calculated to demonstrate the accuracy of the solution. The results are compared with those from the CPU-based MLFMA and with measurements. The capability and efficiency of the presented method are analyzed through the examples of a sphere, an aircraft, and a missile-like object. Compared with the 8-threaded CPU-based MLFMA, the OpenMP-CUDA-MLFMA method achieves total speed-up ratios of 5 to 20.
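A minimal sketch of the common OpenMP-per-GPU orchestration pattern that the OpenMP-CUDA combination suggests: one OpenMP thread binds to each device with cudaSetDevice and handles its share of MLFMA boxes. The box count, the static partitioning, and the omitted kernels are assumptions; only host-side CUDA runtime calls are shown, not the authors' implementation.

```cpp
// Minimal sketch of a multi-GPU orchestration pattern: one OpenMP thread per
// GPU, each processing its share of finest-level boxes. The aggregation,
// translation, and disaggregation kernels are omitted.
#include <cuda_runtime.h>
#include <omp.h>
#include <cstdio>

int main() {
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);
    if (num_gpus < 1) return 0;      // no device: nothing to do
    const int num_boxes = 1 << 16;   // assumed number of finest-level boxes

    #pragma omp parallel num_threads(num_gpus)
    {
        const int gpu = omp_get_thread_num();
        cudaSetDevice(gpu);                       // bind this thread to one GPU

        // static partition of boxes across GPUs (assumed load-balancing scheme)
        const int begin = gpu * num_boxes / num_gpus;
        const int end   = (gpu + 1) * num_boxes / num_gpus;

        double* d_work = nullptr;
        cudaMalloc(&d_work, sizeof(double) * (end - begin));
        // ... launch per-box kernels here ...
        cudaDeviceSynchronize();
        cudaFree(d_work);

        std::printf("GPU %d handled boxes [%d, %d)\n", gpu, begin, end);
    }
    return 0;
}
```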
ISBN:
(print) 9781538610442
Preconditioned parallel solvers based on the Krylov iterative method are widely used in scientific and engineering applications. Communication overhead is a critical issue when executing these solvers on large-scale massively parallel supercomputers. In this work, we introduced communication-computation (CC) overlapping with dynamic loop scheduling of OpenMP to the sparse matrix-vector multiplication (SpMV) process of a parallel iterative solver. We then used the solver to evaluate the performance of a parallel finite element application (GeoFEM/Cube) on multicore and manycore clusters. The dynamic loop scheduling of OpenMP improved the efficiency of CC overlapping in halo exchanges, and the developed method attained a significant performance improvement of 40-50% for parallel iterative solvers in strong scaling using up to 16,384 cores of a Fujitsu PRIMEHPC FX10 supercomputer and an Intel Xeon Phi (KNL) cluster. Finally, the developed method was applied to GeoFEM/Cube using a parallel BiCGSTAB solver with sparse approximate inverse (SAI) preconditioning, and a 15-20% performance improvement was obtained on 12,288 cores of the Fujitsu FX10 and the KNL cluster.
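One common way to realize the CC overlapping described above is sketched below: halo Isend/Irecv requests are posted before the SpMV, interior rows (which need no halo data) are processed under schedule(dynamic) while communication proceeds, and boundary rows follow after MPI_Waitall. The interior-first row ordering, the chunk size, and the requirement of at least MPI_THREAD_SERIALIZED are illustrative assumptions, not the paper's code.

```cpp
// Minimal sketch of communication-computation overlap in a distributed SpMV.
// Requires MPI initialized with at least MPI_THREAD_SERIALIZED (assumption).
#include <mpi.h>
#include <omp.h>
#include <vector>

struct CSRMatrix {
    std::vector<int> row_ptr, col_idx;
    std::vector<double> val;
    int n_interior = 0, n_total = 0;  // interior rows first, boundary rows after (assumed ordering)
};

void spmv_overlapped(const CSRMatrix& A,
                     std::vector<double>& x /* local values followed by halo */,
                     std::vector<double>& y,
                     std::vector<MPI_Request>& requests /* pre-posted halo Isend/Irecv */) {
    #pragma omp parallel
    {
        // 1) interior rows need no halo data, so they overlap with communication;
        //    dynamic scheduling keeps threads busy at uneven row lengths
        #pragma omp for schedule(dynamic, 64) nowait
        for (int i = 0; i < A.n_interior; ++i) {
            double s = 0.0;
            for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p)
                s += A.val[p] * x[A.col_idx[p]];
            y[i] = s;
        }
        // 2) one thread completes the halo exchange; the rest wait at the barrier
        #pragma omp single
        MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
                    MPI_STATUSES_IGNORE);
        // 3) boundary rows, now that the halo entries of x are valid
        #pragma omp for schedule(dynamic, 64)
        for (int i = A.n_interior; i < A.n_total; ++i) {
            double s = 0.0;
            for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p)
                s += A.val[p] * x[A.col_idx[p]];
            y[i] = s;
        }
    }
}
```

The benefit of dynamic scheduling reported in the abstract comes from step 1: threads that finish their chunks early immediately pick up new ones instead of idling while communication is still in flight.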
ISBN:
(print) 9781665422871
Rapidly changing computer architectures, such as those found at high-performance computing (HPC) facilities, present the need for mini-applications (miniapps) that capture essential algorithms used in large applications to test program performance and portability, aiding transitions to new systems. The COVID-19 pandemic has fueled a flurry of activity in computational drug discovery, including the use of supercomputers and GPU acceleration for massive virtual screens for therapeutics. Recent work targeting COVID-19 at the Oak Ridge Leadership Computing Facility (OLCF) used the GPU-accelerated program AutoDock-GPU to screen billions of compounds on the Summit supercomputer. In this paper we present the development of a new miniapp, miniAutoDock-GPU, that can be used to evaluate the performance and portability of GPU-accelerated protein-ligand docking programs on different computer architectures. These tests are especially relevant as facilities transition from petascale systems and prepare for upcoming exascale systems that will use a variety of GPU vendors. The key calculations, namely the Lamarckian genetic algorithm combined with a local search using a Solis-Wets based random optimization algorithm, are implemented. We developed versions of the miniapp using several different programming models for GPU acceleration, including a version using the CUDA runtime API for NVIDIA GPUs and a version using the Kokkos middleware API, which is built on C++ template libraries. A third version, currently in progress, uses the HIP programming model. These efforts will help facilitate the transition to exascale systems for this important emerging HPC application, as well as its use on a wide range of heterogeneous platforms.
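To make the named local search concrete, here is a generic, host-side sketch of a Solis-Wets style random optimization step of the kind such a miniapp ports across CUDA, Kokkos, and HIP. The scoring callback, the step-size schedule, and the bias heuristic are placeholders and do not reproduce AutoDock-GPU's implementation.

```cpp
// Minimal sketch of a Solis-Wets style random local search (placeholders only).
#include <random>
#include <vector>
#include <functional>
#include <cstddef>

std::vector<double> solis_wets(std::function<double(const std::vector<double>&)> score,
                               std::vector<double> x, int iters) {
    std::mt19937 rng(42);                     // fixed seed for reproducibility
    double rho = 1.0;                         // step size (assumed initial value)
    int successes = 0, failures = 0;
    std::vector<double> bias(x.size(), 0.0), cand(x.size()), delta(x.size());
    double best = score(x);

    for (int it = 0; it < iters; ++it) {
        std::normal_distribution<double> gauss(0.0, rho);
        for (std::size_t i = 0; i < x.size(); ++i) {
            delta[i] = gauss(rng);
            cand[i] = x[i] + bias[i] + delta[i];
        }
        double s = score(cand);
        if (s < best) {                        // improvement: accept and reinforce bias
            x = cand; best = s; ++successes; failures = 0;
            for (std::size_t i = 0; i < x.size(); ++i)
                bias[i] = 0.4 * delta[i] + 0.2 * bias[i];
        } else {                               // otherwise probe the opposite direction
            for (std::size_t i = 0; i < x.size(); ++i)
                cand[i] = x[i] - bias[i] - delta[i];
            s = score(cand);
            if (s < best) { x = cand; best = s; ++successes; failures = 0; }
            else { ++failures; successes = 0; }
        }
        if (successes > 4) { rho *= 2.0; successes = 0; }   // expand step size
        if (failures  > 4) { rho *= 0.5; failures  = 0; }   // contract step size
    }
    return x;
}
```

In a docking miniapp, score would be the protein-ligand scoring function and many such searches would run in parallel, one per genetic-algorithm individual, which is what makes the kernel a good portability benchmark.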
ISBN:
(print) 9781665435741
The alternating direction method of multipliers (ADMM) is an efficient algorithm for solving large-scale machine learning problems in a distributed environment. To make full use of the hierarchical memory model in modern high-performance computing systems, this paper implements a hybrid MPI/OpenMP parallelization of the asynchronous ADMM algorithm (AH-ADMM). The AH-ADMM algorithm updates local variables in parallel using OpenMP threads and exchanges information between MPI processes, which relieves memory and communication pressure by replacing multi-processing with multi-threading. Furthermore, for the SVM problem, the AH-ADMM algorithm speeds up the calculation of subproblems through an efficient parallel optimization strategy. This paper effectively combines the features of both algorithm design and the programming model. Experiments on the Ziqiang4000 high-performance cluster demonstrate that the AH-ADMM algorithm scales better and runs faster than existing distributed ADMM algorithms implemented with pure MPI. AH-ADMM reduces the communication overhead by up to 91.8% and increases the convergence rate by up to 36x. For large datasets, AH-ADMM scales well on a cluster with over 129 cores.
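The hybrid pattern described above can be sketched as a consensus-ADMM step in which each MPI process updates its local variables with OpenMP threads and the consensus variable is formed with MPI_Allreduce. The placeholder proximal update and the variable names are assumptions; this is not the AH-ADMM code, and in particular it shows the synchronous variant of the exchange.

```cpp
// Minimal sketch of one hybrid MPI/OpenMP consensus-ADMM iteration.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstddef>

void consensus_admm_step(std::vector<double>& x,   // local primal variable
                         std::vector<double>& u,   // local scaled dual variable
                         std::vector<double>& z,   // global consensus variable
                         int world_size) {
    const std::size_t d = x.size();

    // x-update: coordinates handled by OpenMP threads within the process
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < d; ++i)
        x[i] = z[i] - u[i];   // placeholder for argmin f_i(x) + (rho/2)||x - z + u||^2

    // z-update: consensus average of (x + u) across MPI processes
    std::vector<double> local(d), global(d);
    for (std::size_t i = 0; i < d; ++i) local[i] = x[i] + u[i];
    MPI_Allreduce(local.data(), global.data(), static_cast<int>(d),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (std::size_t i = 0; i < d; ++i) z[i] = global[i] / world_size;

    // u-update: scaled dual ascent, again thread-parallel
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < d; ++i)
        u[i] += x[i] - z[i];
}
```

Replacing one process per core with one multi-threaded process per node, as in this sketch, shrinks both the number of Allreduce participants and the duplicated per-process data, which is the memory and communication relief the abstract describes.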