Search Results - Inner Mongolia University Library

Mashing load balancing algorithm to boost hybrid kernels in molecular dynamics simulations

JOURNAL OF SUPERCOMPUTING, 2023, Vol. 79, No. 1, pp. 1065-1080

Authors: Nozal, Raul; Luis Bosque, Jose (Univ Cantabria, Santander, Spain)

The path to the efficient exploitation of molecular dynamics simulators is strongly driven by the increasingly intensive use of accelerators. However, accelerators suffer from performance portability issues, which make it necessary both to combine technologies so as to take advantage of each programming model and device, and to define more effective load distribution strategies that take the simulation conditions into account. In this work, a new load balancing algorithm is presented, together with a set of optimizations that support hybrid co-execution in a runtime system for heterogeneous computing. The new extended design enables custom kernels and acceleration technologies to be exploited together while remaining encapsulated from the rest of the runtime and its scheduling system. With this support, the Mash algorithm can simultaneously leverage different workload distribution strategies, using the most advantageous one for each device and technology. Experiments show that these proposals achieve an efficiency close to 0.90 and an energy efficiency improvement of around 1.80 over the original optimized version.

Keywords: Load balancing; Co-execution; Hybrid programming models; HPC; Molecular dynamics; OpenMP; OpenCL; C++; CPU-GPU-MIC; Accelerators
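
As a rough illustration of the kind of workload distribution a co-execution runtime performs, the sketch below implements a simple static split in which each device receives a share of the work items proportional to its measured throughput. This is not the Mash algorithm described above, which combines several strategies; the function name static_split, the device labels, and the throughput figures are all hypothetical.

```python
from itertools import accumulate
from typing import Dict


def static_split(total_items: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Split total_items across devices in proportion to their throughput.

    `throughput` maps a (hypothetical) device label to its measured rate in
    work items per second. Shares are differences between rounded cut points,
    so they are never negative and always sum to exactly total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughput or any(rate < 0 for rate in throughput.values()):
        raise ValueError("need at least one device and non-negative rates")
    devices = list(throughput)
    cumulative = list(accumulate(throughput[d] for d in devices))
    total_rate = cumulative[-1]
    if total_rate <= 0:
        raise ValueError("at least one device needs a positive throughput")
    # Cut points along [0, total_items]; the last one is pinned to total_items.
    cuts = [round(total_items * c / total_rate) for c in cumulative[:-1]]
    cuts.append(total_items)
    shares, start = {}, 0
    for device, end in zip(devices, cuts):
        shares[device] = end - start
        start = end
    return shares


# Hypothetical measured rates: the GPU is three times faster than the CPU.
print(static_split(1000, {"cpu": 1.0, "gpu": 3.0}))  # prints {'cpu': 250, 'gpu': 750}
```

A static split like this only works well when throughput is known in advance and stays constant; dynamic schemes instead hand out chunks as devices become idle, which is one reason a runtime may want to mix strategies per device.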

Performance Models for Hybrid Programs Accelerated by GPUs

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Sasidharan, Aparna (Ansys Inc, Lebanon, NH 03766, USA)

ISBN (print): 9781665435772

This paper describes the use of statistical tools to model the performance of mixed-device programs, in which the hosts are CPUs and the devices are GPUs. GPUs are used to accelerate the compute-intensive sections of a program, thereby reducing total execution time, with side effects that include reduced machine usage and energy consumption. To model the major and minor factors that affect the execution time of offloaded programs, we used a compute-intensive program with several GPU kernels. We abstracted the hybrid program as a sequence of computations that access various types of memory (device caches, device shared memory, the memory of other devices, and host memory). In the programming model discussed, the role of a host is reduced to scheduling and coordinating the execution of kernels across devices and communicating with other hosts. The approach can be extended to models in which hosts also perform computations or are eliminated altogether. Experiments were designed to cover a range of memory sizes and types. The performance models were trained, and their predictions were verified using test data.

Keywords: Hybrid programming models; Multi-GPU architectures; Unified Memory; Performance models; Statistics
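
The abstract does not say which statistical models were used. As a minimal sketch of the train-then-verify workflow it describes, the code below fits an ordinary least-squares line relating kernel execution time to a single hypothetical feature (here, MiB read from device memory) and then measures the mean absolute percentage error on held-out samples. The feature choice, the sample values, and the helper names fit_linear and mean_abs_pct_error are assumptions for illustration; the paper's models presumably account for more factors and memory types.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_pct_error(model: Tuple[float, float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Average |predicted - measured| / |measured| over held-out samples."""
    a, b = model
    return sum(abs(a + b * x - y) / abs(y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical samples: x = MiB read from device memory, y = kernel time in ms.
train_x, train_y = [64, 128, 256, 512], [1.9, 3.1, 5.8, 11.2]
test_x, test_y = [384, 768], [8.4, 16.5]
model = fit_linear(train_x, train_y)
print(model, mean_abs_pct_error(model, test_x, test_y))
```

Keeping the test samples out of the fit is what makes the error figure meaningful as a check on the model's predictions rather than on how well it memorized the training runs.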

Performance Meets Programmability: Enabling Native Python MPI Tasks in PyCOMPSs

28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Authors: Elshazly, Hatem; Lordan, Francesc; Ejarque, Jorge; Badia, Rosa M. (Barcelona Supercomputing Center (BSC), Dept. of Computer Science, Barcelona, Spain)

ISBN (print): 9781728165820

The increasing complexity of modern and future computing systems makes it challenging to develop applications that aim for maximum performance. Hybrid parallel programming models offer new ways to exploit the capabilities of the underlying infrastructure. However, the performance gain is sometimes accompanied by increased programming complexity. We introduce an extension to PyCOMPSs, a high-level task-based parallel programming model for Python applications, to support tasks that use MPI natively as part of the task model. Without compromising the application's programmability, using native MPI tasks in PyCOMPSs offers up to a 3x improvement in total performance for compute-intensive applications and up to a 1.9x improvement for I/O-intensive applications over a sequential implementation of the tasks.

Keywords: Hybrid programming models; Distributed computing; MPI; High performance computing; Task-based; Parallel programming models; Performance; Productivity

Performance of Hybrid Programming Models for Multiscale Cardiac Simulations: Preparing for Petascale Computation

IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2011, Vol. 58, No. 10, pp. 2965-2969

Authors: Pope, Bernard J.; Fitch, Blake G.; Pitman, Michael C.; Rice, John J.; Reumann, Matthias (IBM Res Collaboratory Life Sci Melbourne, Melbourne, Vic 3010, Australia; Univ Melbourne, Dept Comp Sci & Software Engn, Carlton, Vic 3010, Australia; Victorian Life Sci Computat Initiat, Carlton, Vic 3010, Australia; IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598, USA)

Future multiscale and multiphysics models that support research into human disease, translational medical science, and treatment can utilize the power of high-performance computing (HPC) systems. We anticipate that computationally efficient multiscale models will require sophisticated hybrid programming models that mix distributed message-passing processes [e.g., the Message Passing Interface (MPI)] with multithreading (e.g., OpenMP, Pthreads). The objective of this study is to compare the performance of such hybrid programming models when applied to the simulation of a realistic physiological multiscale model of the heart. Our results show that the hybrid models perform favorably compared with an implementation that uses only MPI and, furthermore, that OpenMP in combination with MPI provides a satisfactory compromise between performance and code complexity. The ability to use threads within MPI processes enables sophisticated use of all processor cores in both the computation and communication phases. Considering that HPC systems in 2012 will have two orders of magnitude more cores than were used in this study, we believe that faster-than-real-time multiscale cardiac simulations can be achieved on these systems.

Keywords: High-performance computing (HPC); Hybrid programming models; Multiphysics; Cardiac model; Multiscale
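
The MPI-plus-threads pattern the abstract describes usually means a two-level decomposition: the domain is first divided among MPI processes, and each process then divides its block among its threads. The sketch below computes those index ranges for a one-dimensional array of n cells. It is an illustrative assumption, not the study's code; in a real MPI+OpenMP program the outer index would come from MPI_Comm_rank and the inner one from omp_get_thread_num, and a static OpenMP loop schedule would typically perform the inner split for you. The helper names block_range and hybrid_ranges are hypothetical.

```python
from typing import List, Tuple


def block_range(n: int, parts: int, index: int) -> Tuple[int, int]:
    """Half-open range [start, end) of the index-th of `parts` near-equal blocks of n items.

    The first n % parts blocks receive one extra item each.
    """
    if parts <= 0 or not 0 <= index < parts:
        raise ValueError("invalid partition request")
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    end = start + base + (1 if index < extra else 0)
    return start, end


def hybrid_ranges(n: int, ranks: int, threads: int) -> List[List[Tuple[int, int]]]:
    """For each (hypothetical) MPI rank, the global cell ranges owned by each of its threads."""
    layout = []
    for rank in range(ranks):
        lo, hi = block_range(n, ranks, rank)  # outer level: this process's block
        per_thread = []
        for tid in range(threads):
            t_lo, t_hi = block_range(hi - lo, threads, tid)  # inner level: split the block
            per_thread.append((lo + t_lo, lo + t_hi))
        layout.append(per_thread)
    return layout


print(hybrid_ranges(10, 2, 2))  # prints [[(0, 3), (3, 5)], [(5, 8), (8, 10)]]
```

Because only the outer level involves separate processes, only boundaries between rank blocks require message passing; boundaries between threads of the same rank are handled through shared memory.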

Performance Analysis and Optimization of a Hybrid Seismic Imaging Application

16th Annual International Conference on Computational Science (ICCS)

Authors: Paul, Sri Raj; Araya-Polo, Mauricio; Mellor-Crummey, John; Hohl, Detlef (Rice Univ, Houston, TX 77005, USA; Shell Int Explorat & Prod Inc, Houston, TX, USA)

Applications that process seismic data are computationally expensive and therefore employ scalable parallel systems to produce timely results. Here we describe our experience using performance analysis tools to gain insight into an MPI+OpenMP code developed by Shell that performs Reverse Time Migration on a cluster to produce models of the subsurface. Tuning MPI+OpenMP programs for modern platforms is difficult, and assistance from performance analysis tools is therefore required. These tools gave us insight into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from Rice University's HPCToolkit and hardware performance counters, we improved the performance of Shell's prototype distributed-memory Reverse Time Migration code by roughly 30 percent.

Keywords: Reverse time migration; Performance analysis; MPI+OpenMP; Hybrid programming models
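
One common back-of-the-envelope check on a domain decomposition strategy for a stencil code is the number of ghost (halo) cells each subdomain must receive per exchange. The sketch below compares a slab decomposition (cut along one axis) with a pencil decomposition (cut along two axes) of a cubic grid, for an interior subdomain and a star-shaped stencil of a given radius, so edge and corner ghosts are ignored. The grid size, stencil radius, and process layout are hypothetical, and this is not the analysis performed on Shell's code.

```python
def slab_halo_cells(n: int, radius: int) -> int:
    """Ghost cells received by an interior subdomain when an n^3 grid is cut into slabs.

    Each slab spans the full n x n cross-section, so it exchanges two n x n
    faces, each `radius` cells deep (star stencil: no edge or corner ghosts).
    """
    return 2 * radius * n * n


def pencil_halo_cells(n: int, px: int, py: int, radius: int) -> int:
    """Ghost cells for an interior subdomain when the grid is cut into px x py pencils."""
    if n % px or n % py:
        raise ValueError("this sketch assumes n is divisible by px and py")
    # A pencil is (n/px) x (n/py) x n: two faces normal to x, two normal to y.
    faces_normal_to_x = 2 * radius * (n // py) * n
    faces_normal_to_y = 2 * radius * (n // px) * n
    return faces_normal_to_x + faces_normal_to_y


# Hypothetical case: 512^3 grid, stencil radius 4, 16 MPI ranks (16 slabs or 4 x 4 pencils).
print(slab_halo_cells(512, 4), pencil_halo_cells(512, 4, 4, 4))  # prints 2097152 1048576
```

In this example the pencil layout halves the halo volume per subdomain, at the cost of four neighbors instead of two, so more (smaller) messages per exchange; which choice wins depends on message latency versus bandwidth on the target cluster.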

Performance Analysis and Optimization of a Hybrid Seismic Imaging Application

Procedia Computer Science, 2016, Vol. 80, pp. 8-18

Authors: Sri Raj Paul; Mauricio Araya-Polo; John Mellor-Crummey; Detlef Hohl (Rice University, Houston, TX, USA; Shell International Exploration & Production Inc., Houston, TX, USA)

Keywords: Reverse time migration; Performance analysis; MPI+OpenMP; Hybrid programming models

Petascale Computing with Accelerators

14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Kistler, Michael; Gunnels, John; Brokenshire, Daniel; Benton, Brad (IBM Corp, Austin, TX 78758, USA; IBM Corp, Yorktown Hts, NY 10598, USA)

ISBN (print): 9781605583976

A trend is developing in high-performance computing in which commodity processors are coupled to various types of computational accelerators. Such systems are commonly called hybrid systems. In this paper, we describe our experience developing an implementation of the Linpack benchmark for a petascale hybrid system, the LANL Roadrunner cluster built by IBM for Los Alamos National Laboratory. This system combines traditional x86-64 host processors with IBM PowerXCell™ 8i accelerator processors. Our Linpack implementation was the first to achieve a performance result in excess of 1.0 PFLOPS, and it made Roadrunner the #1 system on the Top500 list in June 2008. We describe the design and implementation of hybrid Linpack, including the special optimizations we developed for this hybrid architecture. We then present actual results for single-node and multi-node executions. From this work, we conclude that high performance is achievable for certain applications on hybrid architectures when careful attention is given to efficient use of memory bandwidth, scheduling of data movement between the host and accelerator memories, and proper distribution of work between the host and accelerator processors.

Keywords: Algorithms; Performance; Design; Accelerators; Hybrid programming models
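
Linpack results such as the 1.0 PFLOPS figure above are reported as a credited operation count divided by wall-clock time. The High-Performance Linpack (HPL) reference implementation credits 2/3 n^3 + 3/2 n^2 floating-point operations for solving a dense n x n system, regardless of how the work is actually split between host and accelerator. The sketch below applies that formula; the helper names and the example problem size and run time are hypothetical and do not reproduce Roadrunner's submitted run.

```python
def hpl_flops(n: int) -> float:
    """Operation count HPL credits for solving a dense n x n system: 2/3 n^3 + 3/2 n^2."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_rate(n: int, seconds: float) -> float:
    """Achieved rate in flop/s for a run of problem size n that took `seconds`."""
    if seconds <= 0:
        raise ValueError("run time must be positive")
    return hpl_flops(n) / seconds


# Hypothetical run: n = 2,000,000 finishing in 2 hours (not Roadrunner's actual figures).
rate = hpl_rate(2_000_000, 2 * 3600.0)
print(f"{rate / 1e15:.2f} PFLOPS")  # prints 0.74 PFLOPS
```

Since the n^3 term dominates, the reported rate grows with problem size only if run time grows more slowly than n^3, which is why large runs depend on keeping host-accelerator data movement off the critical path.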

Petascale Computing with Accelerators

ACM SIGPLAN NOTICES, 2009, Vol. 44, No. 4, pp. 241-249

Authors: Kistler, Michael; Gunnels, John; Brokenshire, Daniel; Benton, Brad (IBM Corp, Austin, TX 78758, USA; IBM Corp, Yorktown Hts, NY 10598, USA)

Keywords: Algorithms; Performance; Design; Accelerators; Hybrid programming models
