Search Results - Inner Mongolia University Library

Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models

Journal of Supercomputing, 2018, vol. 74, no. 11, pp. 5628-5642

Authors: Castello, Adrian; Pena, Antonio J.; Mayo, Rafael; Planas, Judit; Quintana-Orti, Enrique S.; Balaji, Pavan. Affiliations: Univ Jaume I Castello, Castellon De La Plana 12071, Spain; BSC CNS, Barcelona 08034, Spain; Ecole Polytech Fed Lausanne, CH-1202 Geneva, Switzerland; Argonne Natl Lab, Lemont, IL 60439, USA

Directive-based programming models, such as OpenMP, OpenACC, and OmpSs, enable users to accelerate applications by using coprocessors with little effort. These devices offer significant computing power, but their use can introduce two problems: an increase in the total cost of ownership and their underutilization because not all codes match their architecture. Remote accelerator virtualization frameworks address those problems. In particular, rCUDA provides transparent access to any graphics processing unit installed in a cluster, reducing the number of accelerators and increasing their utilization ratio. Joining these two technologies, directive-based programming models and rCUDA, is thus highly appealing. In this work, we study the integration of OmpSs and OpenACC with rCUDA, describing and analyzing several applications over three different hardware configurations that include two InfiniBand interconnections and three NVIDIA accelerators. Our evaluation reveals favorable performance results, showing low overhead and similar scaling factors when using remote accelerators instead of local devices.

Keywords: GPUs; directive-based programming models; OpenACC; OmpSs; Remote virtualization; rCUDA

An Enhanced Profiling Framework for the Analysis and Development of Parallel Primitives for GPUs

9th IEEE International Symposium on Embedded Multicore/Manycore Systems-on-Chip (MCSoC)

Authors: Bombieri, Nicola; Busato, Federico; Fummi, Franco. Affiliation: Univ Verona, Dept Comp Sci, I-37100 Verona, Italy

ISBN: (print) 9781479986705

Parallelizing software applications through the use of existing optimized primitives is a common trend that mediates between the complexity of manual parallelization and the use of less efficient directive-based programming models. Parallel primitive libraries allow software engineers to map any sequential code to a target many-core architecture by identifying the most computationally intensive code sections and mapping them into one or more existing primitives. On the other hand, the spread of such a primitive-based programming model and the variety of GPU architectures have led to a large and increasing number of third-party libraries, which often provide different implementations of the same primitive, each one optimized for a specific architecture. From the developer's point of view, this shifts the actual problem from parallelizing the software application to selecting, among the several implementations, the most efficient primitives for the target platform. This paper presents a profiling framework for GPU primitives, which allows measuring the implementation quality of a given primitive by considering the target architecture characteristics. The framework collects the information provided by a standard GPU profiler and combines it into optimization criteria. The criteria evaluations are weighted to distinguish the impact of each optimization on the overall quality of the primitive implementation. The paper shows how the tuning of the different weights has been conducted through the analysis of five of the most widespread existing primitive libraries and how the framework has eventually been applied to improve the implementation performance of a standard primitive.
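The idea of weighting criteria evaluations into one quality figure can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `quality_score`, the criterion names, the example scores and weights, and the use of a plain weighted average as the combining rule.

```python
from typing import Dict


def quality_score(criteria: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion scores (each assumed to lie in [0, 1]) into a
    single figure using a weighted average. The combining rule is an
    illustrative assumption, not the formula used by the paper."""
    total_weight = sum(weights[name] for name in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(criteria[name] * weights[name] for name in criteria)
    return weighted_sum / total_weight


# Hypothetical criterion scores, as might be derived from profiler counters.
criteria = {"occupancy": 0.5, "memory_coalescing": 1.0, "branch_divergence": 0.0}
# Hypothetical weights expressing how much each criterion matters.
weights = {"occupancy": 1.0, "memory_coalescing": 2.0, "branch_divergence": 1.0}

score = quality_score(criteria, weights)
print(f"overall quality: {score:.3f}")
```

Under these assumptions, raising the weight of one criterion makes the overall figure track that criterion more closely, which is the kind of tuning the abstract describes at a high level.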

Keywords: graphics processing units; parallel programming; GPU; GPU profiler; directive-based programming models; many-core architecture; parallel primitives; primitive-based programming model; profiling framework; software applications parallelization; instruction sets; kernel; libraries; optimization; synchronization; GRAPPER PICK UP; core construction; primitive; applications software; frameworks; sequential codes
