Search results - Inner Mongolia University Library

Hybrid CPU-GPU execution support in the skeleton programming framework SkePU

JOURNAL OF SUPERCOMPUTING, 2020, Vol. 76, No. 7, pp. 5038-5056

Authors: Ohberg, Tomas; Ernstsson, August; Kessler, Christoph (Linkoping Univ, Dept Comp & Informat Sci, PELAB, Linkoping, Sweden)

In this paper, we present a hybrid execution backend for the skeleton programming framework SkePU. The backend is capable of automatically dividing the workload and simultaneously executing the computation on a multi-core CPU and any number of accelerators, such as GPUs. We show how to efficiently partition the workload of skeletons such as Map, MapReduce, and Scan to allow hybrid execution on heterogeneous computer systems. We also show a unified way of predicting how the workload should be partitioned based on performance modeling. With experiments on typical skeleton instances, we show the speedup for all skeletons when using the new hybrid backend. We also evaluate the performance on some real-world applications. Finally, we show that the new implementation gives higher and more reliable performance compared to an old hybrid execution implementation based on dynamic scheduling.

Keywords: Heterogeneous computing; Hybrid execution; Skeleton programming; Workload partitioning
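
The abstract mentions predicting the CPU/accelerator split from a performance model. The sketch below is a minimal illustration that assumes each device's execution time grows linearly with the number of elements. Under that assumption, the split that makes both devices finish at the same predicted time has a closed form. `LinearModel` and `cpu_share` are hypothetical names, not part of SkePU, and the paper's actual model may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical linear cost model for one device: time(n) = setup + per_element * n.
struct LinearModel {
    double setup;        // fixed cost, e.g. kernel launch or transfer latency
    double per_element;  // marginal cost of processing one more element
    double predict(double n) const { return setup + per_element * n; }
};

// Number of elements to assign to the CPU so that the predicted CPU and
// accelerator finish times are equal; the accelerator gets the remaining n - x.
// Solves cpu.setup + cpu.per_element * x == gpu.setup + gpu.per_element * (n - x)
// and clamps the result to [0, n], since one device may be best left idle.
std::size_t cpu_share(const LinearModel& cpu, const LinearModel& gpu, std::size_t n) {
    assert(cpu.per_element + gpu.per_element > 0.0);
    const double total = static_cast<double>(n);
    double x = (gpu.setup - cpu.setup + gpu.per_element * total)
             / (cpu.per_element + gpu.per_element);
    x = std::clamp(x, 0.0, total);
    return static_cast<std::size_t>(std::llround(x));
}
```

For example, suppose there are no setup costs and the CPU is twice as slow per element as the GPU. Then `cpu_share({0.0, 2.0}, {0.0, 1.0}, 300)` assigns 100 elements to the CPU and 200 to the GPU, so each is predicted to take 200 time units. A large accelerator setup cost pushes the whole workload onto the CPU.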

Extending smart containers for data locality-aware skeleton programming

10th International Symposium on High-Level Parallel Programming and Applications (HLPP)

Authors: Ernstsson, August; Kessler, Christoph (Linkoping Univ, Dept Comp & Informat Sci, Linkoping, Sweden)

We present an extension for the SkePU skeleton programming framework to improve the performance of sequences of transformations on smart containers. By using lazy evaluation, SkePU records skeleton invocations and dependencies as directed by smart container operands. When a partial result is required by a different part of the program, the run-time system will process the entire lineage of skeleton invocations; tiling is applied to keep chunks of container data in the working set for the whole sequence of transformations. The approach is inspired by big data frameworks operating on large clusters where good data locality is crucial. We also consider benefits other than data locality with the increased run-time information given by the lineage structures, such as backend selection for heterogeneous systems. Experimental evaluation of example applications shows potential for performance improvements due to better cache utilization, as long as the overhead of lineage construction and management is kept low.

Keywords: Lazy evaluation; Loop tiling; Skeleton programming; SkePU; Smart containers
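
The lazy-evaluation-plus-tiling idea can be illustrated outside SkePU. The program records element-wise transformations instead of running them. It then executes the whole recorded sequence one tile at a time, so every step reuses the same tile while it is still in cache. The `Lineage` class below is a hypothetical sequential sketch, not SkePU's run-time system. It handles only element-wise (Map-like) steps, for which tiling never changes the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical lazily evaluated chain of element-wise (Map-like) transformations.
class Lineage {
public:
    using Op = std::function<double(double)>;

    // Record a transformation; nothing is computed yet.
    Lineage& map(Op op) {
        ops_.push_back(std::move(op));
        return *this;
    }

    // Run the whole recorded chain over `data`, one tile at a time: every
    // recorded op is applied to a tile before the next tile is touched, so the
    // tile can stay in cache for the entire sequence of transformations.
    void evaluate(std::vector<double>& data, std::size_t tile) const {
        assert(tile > 0);
        for (std::size_t begin = 0; begin < data.size(); begin += tile) {
            const std::size_t end = std::min(begin + tile, data.size());
            for (const Op& op : ops_)
                for (std::size_t i = begin; i < end; ++i)
                    data[i] = op(data[i]);
        }
    }

    std::size_t length() const { return ops_.size(); }

private:
    std::vector<Op> ops_;
};
```

The tile size only affects memory locality, not the result. A chain that doubles each element and then adds one gives `{3, 5, 7, 9, 11}` for input `{1, 2, 3, 4, 5}` whether the tile size is 2 or 5.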

SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2018, Vol. 46, No. 1, pp. 62-80

Authors: Ernstsson, August; Li, Lu; Kessler, Christoph (Linkoping Univ, Dept Comp & Informat Sci, PELAB, Linkoping, Sweden)

In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new, flexible, and type-safe interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows about SkePU 2 constructs such as skeletons and user functions. We demonstrate how the source-to-source compiler transforms programs to enable efficient execution on parallel heterogeneous systems. We show how SkePU 2 enables new use-cases and applications by increasing the flexibility from SkePU 1, and how programming errors can be caught earlier and more easily thanks to improved type safety. We propose a new skeleton, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We also discuss how the source-to-source compiler can enable a new optimization opportunity by selecting among multiple user function specializations when building a parallel program. Finally, we show that the performance of our prototype SkePU 2 implementation closely matches that of SkePU 1.

Keywords: Skeleton programming; SkePU; Source-to-source transformation; C++11; Heterogeneous parallel systems; Portability
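
The following sketch shows what a variadic, lambda-parameterized, type-checked skeleton interface can look like. It is a hypothetical sequential Map in plain C++17. It is not the SkePU 2 API and has no backends. It only shows how variadic templates let the compiler check that the user function accepts one element from each input container.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential Map skeleton with a variadic, type-checked interface.
// Illustration only: this is NOT the SkePU 2 API and has no parallel backends.
template <typename UserFunc>
class MapSkeleton {
public:
    explicit MapSkeleton(UserFunc f) : f_(std::move(f)) {}

    // out[i] = f(in_0[i], in_1[i], ...). If the user function cannot be called
    // with one element from each input, instantiation fails at compile time.
    template <typename Out, typename... In>
    void operator()(std::vector<Out>& out, const std::vector<In>&... in) const {
        static_assert(sizeof...(In) > 0, "Map needs at least one input container");
        const std::size_t n = out.size();
        assert(((in.size() == n) && ...));  // all containers must have equal length
        for (std::size_t i = 0; i < n; ++i)
            out[i] = f_(in[i]...);
    }

private:
    UserFunc f_;
};

// Factory so the lambda's type is deduced at the call site.
template <typename UserFunc>
MapSkeleton<UserFunc> make_map(UserFunc f) {
    return MapSkeleton<UserFunc>(std::move(f));
}
```

With `auto add = make_map([](int a, int b) { return a + b; });`, the call `add(r, x, y)` computes the element-wise sum. Calling `add(r, x)` with only one input is rejected by the compiler, because the lambda requires two arguments. This kind of early, type-level error detection is what the abstract credits to the new interface.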

Extending smart containers for data locality-aware skeleton programming

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, Vol. 31, No. 5

Authors: Ernstsson, August; Kessler, Christoph (Linkoping Univ, Dept Comp & Informat Sci, Linkoping, Sweden)

Keywords: Lazy evaluation; Loop tiling; Skeleton programming; SkePU; Smart containers

Smart Containers and Skeleton Programming for GPU-Based Systems

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2016, Vol. 44, No. 3, pp. 506-530

Authors: Dastgeer, Usman; Kessler, Christoph (Linkoping Univ, Dept Comp & Informat Sci, PELAB, S-58183 Linkoping, Sweden)

In this paper, we discuss the role, design and implementation of smart containers in the SkePU skeleton library for GPU-based systems. These containers provide an interface similar to C++ STL containers but internally perform runtime optimization of data transfers and runtime memory management for their operand data on the different memory units. We discuss how these containers can help in achieving asynchronous execution for skeleton calls while providing implicit synchronization capabilities in a data consistent manner. Furthermore, we discuss the limitations of the original, already optimizing memory management mechanism implemented in SkePU containers, and propose and implement a new mechanism that provides stronger data consistency and improves performance by reducing communication and memory allocations. With several applications, we show that our new mechanism can achieve significantly (up to 33.4 times) better performance than the initial mechanism for page-locked memory on a multi-GPU based system.

Keywords: SkePU; Smart containers; Skeleton programming; Memory management; Runtime optimizations; GPU-based systems
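
The transfer-avoidance idea behind smart containers can be sketched with one validity flag per copy. A copy is refreshed only when it is accessed while stale. In the hypothetical `SmartVector` below, a second host-side `std::vector` stands in for device memory, and `transfers()` counts simulated copies. Real SkePU containers manage actual GPU memory and also handle asynchronous execution and multiple GPUs, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical smart container: keeps a host copy and a (simulated) device
// copy, tracks which one is up to date, and copies only when a stale copy is
// accessed. A plain std::vector stands in for GPU memory.
template <typename T>
class SmartVector {
public:
    explicit SmartVector(std::size_t n, T init = T{}) : host_(n, init), device_(n) {}

    // Host-side read: download first if only the device copy is current.
    T read(std::size_t i) {
        assert(i < host_.size());
        ensure_host();
        return host_[i];
    }

    // Host-side write: afterwards the device copy is stale.
    void write(std::size_t i, T value) {
        assert(i < host_.size());
        ensure_host();
        host_[i] = value;
        device_valid_ = false;
    }

    // Access for a (simulated) device skeleton call; uploads only if stale.
    const std::vector<T>& device_read() { ensure_device(); return device_; }
    std::vector<T>& device_write() { ensure_device(); host_valid_ = false; return device_; }

    std::size_t transfers() const { return transfers_; }

private:
    void ensure_host() {
        if (!host_valid_) { host_ = device_; host_valid_ = true; ++transfers_; }
    }
    void ensure_device() {
        if (!device_valid_) { device_ = host_; device_valid_ = true; ++transfers_; }
    }

    std::vector<T> host_;
    std::vector<T> device_;
    bool host_valid_ = true;
    bool device_valid_ = false;
    std::size_t transfers_ = 0;
};
```

Suppose a simulated device computation is followed by repeated device reads. Only the first access triggers an upload. The first host read afterwards triggers one download, and later host reads are free until the device copy is written again.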

SkePU 2: Language Embedding and Compiler Support for Flexible and Type-Safe Skeleton Programming

Author: Ernstsson, August (Linköping University)

Degree level: Master

This thesis presents SkePU 2, the next generation of the SkePU C++ framework for programming of heterogeneous parallel systems using the skeleton programming concept. SkePU 2 is presented after a thorough study of the state of parallel programming models, frameworks and tools, including other skeleton programming systems. The advancements in SkePU 2 include a modern C++11 foundation, a native syntax for skeleton parameterization with user functions, and an entirely new source-to-source translator based on the Clang compiler front-end. SkePU 2 extends the functionality of SkePU 1 by embracing metaprogramming techniques and C++11 features, such as variadic templates and lambda expressions. The results are improved programmability and performance in many situations, as shown in both a usability survey and performance evaluations on high-performance computing hardware. SkePU's skeleton programming model is also extended with a new construct, Call, unique in the sense that it does not impose any predefined skeleton structure and can encapsulate arbitrary user-defined multi-backend computations. We conclude that SkePU 2 is a promising new direction for the SkePU project, and a solid basis for future work, for example in performance optimization.

Keywords: Skeleton programming; SkePU; Source-to-source transformation; C++11; Heterogeneous parallel systems; Portability; Natural Sciences; Computer and Information Science; Computer Science; Computer Engineering

Optimizing Three-Dimensional Stencil-Operations on Heterogeneous Computing Environments

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2024, Vol. 52, No. 4, pp. 274-297

Authors: Herrmann, Nina; Dieckmann, Justus; Kuchen, Herbert (Univ Munster, Pract Comp Sci, Leonardo Campus 3, D-48149 Munster, Germany)

Complex algorithms and enormous data sets require parallel execution of programs to obtain results in a reasonable amount of time. Both aspects are combined in the domain of three-dimensional stencil operations, for example in computational fluid dynamics. This work contributes to research on high-level parallel programming by discussing a generalizable implementation of a three-dimensional stencil skeleton that works in heterogeneous computing environments. Two example programs, a gas simulation using the Lattice Boltzmann method and a mean blur, are executed in a multi-node, multi-GPU environment, demonstrating runtime improvements in heterogeneous computing environments compared to a sequential program.

Keywords: Skeleton programming; Three-dimensional stencil operations; High-level parallel programming
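
Below is a minimal sequential sketch of the mean-blur stencil mentioned in the abstract. It assumes a dense grid stored in a flat array, and border cells are averaged over only their in-bounds neighbours instead of using padding. `Grid3D` and `mean_blur` are hypothetical names. The paper's skeleton also distributes the grid across nodes and GPUs, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense 3D grid stored in one flat vector, x varying fastest (hypothetical helper).
struct Grid3D {
    std::size_t nx, ny, nz;
    std::vector<double> v;
    Grid3D(std::size_t x, std::size_t y, std::size_t z, double init = 0.0)
        : nx(x), ny(y), nz(z), v(x * y * z, init) {}
    double& at(std::size_t x, std::size_t y, std::size_t z) { return v[(z * ny + y) * nx + x]; }
    double at(std::size_t x, std::size_t y, std::size_t z) const { return v[(z * ny + y) * nx + x]; }
};

// Mean blur with radius r: each output cell is the average of all cells of the
// (2r+1)^3 cube centred on it that lie inside the grid.
Grid3D mean_blur(const Grid3D& in, int r) {
    assert(r >= 0);
    Grid3D out(in.nx, in.ny, in.nz);
    const long NX = static_cast<long>(in.nx);
    const long NY = static_cast<long>(in.ny);
    const long NZ = static_cast<long>(in.nz);
    for (long z = 0; z < NZ; ++z)
        for (long y = 0; y < NY; ++y)
            for (long x = 0; x < NX; ++x) {
                double sum = 0.0;
                int count = 0;  // the centre cell is always in bounds, so count >= 1
                for (long dz = -r; dz <= r; ++dz)
                    for (long dy = -r; dy <= r; ++dy)
                        for (long dx = -r; dx <= r; ++dx) {
                            const long xx = x + dx, yy = y + dy, zz = z + dz;
                            if (xx < 0 || yy < 0 || zz < 0 || xx >= NX || yy >= NY || zz >= NZ)
                                continue;  // skip neighbours outside the grid
                            sum += in.at(xx, yy, zz);
                            ++count;
                        }
                out.at(x, y, z) = sum / count;
            }
    return out;
}
```

Every output cell depends only on the input grid, never on other output cells. This independence is what lets a stencil skeleton split the grid among workers, provided each worker can read a halo of width `r` around its own part.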

Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2023, Vol. 51, No. 2-3, pp. 172-185

Authors: Herrmann, Nina; Kuchen, Herbert (Univ Munster, Leonardo Campus 3, D-48149 Munster, Germany)

Contemporary HPC hardware typically provides several levels of parallelism, e.g. multiple nodes, each having multiple cores (possibly with vectorization) and accelerators. Efficiently programming such systems usually requires skills in combining several low-level frameworks such as MPI, OpenMP, and CUDA. This overburdens programmers without substantial parallel programming skills. One way to overcome this problem and to abstract from the details of parallel programming is to use algorithmic skeletons. In the present paper, we evaluate the multi-node, multi-CPU and multi-GPU implementation of the most essential skeletons: Map, Reduce, and Zip. Our main contribution is a discussion of the efficiency of using multiple parallelization levels and of which fine-tuning settings should be offered to the user.

Keywords: Parallel programming; Skeleton programming; Heterogeneous computing environments; High-level frameworks; Usability

Stencil Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2022, Vol. 50, No. 5-6, pp. 433-453

Authors: Herrmann, Nina; de Melo Menezes, Breno A.; Kuchen, Herbert (Univ Munster, Leonardo Campus 3, D-48149 Munster, Germany)

The development of parallel applications is a difficult and error-prone task, especially for inexperienced programmers. Stencil operations are exceptionally difficult to parallelize, because synchronization and communication between the individual processes and threads are necessary. It becomes even more difficult to distribute the computations and implement communication efficiently when heterogeneous computing environments are used. Using multiple nodes, each having multiple cores and accelerators such as GPUs, requires skills in combining frameworks such as MPI, OpenMP, and CUDA. The complexity of parallelizing the stencil operation increases the need to abstract from platform-specific details and to simplify parallel programming. One way to abstract from the details of parallel programming is to use algorithmic skeletons. This work introduces an implementation of the MapStencil skeleton that is able to generate parallel code for distributed memory environments, using multiple nodes with multicore CPUs and GPUs. Practical applications of the MapStencil skeleton include the Jacobi solver and the Canny edge detector. The main contribution of this paper is a discussion of the difficulties of implementing a universal MapStencil skeleton for heterogeneous computing environments, and an outline of the best practices identified for communication-intensive skeletons.

Keywords: Parallel programming; Skeleton programming; Heterogeneous computing environments; High-level frameworks; Stencil operations
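
The communication problem the abstract describes already appears in one dimension. In this hypothetical sketch, the domain is split into blocks, and each block is stored with one ghost cell per side. A halo exchange then copies neighbouring boundary cells into the ghosts. In a distributed setting these copies would be messages, for example via MPI. After the exchange, every block can run a Jacobi sweep using only local data. A global, unpartitioned sweep is included as a reference. This is not the paper's MapStencil implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: one Jacobi sweep over the whole 1D domain. The two end points are
// fixed (Dirichlet) boundary values; interior points become the mean of their
// two neighbours.
std::vector<double> jacobi_global(const std::vector<double>& u) {
    std::vector<double> next = u;
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
        next[i] = 0.5 * (u[i - 1] + u[i + 1]);
    return next;
}

// Hypothetical partitioned sweep: the domain is split into `parts` contiguous
// blocks, each stored as [left ghost, owned cells..., right ghost].
std::vector<double> jacobi_partitioned(const std::vector<double>& u, std::size_t parts) {
    const std::size_t n = u.size();
    assert(parts >= 1 && parts <= n);  // every block owns at least one cell

    std::vector<std::size_t> start(parts + 1);
    for (std::size_t p = 0; p <= parts; ++p) start[p] = p * n / parts;

    // Scatter: copy each block's owned cells into its local array.
    std::vector<std::vector<double>> local(parts);
    for (std::size_t p = 0; p < parts; ++p) {
        local[p].assign(start[p + 1] - start[p] + 2, 0.0);
        for (std::size_t i = start[p]; i < start[p + 1]; ++i)
            local[p][i - start[p] + 1] = u[i];
    }

    // Halo exchange: each ghost receives the adjacent owned cell of the
    // neighbouring block (a message between nodes in a distributed setting).
    for (std::size_t p = 0; p < parts; ++p) {
        if (p > 0) local[p].front() = local[p - 1][local[p - 1].size() - 2];
        if (p + 1 < parts) local[p].back() = local[p + 1][1];
    }

    // Local sweeps using only local data, then gather into one vector.
    std::vector<double> next(n);
    for (std::size_t p = 0; p < parts; ++p)
        for (std::size_t i = start[p]; i < start[p + 1]; ++i) {
            const std::size_t l = i - start[p] + 1;  // index in the local array
            const bool boundary = (i == 0 || i + 1 == n);
            next[i] = boundary ? local[p][l] : 0.5 * (local[p][l - 1] + local[p][l + 1]);
        }
    return next;
}
```

The partitioned sweep produces exactly the reference result for any block count from 1 to `n`, because the halo exchange supplies every neighbour value a block does not own. Iterating the solver would repeat the exchange before every sweep. That per-iteration communication is what makes stencil skeletons communication-intensive.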

A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2022, Vol. 50, No. 3-4, pp. 319-340

Authors: Ernstsson, August; Vandenbergen, Nicolas; Keller, Joerg; Kessler, Christoph (Linkoping Univ, Dept Comp & Informat Sci, PELAB, Linkoping, Sweden; Julich Supercomp Ctr, Inst Adv Simulat, FZ Julich, Germany; Fernuniv, Fac Math & Comp Sci, Hagen, Germany)

SkePU is a pattern-based high-level programming model for transparent program execution on heterogeneous parallel computing systems. A key feature of SkePU is that, in general, the selection of the execution platform for a skeleton-based function call need not be determined statically. On single-node systems, SkePU can select among CPU, multithreaded CPU, single or multi-GPU execution. Many scientific applications use pseudo-random number generators (PRNGs) as part of the computation. In the interest of correctness and debugging, deterministic parallel execution is a desirable property, which however requires a deterministically parallelized pseudo-random number generator. We present the API and implementation of a deterministic, portable parallel PRNG extension to SkePU that is scalable by design and exhibits the same behavior regardless of where and with how many resources it is executed. We evaluate it with four probabilistic applications and show that the PRNG enables scalability on both multi-core CPU and GPU resources, and hence supports the universal portability of SkePU code even in the presence of PRNG calls, while source code complexity is reduced.

Keywords: Skeleton programming; Parallelizable algorithmic pattern; Heterogeneous system; GPGPU; Deterministic parallel pseudo-random number generator
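
The abstract does not say which generator SkePU uses. This sketch illustrates only the general property it describes: output that does not depend on how many workers produce it. It swaps in a counter-based approach built on the SplitMix64 mixing function, in which the i-th number depends only on the seed and on i. Any chunking of the index range therefore yields the same sequence. `random_at` and `fill_random` are hypothetical names, and the chunks are processed sequentially here; a real backend could run them concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kGamma = 0x9E3779B97F4A7C15ULL;  // SplitMix64 increment

// SplitMix64 output mixing function (a bijection on 64-bit values).
std::uint64_t mix64(std::uint64_t z) {
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Counter-based access: the i-th number of stream `seed` depends only on
// (seed, i) and equals the i-th output of a sequential SplitMix64 generator
// whose state starts at `seed`.
std::uint64_t random_at(std::uint64_t seed, std::uint64_t i) {
    return mix64(seed + (i + 1) * kGamma);
}

// Fill n numbers split into `workers` contiguous chunks, as a parallel backend
// might; each chunk needs only its start index, never another chunk's state.
std::vector<std::uint64_t> fill_random(std::size_t n, std::size_t workers, std::uint64_t seed) {
    assert(workers > 0);
    std::vector<std::uint64_t> out(n);
    for (std::size_t w = 0; w < workers; ++w) {  // chunks are independent
        const std::size_t begin = w * n / workers;
        const std::size_t end = (w + 1) * n / workers;
        for (std::size_t i = begin; i < end; ++i) out[i] = random_at(seed, i);
    }
    return out;
}
```

`fill_random(1000, 1, 42)`, `fill_random(1000, 4, 42)` and `fill_random(1000, 7, 42)` return identical vectors. That is the "same behavior regardless of how many resources" property, obtained here by computing each number directly from its index rather than advancing shared generator state.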
