Search Results - Inner Mongolia University Library

Data Parallel Algorithmic Skeletons with Accelerator Support

International Journal of Parallel Programming, 2017, vol. 45, no. 2, pp. 283-299

Authors: Ernsting, Steffen; Kuchen, Herbert. Univ Munster, Leonardo Campus 3, D-48149 Munster, Germany

Hardware accelerators such as GPUs or Intel Xeon Phi comprise hundreds or thousands of cores on a single chip and promise to deliver high performance. They are widely used to boost the performance of highly parallel applications. However, because of their diverging architectures, programmers face diverging programming paradigms. Programmers also have to deal with low-level concepts of parallel programming, which make it a cumbersome task. To assist programmers in developing parallel applications, Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel programming patterns, thereby shielding programmers from low-level aspects of parallel programming. The main contribution of this paper is a comparison of two skeleton library implementations, one in C++ and one in Java, in terms of library design and programmability. In addition, on the basis of four benchmark applications, we evaluate the performance of the presented implementations on two test systems, a GPU cluster and a Xeon Phi system. The two implementations achieve comparable performance, with a slight advantage for the C++ implementation. Xeon Phi performance ranges between CPU and GPU performance.

Keywords: high-level parallel programming; Algorithmic skeletons; GPGPU; Hardware accelerators
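As a rough illustration of the skeleton idea described in the abstract above, the following is a minimal Python sketch, not code from the paper's C++ or Java libraries. The names `map_skeleton`, `fold_skeleton`, and `dot_product` are hypothetical. A thread pool stands in for an accelerator back end; in CPython it mainly demonstrates the interface, since threads do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_skeleton(fn, data, workers=4):
    """Apply fn to every element; the pool hides thread management from the caller."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(fn, data))


def fold_skeleton(op, data, initial):
    """Combine all elements with a binary operator (sequential in this sketch)."""
    return reduce(op, data, initial)


def dot_product(xs, ys):
    """Compose the two skeletons: elementwise multiply, then sum."""
    products = map_skeleton(lambda pair: pair[0] * pair[1], list(zip(xs, ys)))
    return fold_skeleton(lambda a, b: a + b, products, 0)
```

The caller only supplies the per-element function and the combining operator; how the work is distributed stays inside the skeleton, which is the separation the abstract refers to.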

Optimizing Three-Dimensional Stencil-Operations on Heterogeneous Computing Environments

International Journal of Parallel Programming, 2024, vol. 52, no. 4, pp. 274-297

Authors: Herrmann, Nina; Dieckmann, Justus; Kuchen, Herbert. Univ Munster, Pract Comp Sci, Leonardo Campus 3, D-48149 Munster, Germany

Complex algorithms and enormous data sets require parallel execution of programs to attain results in a reasonable amount of time. Both aspects are combined in the domain of three-dimensional stencil operations, for example, computational fluid dynamics. This work contributes to the research on high-level parallel programming by discussing the generalizable implementation of a three-dimensional stencil skeleton that works in heterogeneous computing environments. Two exemplary programs, a gas simulation with the Lattice Boltzmann method, and a mean blur, are executed in a multi-node multi-graphics processing units environment, proving the runtime improvements in heterogeneous computing environments compared to a sequential program.

Keywords: Skeleton programming; Three-dimensional stencil operations; high-level parallel programming
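To make the term "three-dimensional stencil operation" from the abstract above concrete, here is a minimal sequential Python sketch of a mean blur. It is an assumption-laden illustration, not the paper's skeleton: the function name `mean_blur_3d`, the `radius` parameter, and the choice to average only over in-bounds neighbours at the borders are all hypothetical.

```python
def mean_blur_3d(grid, radius=1):
    """Return a new grid where each cell is the mean of its in-bounds neighbourhood."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                total, count = 0.0, 0
                # Visit the cube of side 2*radius+1 around (x, y, z), clipped to the grid.
                for i in range(max(0, x - radius), min(nx, x + radius + 1)):
                    for j in range(max(0, y - radius), min(ny, y + radius + 1)):
                        for k in range(max(0, z - radius), min(nz, z + radius + 1)):
                            total += grid[i][j][k]
                            count += 1
                out[x][y][z] = total / count
    return out
```

Each output cell depends only on the input grid, so the three outer loops are independent and could be distributed over several devices; a parallel version would additionally have to exchange the border cells between partitions.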

Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems

International Journal of Parallel Programming, 2023, vol. 51, no. 1, pp. 61-82

Authors: Ernstsson, August; Griebler, Dalvan; Kessler, Christoph. Linkoping Univ, Dept Comp & Informat Sci, PELAB, Linkoping, Sweden; Pontif Catholic Univ Rio Grande do Sul (PUCRS), Sch Technol, Porto Alegre, Brazil

We analyze the performance portability of the skeleton-based, single-source, multi-backend high-level programming framework SkePU across multiple CPU-GPU heterogeneous systems. In doing so, we provide a systematic application efficiency characterization of SkePU-generated code in comparison to equivalent hand-written code in lower-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80%, and that, in particular on CPU systems, SkePU can outperform hand-written code.

Keywords: Algorithmic skeletons; parallel efficiency; Performance portability; Heterogeneous parallel computing; high-level parallel programming

Accelerating OCaml Programs on FPGA

International Journal of Parallel Programming, 2023, vol. 51, no. 2-3, pp. 186-207

Authors: Sylvestre, Loic; Chailloux, Emmanuel; Serot, Jocelyn. Sorbonne Univ, CNRS, LIP6, F-75005 Paris, France; Univ Clermont Auvergne, Inst Pascal, Clermont Auvergne INP, CNRS, F-63000 Clermont Ferrand, France

This paper aims to exploit the massive parallelism of Field-Programmable Gate Arrays (FPGAs) by programming them in OCaml, a multiparadigm and statically typed language. It first presents O2B, an implementation of the OCaml virtual machine using a softcore processor to run the entire OCaml language on an FPGA. It then introduces Macle, a language to express, in ML-style, hardware-accelerated user-defined functions, implemented as gates and registers on the same FPGA. Macle allows programmers to implement pure computations and compose them in parallel. It also supports processing of dynamic data structures such as arrays, matrices and trees allocated by the OCaml runtime in the memory of the softcore processor. Macle functions can then be called, as hardware accelerators, by OCaml programs executed by O2B. This combination of Macle and OCaml code in a single source program makes it easy to prototype FPGA applications mixing numeric and symbolic computations.

Keywords: high-level parallel programming; FPGA; OCaml; Virtual machine; Hardware acceleration; Compiling

AUTOMATIC MAPPING OF ASSIST APPLICATIONS USING PROCESS ALGEBRA

Parallel Processing Letters, 2008, vol. 18, no. 1, pp. 175-188

Authors: Aldinucci, Marco; Benoit, Anne. Univ Pisa, Dept Comp Sci, Largo B Pontecorvo 3, I-56127 Pisa, Italy; Ecole Normale Super Lyon, ENS, LIP, F-69364 Lyon 07, France

Grid technologies aim to harness the computational capabilities of widely distributed collections of computers. Due to the heterogeneous and dynamic nature of the set of grid resources, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.

Keywords: high-level parallel programming; ASSIST environment; Performance Evaluation Process Algebra (PEPA); automatic model generation

MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators

Author: Xiaohui Chen, University of Western Ontario

Degree level: Doctoral

Parallel programming is gaining ground in various domains due to the tremendous computational power that it brings; however, it also requires a substantial code crafting effort to achieve performance improvement. Unfortunately, in most cases, performance tuning has to be accomplished manually by programmers. We argue that automated tuning is necessary due to the combination of the following factors. First, code optimization is machine-dependent. That is, an optimization preferred on one machine may not be suitable for another machine. Second, as the possible optimization search space increases, manually finding an optimized configuration is hard. Therefore, developing new compiler techniques for optimizing applications is of considerable interest. This thesis aims at generating new techniques that will help programmers develop efficient algorithms and code targeting hardware acceleration technologies, in a more effective manner. Our work is organized around a compilation framework, called MetaFork, for concurrency platforms and its application to automatic parallelization. MetaFork is a high-level programming language extending C/C++, which combines several models of concurrency including fork-join, SIMD and pipelining parallelism. MetaFork is also a compilation framework which aims at facilitating the design and implementation of concurrent programs through four key features which make MetaFork unique and novel: (1) Perform automatic code translation between concurrency platforms targeting multi-core architectures. (2) Provide a high-level language for expressing concurrency as in the fork-join model, the SIMD paradigm and the pipelining parallelism. (3) Generate parallel code from serial code with an emphasis on code depending on machine or program parameters (e.g., cache size, number of processors, number of threads per thread block). (4) Optimize code depending on parameters that are unknown at compile-time.

Keywords: source-to-source compiler; pipelining; comprehensive parametric CUDA kernel generation; concurrency platforms; high-level parallel programming

SPar: A DSL for High-Level and Productive Stream Parallelism

Parallel Processing Letters, 2017, vol. 27, no. 1

Authors: Griebler, Dalvan; Danelutto, Marco; Torquati, Massimo; Fernandes, Luiz Gustavo. Pontifical Catholic Univ Rio Grande do Sul (PUCRS), Fac Informat, Comp Sci Grad Program (PPGCC), GMAP, Ave Ipiranga 6681, Bldg 32, BR-90619900 Porto Alegre, RS, Brazil; Univ Pisa (UNIPI), Dept Comp Sci, Parallel Programming Models Grp, Largo Pontecorvo 3, I-56127 Pisa, Italy

This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools process SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar's performance and expressiveness.

Keywords: Stream parallelism; high-level parallel programming; domain-specific languages; parallel design patterns; algorithmic skeletons; C++11 attributes
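The structure the abstract above describes, a stream source feeding processing stages, can be sketched in plain Python as follows. This is only an illustration of the pattern: it does not use SPar's C++ attributes or FastFlow, and `run_pipeline`, `stage_fn`, and the `_DONE` sentinel are hypothetical names. The sketch assumes `stage_fn` does not raise.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(source, stage_fn):
    """Run source -> stage -> sink, with the stage in its own thread."""
    q_in, q_out = queue.Queue(), queue.Queue()

    def stage():
        # Process items one by one until the end-of-stream sentinel arrives.
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)  # forward the sentinel to the sink
                return
            q_out.put(stage_fn(item))

    worker = threading.Thread(target=stage)
    worker.start()

    # Source: emit every item, then signal end of stream.
    for item in source:
        q_in.put(item)
    q_in.put(_DONE)

    # Sink: collect results in arrival order.
    results = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)
    worker.join()
    return results
```

With a single stage thread and FIFO queues, results arrive in the same order as the input; replicating the stage across several threads would require extra bookkeeping to preserve that order.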
