The Generate-Test-Aggregate (GTA for short) algorithm is modeled on a simple and straightforward programming pattern for combinatorial problems: first, generate all candidates; second, test and filter out invalid ones; finally, aggregate the valid ones to produce the final result. These three processing steps can be specified by three building blocks, namely a generator, a tester, and an aggregator. Despite the simplicity of the algorithm design, implementing a GTA algorithm naively following the three processing steps, i.e., by brute force, results in an exponential-cost computation and is thus impractical for processing large data. The theory of GTA shows that if the definitions of the generator, tester, and aggregator satisfy certain conditions, an efficient (usually near-linear cost) MapReduce program can be automatically derived from the GTA algorithm. The principle of GTA is attractive, but making it practically useful remains an important and challenging problem due to the complexity of GTA program transformations. In this paper, we report on our study and implementation of a practical GTA library (written in the functional language Scala) which provides a systematic parallel-programming approach for big-data analysis with MapReduce. The library provides a simple functional-style programming interface and hides all the internal transformations. With this library, users can write parallel programs in a sequential manner in terms of the GTA algorithm, and the efficiency of the generated MapReduce programs is guaranteed systematically. Parallel programming for many problems therefore need no longer be a tough job. We demonstrate the usefulness of our GTA library on some interesting problems involving large data and show that many applications can be easily and efficiently solved using it. (C) 2013 Elsevier B.V. All rights reserved.
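The abstract does not reproduce the library's Scala interface, but the pattern itself is easy to state in code. The following is a minimal brute-force sketch in C++ (illustrative only, not the library's API), computing the maximum even sum over all sub-multisets of a list:

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Brute-force Generate-Test-Aggregate over all sub-multisets of xs:
    // generate every selection, test that its sum is even, aggregate by
    // taking the maximum.  Cost is O(2^n * n) -- exactly the exponential
    // computation the GTA transformation is designed to avoid.
    long long maxEvenSum(const std::vector<long long>& xs) {
        long long best = 0;  // the empty selection has sum 0, which is even
        const std::size_t n = xs.size();
        for (std::uint64_t mask = 0; mask < (1ULL << n); ++mask) {  // generate
            long long sum = 0;
            for (std::size_t i = 0; i < n; ++i)
                if (mask & (1ULL << i)) sum += xs[i];
            if (sum % 2 != 0) continue;        // test: keep even sums only
            best = std::max(best, sum);        // aggregate: take the maximum
        }
        return best;
    }

    int main() {
        std::vector<long long> xs{3, 5, 8, -2, 7};
        std::cout << maxEvenSum(xs) << '\n';   // prints 20 (5 + 8 + 7)
    }

The point of the GTA theory is that a program of this shape, when its three blocks satisfy the stated conditions, can be systematically transformed into a near-linear MapReduce program instead of the 2^n loop shown here.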
Complex algorithms and enormous data sets require parallel execution of programs to attain results in a reasonable amount of time. Both aspects are combined in the domain of three-dimensional stencil operations, for example in computational fluid dynamics. This work contributes to the research on high-level parallel programming by discussing the generalizable implementation of a three-dimensional stencil skeleton that works in heterogeneous computing environments. Two exemplary programs, a gas simulation using the Lattice Boltzmann method and a mean blur, are executed in a multi-node, multi-GPU environment, demonstrating the runtime improvements of heterogeneous computing environments over a sequential program.
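The abstract does not give the skeleton's interface, but the computational core is easy to picture. Below is a minimal sequential C++ sketch of the 3D mean-blur example (illustrative only, with a hypothetical flat-grid layout); a stencil skeleton's job is to apply such a per-cell neighbourhood function across the grid, the nodes, and the GPUs.

    #include <cstddef>
    #include <vector>

    // Sequential sketch of a 3D mean-blur stencil: every interior cell is
    // replaced by the average of itself and its six face neighbours.  A
    // stencil skeleton parallelizes exactly this access pattern, so only
    // the per-cell update would remain user code.
    using Grid = std::vector<float>;  // flattened nx * ny * nz grid

    static std::size_t idx(std::size_t x, std::size_t y, std::size_t z,
                           std::size_t ny, std::size_t nz) {
        return (x * ny + y) * nz + z;
    }

    void meanBlur(const Grid& in, Grid& out,
                  std::size_t nx, std::size_t ny, std::size_t nz) {
        for (std::size_t x = 1; x + 1 < nx; ++x)
            for (std::size_t y = 1; y + 1 < ny; ++y)
                for (std::size_t z = 1; z + 1 < nz; ++z)
                    out[idx(x, y, z, ny, nz)] =
                        (in[idx(x, y, z, ny, nz)] +
                         in[idx(x - 1, y, z, ny, nz)] + in[idx(x + 1, y, z, ny, nz)] +
                         in[idx(x, y - 1, z, ny, nz)] + in[idx(x, y + 1, z, ny, nz)] +
                         in[idx(x, y, z - 1, ny, nz)] + in[idx(x, y, z + 1, ny, nz)]) / 7.0f;
    }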
This paper aims to exploit the massive parallelism of Field-Programmable Gate Arrays (FPGAs) by programming them in OCaml, a multiparadigm and statically typed language. It first presents O2B, an implementation of the OCaml virtual machine using a softcore processor to run the entire OCaml language on an FPGA. It then introduces Macle, a language to express, in ML style, hardware-accelerated user-defined functions, implemented as gates and registers on the same FPGA. Macle makes it possible to implement pure computations and compose them in parallel. It also supports the processing of dynamic data structures, such as arrays, matrices and trees, allocated by the OCaml runtime in the memory of the softcore processor. Macle functions can then be called, as hardware accelerators, by OCaml programs executed by O2B. This combination of Macle and OCaml code in a single source program makes it easy to prototype FPGA applications mixing numeric and symbolic computations.
We analyze the performance portability of the skeleton-based, single-source, multi-backend high-level programming framework SkePU across multiple different CPU-GPU heterogeneous systems. Thereby, we provide a systematic characterization of the application efficiency of SkePU-generated code in comparison to equivalent hand-written code in more low-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80%, and that in particular for CPU systems, SkePU can outperform hand-written code.
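The abstract does not show SkePU code, but the hand-written baselines it measures against are well known. Below is a sketch of the STREAM "triad" kernel in plain C++ with OpenMP, the style of low-level code a SkePU-generated version would be compared with; roughly speaking, an efficiency above 80% means the generated code reaches at least 80% of the performance of such a kernel.

    #include <cstddef>
    #include <vector>

    // STREAM "triad" kernel, hand-written with OpenMP: a[i] = b[i] + s * c[i].
    // This is the kind of low-level baseline against which the efficiency
    // of SkePU-generated code is reported.
    void triad(std::vector<double>& a, const std::vector<double>& b,
               const std::vector<double>& c, double s) {
        const long n = static_cast<long>(a.size());
        #pragma omp parallel for
        for (long i = 0; i < n; ++i)
            a[i] = b[i] + s * c[i];
    }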
Grid technologies aim to harness the computational capabilities of widely distributed collections of computers. Due to the heterogeneous and dynamic nature of the set of grid resources, the programming and optimisation burden of a low-level approach to grid computing is clearly unacceptable for large-scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of mapping a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.
Parallel programming is gaining ground in various domains due to the tremendous computational power that it brings; however, it also requires a substantial code-crafting effort to achieve performance improvement. Unfortunately, in most cases, performance tuning has to be accomplished manually by programmers. We argue that automated tuning is necessary due to the combination of the following factors. First, code optimization is machine-dependent; that is, an optimization preferred on one machine may not be suitable for another. Second, as the possible optimization search space increases, manually finding an optimized configuration is hard. Therefore, developing new compiler techniques for optimizing applications is of considerable interest. This thesis aims at generating new techniques that will help programmers develop efficient algorithms and code targeting hardware acceleration technologies in a more effective manner. Our work is organized around a compilation framework, called MetaFork, for concurrency platforms and its application to automatic parallelization. MetaFork is a high-level programming language extending C/C++ which combines several models of concurrency, including fork-join, SIMD and pipelining parallelism. MetaFork is also a compilation framework which aims at facilitating the design and implementation of concurrent programs through four key features which make MetaFork unique and novel: (1) it performs automatic code translation between concurrency platforms targeting multi-core architectures; (2) it provides a high-level language for expressing concurrency, as in the fork-join model, the SIMD paradigm and pipelining parallelism; (3) it generates parallel code from serial code, with an emphasis on code depending on machine or program parameters (e.g., cache size, number of processors, number of threads per thread block); (4) it optimizes code depending on parameters that are unknown at compile time.
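The abstract does not show MetaFork's concrete syntax, so here is the fork-join model it extends C/C++ with, expressed with standard C++ futures as an illustrative stand-in (not MetaFork code): one branch of the recursion is forked as a task, the other runs in place, and the join collects the result.

    #include <future>
    #include <iostream>

    // Fork-join parallelism sketched with standard C++: fork a task for
    // fib(n - 1), compute fib(n - 2) in place, then join.  Fork-join
    // languages express this same pattern with dedicated keywords, which
    // a framework like MetaFork can then translate between platforms.
    long fib(int n) {
        if (n < 2) return n;
        if (n < 20)                        // serial cut-off: limits task overhead
            return fib(n - 1) + fib(n - 2);
        auto child = std::async(std::launch::async, fib, n - 1);  // fork
        long b = fib(n - 2);
        return child.get() + b;            // join
    }

    int main() { std::cout << fib(30) << '\n'; }  // prints 832040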
This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools processes SPar code (C++ code annotated with the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by the SPar annotations while targeting shared-memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. We also show the results of experiments assessing the feasibility of the entire approach as well as SPar's performance and expressiveness.
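A short sketch of what such annotated code looks like may help. The attribute names below (spar::ToStream, spar::Stage, spar::Input, spar::Output, spar::Replicate) follow those reported for SPar, but treat their exact spelling and placement as indicative rather than authoritative; a plain C++ compiler simply ignores the unknown attributes, while the SPar tools use them to generate the FastFlow pipeline.

    #include <cctype>
    #include <iostream>
    #include <string>

    // Indicative SPar-style annotated stream loop: the while loop is the
    // stream source, stage 1 transforms each item (replicated 4 ways),
    // stage 2 emits results.  Without the SPar tools this is plain C++.
    static std::string process(std::string s) {
        for (auto& c : s) c = std::toupper(static_cast<unsigned char>(c));
        return s;
    }

    int main() {
        std::string line;
        [[spar::ToStream, spar::Input(line)]]
        while (std::getline(std::cin, line)) {
            [[spar::Stage, spar::Input(line), spar::Output(line), spar::Replicate(4)]]
            { line = process(line); }       // stage 1: per-item transform
            [[spar::Stage, spar::Input(line)]]
            { std::cout << line << '\n'; }  // stage 2: emit result
        }
    }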