Algorithmic skeletons are polymorphic higher-order functions that represent common parallelization patterns and that are implemented in parallel. They can be used as the building blocks of parallel and distributed applications by embedding them into a sequential language. In this paper, we present a new approach to programming with skeletons. We integrate the skeletons into an imperative host language enhanced with higher-order functions and currying, as well as with a polymorphic type system. We thus obtain a high-level programming language which can be implemented very efficiently. We then present a compile-time technique for the implementation of the functional features which has an important positive impact on the efficiency of the language. After describing a series of skeletons which work with distributed arrays, we give two examples of parallel algorithms implemented in our language, namely matrix multiplication and Gaussian elimination. Run-time measurements for these and other applications show that we come within a factor of 1 to 1.5 of the efficiency of message-passing C. (C) 1998 Elsevier Science B.V. All rights reserved.
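To make the idea concrete, here is a minimal, purely illustrative C++ sketch of a map skeleton as a polymorphic higher-order function. It is sequential and operates on an ordinary vector; in the paper the skeletons run in parallel over distributed arrays, and the host language is not C++.

```cpp
// Illustrative only: a "map" skeleton as a polymorphic higher-order
// function. The parallel, distributed-array version is what the paper
// actually provides; this sequential stand-in shows the interface idea.
#include <vector>
#include <iostream>

// Applies f to every element of xs and returns the results.
template <typename T, typename F>
std::vector<T> map_skel(const std::vector<T>& xs, F f) {
    std::vector<T> out(xs.size());
    for (size_t i = 0; i < xs.size(); ++i) out[i] = f(xs[i]);
    return out;
}

int main() {
    std::vector<int> row = {1, 2, 3, 4};
    auto doubled = map_skel(row, [](int x) { return 2 * x; });
    for (int v : doubled) std::cout << v << ' ';   // prints: 2 4 6 8
}
```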
ISBN (print): 9781450359337
Parallel programming for an infrastructure of multi-core or many-core clusters is a challenge for developers without experience in this domain. Developers need to use several libraries, such as MPI, OpenMP, and CUDA, to efficiently use hardware that may include additional accelerators such as GPUs. Low-level optimizations are also required in order to reach high performance. One approach to overcoming these issues is the concept of Algorithmic Skeletons. These are instances of typical patterns for parallel programming, such as map, fold, and zip, which an application programmer can simply compose without dealing with low-level programming aspects. We propose a domain-specific language called Musket that includes algorithmic skeletons as domain abstractions which seamlessly integrate with sequential code while aligning with the C++ programming language for fast learnability. For improved usability, the editing component validates the correctness of models and provides solution hints in the integrated development environment. From the naive program specification, automatic transformations are applied in order to optimize the code for parallel execution. Subsequently, low-level C++ programs are generated which are optimized for multi-core parallelism on a cluster infrastructure. We evaluate the language using benchmark models written in our DSL and compare the execution time and speedup achieved through model preprocessing and code generation. Our experimental results show that the performance of Musket programs can be significantly improved through intermediate optimizations. The DSL approach thus simplifies multi-core application development and enables performance optimizations through model transformations.
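The following sketch is not Musket syntax; it is a plain sequential C++ illustration of the abstraction level the DSL offers, composing zip and fold (here into a dot product) with no visible threading or MPI code. Musket would generate the optimized parallel C++ for such a composition from the model.

```cpp
// Not Musket's actual syntax: a sequential C++ analogue of composing the
// zip and fold skeletons the abstract names, with no low-level code.
#include <vector>
#include <numeric>
#include <functional>
#include <iostream>

template <typename T, typename F>
std::vector<T> zip_with(const std::vector<T>& a, const std::vector<T>& b, F f) {
    std::vector<T> out(a.size());
    for (size_t i = 0; i < a.size(); ++i) out[i] = f(a[i], b[i]);
    return out;
}

template <typename T, typename F>
T fold(const std::vector<T>& xs, T init, F f) {
    return std::accumulate(xs.begin(), xs.end(), init, f);
}

int main() {
    std::vector<double> a = {1, 2, 3}, b = {4, 5, 6};
    // Dot product expressed as zip (elementwise multiply) then fold (sum).
    auto prods = zip_with(a, b, [](double x, double y) { return x * y; });
    std::cout << fold(prods, 0.0, std::plus<double>{}) << "\n";   // 32
}
```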
Multithreading is the core of mainstream heterogeneous programming methods such as CUDA and OpenCL. However, multithreaded parallel programming requires programmers to handle low-level runtime details, making the programming process complex and error-prone. This paper presents no-threading (NoT), a high-level no-threading programming method. It introduces the association structure, a new language construct, to provide a declarative, runtime-free expression of different kinds of data parallelism and to avoid the use of multithreading. The NoT method defines a C-like syntax for the association structure and implements a compiler and runtime system using OpenCL as an intermediate language. We demonstrate the effectiveness of our techniques with multiple benchmarks. The size of the NoT code is comparable to that of the serial code and far less than that of the benchmark OpenCL code. The compiler generates efficient OpenCL code, yielding performance competitive with or equivalent to that of the manually optimized benchmark OpenCL code on both a GPU platform and an MIC platform.
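The paper's association-structure syntax is not reproduced here. As a rough, hypothetical C++ analogue, the helper below (associate is an invented name, not the paper's construct) declares how each output element is computed and leaves all parallelization to the implementation, which is the spirit of NoT's declarative, runtime-free style.

```cpp
// Hypothetical illustration only: a declarative per-element "association"
// in C++. A compiler like NoT's could lower such a declaration to an
// OpenCL kernel; no threads or work-item IDs appear in user code.
#include <vector>
#include <iostream>

// Invented helper: out[i] = f(i) for all i; the loop is freely parallelizable.
template <typename T, typename F>
void associate(std::vector<T>& out, F f) {
    for (size_t i = 0; i < out.size(); ++i) out[i] = f(i);
}

int main() {
    std::vector<float> a = {1, 2, 3, 4}, b = {5, 6, 7, 8}, c(4);
    associate(c, [&](size_t i) { return a[i] + b[i]; });   // declarative vadd
    for (float v : c) std::cout << v << ' ';               // 6 8 10 12
}
```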
Over the past decade, the widespread adoption of RNA-seq methodology for transcript-level monitoring has resulted in a surge of biological data requiring comprehensive analysis. The BioSkel project aims to develop a framework for RNA sequencing analysis on multi/many-core machines. This framework relies on generic and modular high-level parallel patterns, enabling biologists to customize their data processing to their specific needs while abstracting away the complexities of parallelization. In this study, we introduce the initial prototype of BioSkel for RNA sequencing analysis, which comprises three main steps: sequence alignment, feature counting, and differential expression analysis. This prototype leverages FastFlow as a back-end for parallelizing the execution, in both shared- and distributed-memory settings. We provide experimental validations of our approach, considering different architectures and dataset sizes. As a valuable byproduct, we introduce a distributed HPC version of the Bowtie2 tool, to our knowledge the first publicly available one.
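A hedged sketch of what such a three-stage pipeline can look like with FastFlow's building blocks, following FastFlow's documented ff_node_t/ff_Pipe style: the stage bodies are placeholders and the Chunk type is invented for illustration.

```cpp
// Sketch of a three-stage FastFlow pipeline mirroring BioSkel's steps:
// alignment -> feature counting -> differential expression (collapsed
// here into trivial placeholder bodies). Chunk is a made-up record type.
#include <ff/ff.hpp>
#include <string>
#include <iostream>
using namespace ff;

struct Chunk { std::string reads; };

// Stage 1: emits chunks of input onto the stream, then ends it.
struct Reader : ff_node_t<Chunk> {
    int n = 0;
    Chunk* svc(Chunk*) override {
        if (n == 4) return EOS;                      // end of stream
        return new Chunk{"reads-" + std::to_string(++n)};
    }
};
// Stage 2: stand-in for sequence alignment.
struct Align : ff_node_t<Chunk> {
    Chunk* svc(Chunk* c) override { /* call aligner here */ return c; }
};
// Stage 3: stand-in for counting / differential expression.
struct Count : ff_node_t<Chunk> {
    Chunk* svc(Chunk* c) override {
        std::cout << c->reads << "\n";
        delete c;
        return GO_ON;                                // consume, emit nothing
    }
};

int main() {
    Reader r; Align a; Count c;
    ff_Pipe<> pipe(r, a, c);                         // three-stage pipeline
    return pipe.run_and_wait_end() < 0 ? 1 : 0;
}
```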
We present the new distributed-memory run-time system (RTS) of the C++-based open-source structured parallel programming library FastFlow. The new RTS enables the execution of FastFlow shared-memory applications written using its Building Blocks (BBs) on distributed systems with minimal changes to the original program. The changes required are all high-level and deal with introducing distributed groups (dgroups), i.e., logical partitions of the BBs composing the application streaming graph. A dgroup, which in turn is implemented using FastFlow's BBs, can be deployed and executed on a remote machine and communicate with other dgroups according to the original shared-memory FastFlow streaming programming model. We describe how distributed groups are defined and how we addressed data serialization and communication performance tuning through transparent message batching and scheduling. Finally, we present a study of the overhead introduced by dgroups, considering several benchmarks on a sixteen-node cluster.
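The sketch below follows the paper's description of annotating an existing pipeline with dgroups. The header name, DFF_Init, and createGroup are taken from FastFlow's distributed-RTS tutorial and should be treated as assumptions that may vary across versions.

```cpp
// Assumed API, after FastFlow's distributed-RTS tutorial: a two-stage
// shared-memory pipeline is partitioned into two dgroups that can be
// deployed on different machines. Treat exact signatures as assumptions.
#include <ff/dff.hpp>
using namespace ff;

struct Source : ff_node_t<long> {
    long* svc(long*) override {
        for (long i = 0; i < 10; ++i) ff_send_out(new long(i));
        return EOS;
    }
};
struct Sink : ff_node_t<long> {
    long* svc(long* t) override { delete t; return GO_ON; }
};

int main(int argc, char* argv[]) {
    if (DFF_Init(argc, argv) < 0) return -1;  // bootstrap the distributed RTS
    Source s; Sink k;
    ff_pipeline pipe;
    pipe.add_stage(&s);
    pipe.add_stage(&k);
    // High-level change described in the abstract: logical partitions
    // (dgroups) of the streaming graph, one per remote machine.
    auto G1 = pipe.createGroup("G1");
    auto G2 = pipe.createGroup("G2");
    G1 << &s;
    G2 << &k;
    return pipe.run_and_wait_end();
}
```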
Similarity joins are recognized to be among the most used data processing and analysis operations. We introduce a C++-based high-level parallel pattern implemented on top of FastFlow Building Blocks to provide the programmer with ready-to-use similarity join computations. The SimilarityJoin pattern is implemented according to the MapReduce paradigm, enriched with locality-sensitive hashing (LSH) to optimize the whole computation. The new parallel pattern can be used with any C++ serializable data structure and executed on shared- and distributed-memory machines. We present experimental validations of the proposed solution considering two different clusters and small and large input datasets to evaluate in-core and out-of-core executions. The performance assessment of the SimilarityJoin pattern has been conducted by comparing its execution time against that of the original hand-tuned Hadoop-based implementation of the LSH-based similarity join algorithms, as well as a Spark-based version. The experiments show that the SimilarityJoin pattern: (1) offers a significant performance improvement for small and medium datasets; (2) is also competitive for computations on large input datasets that force out-of-core execution.
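The pattern's actual API is not shown in the abstract; the self-contained C++ fragment below only illustrates the LSH idea it builds on: records sharing a hash bucket become candidate pairs, so the expensive similarity check runs on candidates rather than on the full cross product. The lsh and similar functions are deliberately toy stand-ins.

```cpp
// Conceptual core of an LSH-accelerated similarity join (not the
// SimilarityJoin pattern's API): bucket one side by a locality-sensitive
// hash, then verify only the pairs that collide in a bucket.
#include <unordered_map>
#include <vector>
#include <string>
#include <iostream>

// Toy LSH stand-in: buckets by length. A real LSH family (e.g. MinHash
// bands over shingles) would replace this.
size_t lsh(const std::string& s) { return s.size() / 3; }

// Toy verification step standing in for the real similarity measure.
bool similar(const std::string& a, const std::string& b) {
    return a.front() == b.front();
}

int main() {
    std::vector<std::string> left  = {"apple", "apply", "banana"};
    std::vector<std::string> right = {"ample", "bandana"};

    // "Map" phase: bucket the right-hand side by hash.
    std::unordered_map<size_t, std::vector<const std::string*>> buckets;
    for (auto& r : right) buckets[lsh(r)].push_back(&r);

    // Join phase: verify only colliding candidates.
    for (auto& l : left)
        for (auto* r : buckets[lsh(l)])
            if (similar(l, *r)) std::cout << l << " ~ " << *r << "\n";
}
```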
We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit e.g. custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster backend targeting the custom MPI interface provided by the StarPU task-based runtime system. We have also revised the smart data containers' memory consistency model for automatic data sharing between main and device memory. The new features are the result of a two-year co-design effort collecting feedback from HPC application partners in the EU H2020 project EXA2PRO, and target especially the HPC application domain and HPC platforms. We evaluate the performance effects of the new features on high-end multicore CPU and GPU systems and on HPC clusters.
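A minimal usage sketch in SkePU's documented style follows; it requires SkePU's source-to-source precompiler toolchain, and the exact names should be treated as approximate rather than authoritative.

```cpp
// Sketch of SkePU-style skeleton programming: a user function is lifted
// into a Map skeleton instance and applied to smart containers. The
// backend (sequential, OpenMP, GPU, or the new cluster backend) is
// selected by SkePU, not in user code.
#include <skepu>

float add(float a, float b) { return a + b; }   // plain user function

int main() {
    auto vsum = skepu::Map<2>(add);             // Map with 2 elementwise args
    skepu::Vector<float> a(1000, 1.0f), b(1000, 2.0f), res(1000);
    vsum(res, a, b);                            // res[i] = a[i] + b[i]
    return 0;
}
```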
Parallel programming has become ubiquitous; however, it is still a low-level and error-prone task, especially when accelerators such as GPUs are used. Thus, algorithmic skeletons have been proposed to provide well-defined programming patterns in order to assist programmers and shield them from low-level aspects. As the complexity of problems, and consequently the need for computing capacity, grows, we have directed our research toward simultaneous CPU-GPU execution of data-parallel skeletons to achieve a performance gain. GPUs are optimized with respect to throughput and designed for massively parallel computations. Nevertheless, we analyze whether the additional utilization of the CPU for data-parallel skeletons in the Muenster Skeleton Library leads to speedups or causes reduced performance because of the smaller computational capacity of CPUs compared to GPUs. We present a C++ implementation based on a static distribution approach. In order to evaluate the implementation, four different benchmarks, including matrix multiplication, N-body simulation, Frobenius norm, and ray tracing, have been conducted. The ratio of CPU and GPU execution has been varied manually to observe the effects of different distributions. The results show that a speedup can be achieved by distributing the execution among CPUs and GPUs. However, both the results and the optimal distribution highly depend on the available hardware and the specific algorithm.
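A conceptual C++ sketch of the static distribution approach, not Muesli's API: a manually tuned fraction of the data-parallel range is assigned to the CPU while the remainder runs concurrently on the "GPU", which is simulated here by a plain function standing in for a real kernel launch.

```cpp
// Conceptual only: static CPU/GPU split of a data-parallel map. The
// cpuFraction knob mirrors the manually varied distribution ratio from
// the evaluation; gpu_map is a stand-in for an actual GPU kernel.
#include <vector>
#include <thread>
#include <functional>
#include <cstdio>

void cpu_map(std::vector<float>& v, size_t lo, size_t hi) {
    for (size_t i = lo; i < hi; ++i) v[i] = v[i] * v[i];   // CPU portion
}
// Placeholder for a device launch (e.g. a CUDA kernel in the real setting).
void gpu_map(std::vector<float>& v, size_t lo, size_t hi) {
    for (size_t i = lo; i < hi; ++i) v[i] = v[i] * v[i];
}

int main() {
    std::vector<float> data(1 << 20, 2.0f);
    double cpuFraction = 0.25;                 // tuned by hand, as in the paper
    size_t split = static_cast<size_t>(cpuFraction * data.size());

    std::thread cpu(cpu_map, std::ref(data), 0, split);  // CPU part
    gpu_map(data, split, data.size());                   // "GPU" part, concurrent
    cpu.join();
    std::printf("%f\n", data[0]);                        // 4.0
}
```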
Multi-core processors and clusters of multi-core processors are ubiquitous. They provide scalable performance, yet introduce complex and low-level programming models for shared and distributed memory programming. Thus, fully exploiting the potential of shared and distributed memory parallelization can be a tedious and error-prone task: programmers must take care of low-level threading and communication (e.g., message passing) details. In order to assist programmers in developing performant and reliable parallel applications, Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel and distributed programming patterns, thus shielding programmers from low-level aspects of parallel and distributed programming. In this paper, we take on the design and implementation of the well-known Farm skeleton. In order to address the hybrid architecture of multi-core clusters, we present a two-tier implementation built on top of MPI and OpenMP. On the basis of three benchmark applications, including a simple ray tracer, an interacting particle system, and an application for calculating the Mandelbrot set, we illustrate the advantages of both skeletal programming in general and this two-tier approach in particular.
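A hedged sketch of the two-tier idea under simplifying assumptions (a fixed task pool, N divisible by the number of processes): MPI distributes blocks of tasks across nodes, and OpenMP's dynamic scheduling plays the role of the intra-node farm workers.

```cpp
// Two-tier farm sketch: MPI = inter-node tier, OpenMP = intra-node tier.
// The task body is a placeholder; a real farm would stream tasks instead
// of using a fixed, evenly divisible pool.
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <cstdio>

double work(double x) { return x * x; }          // placeholder task body

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 20;                       // assume N % size == 0
    const int chunk = N / size;
    std::vector<double> local(chunk), all;

    // Tier 2: dynamic scheduling balances uneven tasks within a node.
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < chunk; ++i)
        local[i] = work(1.5 + rank * chunk + i);

    // The farmer (rank 0) collects all partial results.
    if (rank == 0) all.resize(N);
    MPI_Gather(local.data(), chunk, MPI_DOUBLE,
               all.data(), chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("first result: %f\n", all[0]);
    MPI_Finalize();
}
```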
Hardware accelerators such as GPUs or the Intel Xeon Phi comprise hundreds or thousands of cores on a single chip and promise to deliver high performance. They are widely used to boost the performance of highly parallel applications. However, because of their diverging architectures, programmers face diverging programming paradigms. Programmers also have to deal with low-level concepts of parallel programming, which make it a cumbersome task. In order to assist programmers in developing parallel applications, Algorithmic Skeletons have been proposed. They encapsulate well-defined, frequently recurring parallel programming patterns, thereby shielding programmers from low-level aspects of parallel programming. The main contribution of this paper is a comparison of two skeleton library implementations, one in C++ and one in Java, in terms of library design and programmability. In addition, on the basis of four benchmark applications, we evaluate the performance of the presented implementations on two test systems, a GPU cluster and a Xeon Phi system. The two implementations achieve comparable performance, with a slight advantage for the C++ implementation. Xeon Phi performance ranges between CPU and GPU performance.