Search results - Inner Mongolia University Library

A holistic approach for high-level programming of next-generation data-intensive applications targeting distributed heterogeneous computing environment

2nd International Conference on Cloud Forward - From Distributed to Complete Computing

Authors: Carlini, Emanuele; Dazzi, Patrizio; Mordacchini, Matteo. Affiliations: CNR ISTI, Area Ric Pisa, I-56124 Pisa, Italy; CNR IIT, Area Ric Pisa, I-56124 Pisa, Italy

The intrinsic richness and heterogeneity of large amounts of data are paired with the extreme complexity of storing and processing them, as well as with the heterogeneity of their processing environments, which range from supercomputers to federations of Cloud data-centres. This makes the conception, definition and implementation of software tools for programming applications that deal with very large amounts of data challenging from several perspectives, ranging from technological issues to economic concerns. We propose an approach focused on data-intensive applications that goes beyond the state of the art: it allows seamless exploitation of heterogeneous and distributed resources and satisfies users' needs in data processing by providing a dynamically determined set of features that depends on the running environment, the application, and the user requirements. (C) 2016 The Authors. Published by Elsevier B.V.

Keywords: Cloud Computing; Cloud Federation; Resource Management; Data-intensive Applications; High-level programming models

A Holistic Approach for High-level Programming of Next-generation Data-intensive Applications Targeting Distributed Heterogeneous Computing Environment

Procedia Computer Science, 2016, vol. 97, pp. 131-134

Authors: Emanuele Carlini; Patrizio Dazzi; Matteo Mordacchini. Affiliations: CNR-ISTI, Area della Ricerca di Pisa, 56124 Pisa, Italy; CNR-IIT, Area della Ricerca di Pisa, 56124 Pisa, Italy

The intrinsic richness and heterogeneity of large amounts of data are paired with the extreme complexity of storing and processing them, as well as with the heterogeneity of their processing environments, which range from supercomputers to federations of Cloud data-centres. This makes the conception, definition and implementation of software tools for programming applications that deal with very large amounts of data challenging from several perspectives, ranging from technological issues to economic concerns. We propose an approach focused on data-intensive applications that goes beyond the state of the art: it allows seamless exploitation of heterogeneous and distributed resources and satisfies users’ needs in data processing by providing a dynamically determined set of features that depends on the running environment, the application, and the user requirements.

Keywords: Cloud Computing; Cloud Federation; Resource Management; Data-intensive Applications; High-level programming models

Optimizing convolution operations on GPUs using adaptive tiling

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, vol. 30, no. 1, pp. 14-26

Authors: van Werkhoven, Ben; Maassen, Jason; Bal, Henri E.; Seinstra, Frank J. Affiliations: Vrije Univ Amsterdam, Dept Comp Sci, NL-1081 HV Amsterdam, Netherlands; Netherlands eSci Ctr, NL-1098 XG Amsterdam, Netherlands

The research domain of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia data. High-performance computing techniques are necessary to satisfy the ever-increasing computational demands of MMCA applications. The introduction of Graphics Processing Units (GPUs) in modern cluster systems presents application developers with a challenge. While GPUs are well known to be capable of providing significant performance improvements, the programming complexity vastly increases. To this end, we have extended a user-transparent parallel programming model for MMCA, named Parallel-Horus, to allow the execution of compute-intensive operations on the GPUs present in the cluster. The most important class of operations in the MMCA domain is convolutions, which are typically responsible for a large fraction of the execution time. Existing optimization approaches for CUDA kernels in general, as well as those specific to convolution operations, are too limited in both performance and flexibility. In this paper, we present a new optimization approach, called adaptive tiling, to implement a highly efficient, yet flexible, library-based convolution operation for modern GPUs. To the best of our knowledge, our implementation is the most optimized and best performing implementation of 2D convolution in the spatial domain available to date. (C) 2013 Elsevier B.V. All rights reserved.

Keywords: High-performance computing; GPU computing; Parallel applications; GPU clusters; High-level programming models

An Efficient Scalable Runtime System for Macro Data Flow Processing Using S-NET

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2014, vol. 42, no. 6, pp. 988-1011

Authors: Gijsbers, Bert; Grelck, Clemens. Affiliation: Univ Amsterdam, Inst Informat, Amsterdam, Netherlands

S-Net is a declarative coordination language and component technology aimed at radically facilitating software engineering for modern parallel compute systems by near-complete separation of concerns between application (component) engineering and concurrency orchestration. S-Net builds on the concept of stream processing to structure networks of communicating asynchronous components implemented in a conventional (sequential) language. In this paper we present the design, implementation and evaluation of a new and innovative runtime system for S-Net streaming networks. The Front runtime system outperforms the existing implementations of S-Net by orders of magnitude for stress-test benchmarks, significantly reduces runtimes of fully-fledged parallel applications with compute-intensive components and achieves good scalability on our 48-core test system.

Keywords: High-level programming models; Declarative parallel programming languages and libraries: semantics and implementation

Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2014, vol. 42, no. 4, pp. 601-618

Authors: Steuwer, Michel; Friese, Malte; Albers, Sebastian; Gorlatch, Sergei. Affiliation: Univ Munster, Dept Math & Comp Sci, D-48149 Munster, Germany

Algorithmic skeletons simplify software development: they abstract typical patterns of parallelism and provide efficient implementations of them, allowing the application developer to focus on the structure of algorithms rather than on implementation details. This becomes especially important for modern parallel systems with multiple graphics processing units (GPUs), whose programming is complex and error-prone because state-of-the-art programming approaches like CUDA and OpenCL lack high-level abstractions. We define a new algorithmic skeleton for allpairs computations, which occur in real-world applications ranging from bioinformatics to physics. We develop the skeleton's generic parallel implementation for multi-GPU systems in OpenCL. To enable the automatic use of the fast GPU memory, we identify and implement an optimized version of the allpairs skeleton with a customizing function that follows a certain memory access pattern. We use matrix multiplication as an application study for the allpairs skeleton and its two implementations and demonstrate that the skeleton greatly simplifies programming, saving up to 90% of lines of code compared to OpenCL. The performance of our optimized implementation is up to 6.8 times higher than that of the generic implementation and is competitive with the performance of manually written, optimized OpenCL code.

Keywords: High-level programming models; Algorithmic skeletons; GPU computing; Allpairs computation; SkelCL
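
The allpairs pattern named in this record can be illustrated with a minimal sequential sketch in Python. It is not the authors' OpenCL implementation, and it assumes the usual reading of the pattern: a function is applied to every pair formed from a row of one matrix and a row of another. The names `allpairs`, `dot`, and `matmul` are hypothetical and chosen only for this illustration. Under that assumption, matrix multiplication is allpairs with a dot product over the rows of A and the columns of B.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def allpairs(f: Callable[[Row, Row], float],
             rows_a: Sequence[Row],
             rows_b: Sequence[Row]) -> List[List[float]]:
    """Apply f to every (row of A, row of B) pair; result[i][j] = f(a_i, b_j)."""
    return [[f(a, b) for b in rows_b] for a in rows_a]


def dot(a: Row, b: Row) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matmul(a: Sequence[Row], b: Sequence[Row]) -> List[List[float]]:
    """Matrix product expressed as allpairs(dot) over A and the transpose of B."""
    b_transposed = [list(col) for col in zip(*b)]  # columns of B become rows
    return allpairs(dot, a, b_transposed)


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each output element depends on one pair of rows only, so all elements could in principle be computed independently.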

A HLS-based toolflow to design next-generation heterogeneous many-core platforms with shared memory

12th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC)

Authors: Burgio, Paolo; Marongiu, Andrea; Coussy, Philippe; Benini, Luca. Affiliations: Univ Bretagne Sud, LabSTICC, Lorient, France; Univ Bologna, DEI, I-40126 Bologna, Italy; Swiss Fed Inst Technol, Integrated Syst Lab, Zurich, Switzerland

ISBN: (print) 9780769552491

This work describes how we use High-Level Synthesis (HLS) to support design space exploration (DSE) of heterogeneous many-core systems. Modern embedded systems increasingly couple hardware accelerators and processing cores on the same chip, to trade specialization of the platform to an application domain for increased performance and energy efficiency. However, the process of designing such a platform is complex and error-prone, and requires skills in algorithmic aspects, hardware synthesis, and software engineering. DSE can partially be automated, and thus simplified, by coupling the use of HLS tools and virtual prototyping platforms. In this paper we enable the design space exploration of heterogeneous many-cores adopting a shared-memory architecture template, where communication and synchronization between the hardware accelerators and the cores happen through L1 shared memory. This communication infrastructure leverages a "zero-copy" scheme, which simplifies both the design process of the platform and the development of applications on top of it. Moreover, the shared-memory template perfectly fits the semantics of several high-level programming models, such as OpenMP. We provide programmers with simple yet powerful abstractions to exploit accelerators from within an OpenMP application, and propose a low-cost implementation of the necessary runtime support. An HLS-based automatic design flow is set up to quickly explore the design space using a cycle-accurate virtual platform.

Keywords: embedded systems; parallel programming; shared memory systems; DSE; HLS-based toolflow; L1 shared memory; OpenMP application; cycle-accurate virtual platform; design space exploration; high-level programming models; high-level synthesis; modern embedded systems; next-generation heterogeneous many-core platforms; shared-memory architecture template; Acceleration; Computer architecture; Hardware; Program processors; programming; Registers; Synchronization; OpenMP; clustered architectures; heterogeneous architectures; hls; many-core systems; shared-memory systems; Space Exploration; gene; shared memory; Computer hardware; hardware accelerator; Platform acceleration

Trasgo: a nested-parallel programming system

JOURNAL OF SUPERCOMPUTING, 2011, vol. 58, no. 2, pp. 226-234

Authors: Gonzalez-Escribano, Arturo; Llanos, Diego R. Affiliation: Univ Valladolid, Dept Informat, Valladolid, Spain

Programming models of pure nested-parallelism are appealing due to their ease of programming and good analysis and debugging properties. Although their simple synchronization structure is appropriate to represent abstract parallel algorithms, it does not take into account many implementation issues. In this work we present Trasgo, a programming system based on high-level, nested-parallel specifications. We show how it allows complex combinations of data and task parallelism to be expressed easily with a common scheme, hiding the layout and scheduling details. The approach allows the development of a modular compiler where automatic transformation techniques may exploit lower-level and more complex synchronization structures, overcoming the limitations of pure nested-parallel programming. This article presents an overview of the features of Trasgo and its architecture. We present some performance results using well-known parallel algorithms, and a roadmap of improvements and new features to be added to Trasgo.

Keywords: High-level programming models; Parallel compilers
