Nowadays the use of hardware accelerators, such as graphics processing units or Xeon Phi coprocessors, is key to solving computationally costly problems that require high performance computing. However, programming solutions for efficient deployment on these kinds of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to study in depth the particular data that need to be computed at each moment, across different computing platforms, while also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage the communications and kernel launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels on multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model allows the programmer to simplify the selection of values for the several configuration parameters that must be set when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in development and porting costs, with significantly low overheads in execution times when compared to manually programmed and optimized solutions that directly use CUDA and OpenMP.
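The controller idea can be sketched very loosely as follows. This is a hypothetical Python illustration of the abstraction only; the actual library targets CUDA and OpenMP from C, and every name here (`Controller`, `register`, `launch`) is invented for this sketch, not the paper's API.

```python
# Hypothetical sketch: the programmer registers a kernel once and the
# controller hides device selection, data transfers, and launch details.
# All identifiers are illustrative, not the prototype library's API.

class Controller:
    def __init__(self, device="cpu"):
        self.device = device          # e.g. "cpu" (OpenMP) or "gpu" (CUDA)
        self.kernels = {}

    def register(self, name, fn):
        # Same abstraction and methodology for CPU and GPU kernels.
        self.kernels[name] = fn

    def launch(self, name, data, **config):
        # A real controller would move `data` to the device and choose
        # launch configuration parameters here; this sketch just calls
        # the kernel function directly.
        return self.kernels[name](data, **config)

ctrl = Controller(device="cpu")
ctrl.register("saxpy", lambda xs, a=2.0: [a * x for x in xs])
result = ctrl.launch("saxpy", [1.0, 2.0, 3.0], a=2.0)
```

The point of the design is that switching `device` should not change the calling code, only what the controller does internally.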
Distributed memory programming, typically through the MPI library, is the de facto standard for programming large-scale parallelism, with up to millions of individual processes. Its dominant paradigm of Single Program Multiple Data (SPMD) programming differs from threaded and multicore parallelism to the extent that students have a hard time switching models. In contrast to threaded programming, which allows a view of the execution with central control and a central repository of data, SPMD programming has a symmetric model where all processes are active all the time, none is privileged, and data is distributed. This model is counterintuitive to the novice parallel programmer, so care needs to be taken in how to instill the proper 'mental model'. Adoption of an incorrect mental model leads to broken or inefficient code. We identify problems with the currently common way of teaching MPI, and propose a structuring of MPI courses that is geared to explicitly reinforcing the symmetric model. Additionally, we advocate starting from realistic scenarios, rather than writing artificial code just to exercise newly learned routines. (C) 2018 Published by Elsevier Inc.
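The symmetric SPMD model described above can be illustrated with a small sketch. A real program would use MPI (e.g. mpi4py in Python); here the ranks are simulated in plain Python so the structure is visible: every rank runs the same function on its own slice of the data, and no process is central.

```python
# Minimal illustration of the symmetric SPMD model, with ranks emulated
# in plain Python (a real code would use MPI; names are illustrative).

def spmd_main(rank, size, n=16):
    # Every rank executes this same function. Data is distributed:
    # each rank owns only its slice of the index space [0, n).
    lo = rank * n // size
    hi = (rank + 1) * n // size
    return sum(i * i for i in range(lo, hi))

# An "allreduce": every rank contributes a partial result and every
# rank receives the total; there is no privileged master process.
size = 4
partials = [spmd_main(r, size) for r in range(size)]
total = sum(partials)
```

Note what is absent: no central loop handing out work and no central array holding all the data, which is exactly the mental-model shift the abstract argues courses should reinforce.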
The paper presents the design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computing the similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on the currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for the computation over all vector pairs: tuning of a GPU kernel with consideration of memory coalescing and use of shared memory, minimization of GPU memory allocation costs, optimization of CPU-GPU communication in terms of the size of data sent, overlapping of CPU-GPU communication and kernel execution, concurrent kernel execution, and determination of the best sizes for data batches processed on CPUs and GPUs along with the best GPU grid sizes. It is shown that all codes scale in hybrid environments with various relative performances of compute devices, even when comparisons of different vector pairs take different amounts of time. Tests were performed on two high-performance hybrid systems: 2 x Intel Xeon E5-2640 CPUs + 2 x NVIDIA Tesla K20m cards, and 2 x latest-generation Intel Xeon E5-2620 v4 CPUs + an NVIDIA Pascal-generation GTX 1070 card. Results demonstrate the expected improvements and the optimizations beneficial for users incorporating such computations into their parallel codes run on similar systems.
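The core computation being accelerated is all-pairs similarity over a set of vectors. A CPU-only sketch of that computation, with the batching that a hybrid code would use to hand chunks of rows to CPU threads or GPU kernels, might look like this (cosine similarity and the batch size are illustrative choices, not the paper's tuned values):

```python
# All-pairs cosine similarity over a list of vectors, processed in
# batches of rows. In a hybrid OpenMP+CUDA code each batch would be
# dispatched to a CPU thread team or a GPU kernel; here it is serial.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def all_pairs_similarity(vectors, batch=2):
    n = len(vectors)
    sims = {}
    for start in range(0, n, batch):        # one batch = one work unit
        for i in range(start, min(start + batch, n)):
            for j in range(i + 1, n):       # each unordered pair once
                sims[(i, j)] = cosine(vectors[i], vectors[j])
    return sims

vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = all_pairs_similarity(vecs)
```

The batch size is exactly the kind of parameter the paper tunes: large batches amortize GPU launch and transfer costs, while small batches balance load when pair comparisons take different amounts of time.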
Accessibility is an important issue in transport geography, land planning, and many other related fields. Accessibility problems become computationally demanding when they involve high-resolution requirements. Using conventional methods, providing high-resolution accessibility analysis for real-time decision support remains a challenge. In this paper, we present a parallel processing model, named HiAccess, to solve high-resolution accessibility analysis problems in real time. One feature of HiAccess is a fast road network construction method, in which the road network topology is determined by traversing the original road nodes only once. The parallel strategies of HiAccess are fully optimized, with few repeated computations. Moreover, a simple, efficient, and highly effective map generalization method is proposed to reduce the computational load without loss of accuracy. The flexibility of HiAccess enables it to work well when applied to different accessibility analysis models. To further demonstrate the applicability of HiAccess, a case study of settlement site selection for poverty alleviation in Xiangxi, Central China, is carried out. The accessibility of jobs, health care, educational resources, and other public facilities is comprehensively analyzed for settlement site selection. HiAccess demonstrates striking performance, measuring the high-resolution (100 m x 100 m grid) accessibility of a city (in total over 250k grids, roads with 232k segments, and 40 facilities) in one second without preprocessing, while ArcGIS takes nearly an hour to achieve a less satisfactory result. In additional experiments, HiAccess is tested on much larger data sets with excellent performance.
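The two core steps the abstract names can be sketched in miniature: build the road-network topology in a single pass over the segments, then measure accessibility as shortest travel time from a facility. The sketch uses Dijkstra's algorithm and a three-node toy network; HiAccess's actual data structures, parallel strategies, and analysis models are not reproduced here.

```python
# Sketch: one-pass road-network construction, then accessibility as
# shortest travel time from a facility (Dijkstra). Data is illustrative.
import heapq
from collections import defaultdict

segments = [  # (node_a, node_b, travel_time)
    ("A", "B", 2.0), ("B", "C", 3.0), ("A", "C", 10.0),
]

# Single pass: each road segment is visited exactly once to build
# the adjacency lists that define the network topology.
adj = defaultdict(list)
for a, b, t in segments:
    adj[a].append((b, t))
    adj[b].append((a, t))

def accessibility(source):
    """Travel time from `source` (a facility) to every reachable node."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, t in adj[u]:
            nd = d + t
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

dist = accessibility("A")  # accessibility of a facility located at node A
```

In a full analysis this search runs from each of the facilities, and the per-grid results are combined into the accessibility surface; those searches are independent, which is what makes the problem parallelize well.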
Deployed through skeleton frameworks, structured parallelism yields a clear and consistent structure across platforms by distinctly decoupling computations from the structure of a parallel programme. Structured programming is a viable and effective means of providing the separation of concerns, as it subdivides a system into building blocks (modules, skids or components) that can be independently created and then used in different systems to drive multiple functionalities. Depending on its defined semantics, each building block wraps a unit of computing function, and the valid assembly of these building blocks forms a high-level structural parallel programming model. This paper proposes a grammar for building block components to execute computational functions on heterogeneous multi-core architectures. The grammar is validated against three different families of computing models: skeleton-based, general purpose, and domain-specific. In conjunction with the protocol, the grammar produces fully instrumented code for an application suite using the skeletal framework FastFlow.
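The "valid assembly of building blocks" idea can be illustrated with a toy composition sketch. The combinator names `seq` and `pipe` follow common algorithmic-skeleton terminology (as in frameworks like FastFlow) and are not the paper's actual grammar.

```python
# Toy sketch of skeleton-style composition: each building block wraps a
# unit of computing function, and valid assemblies form a program.
# Combinator names are generic skeleton vocabulary, not the paper's API.

def seq(f):
    # Wrap a plain function as a building block operating on a stream.
    return lambda stream: [f(x) for x in stream]

def pipe(*stages):
    # Valid assembly: the output stream of one stage feeds the next.
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run

# A two-stage pipeline: square each item, then increment it.
program = pipe(seq(lambda x: x * x), seq(lambda x: x + 1))
out = program([1, 2, 3])
```

The separation of concerns is visible even at this scale: the computations (the two lambdas) carry no knowledge of the structure (`pipe`), so either can be replaced independently.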
This paper presents an overview of the past, present and future of the OpenMP application programming interface (API). While the API originally specified a small set of directives that guided shared memory fork-join parallelization of loops and program sections, OpenMP now provides a richer set of directives that capture a wide range of parallelization strategies that are not strictly limited to shared memory. As we look toward the future of OpenMP, we immediately see further evolution of the support for that range of parallelization strategies and the addition of direct support for debugging and performance analysis tools. Looking beyond the next major release of the specification of the OpenMP API, we expect the specification eventually to include support for more parallelization strategies and to embrace closer integration into its Fortran, C and, in particular, C++ base languages, which will likely require the API to adopt additional programming abstractions.
We present the most recent release of our parallel implementation of the BFS and BC algorithms for the study of large-scale graphs. Although our reference platform is a high-end cluster of new-generation Nvidia GPUs and some of our optimizations are CUDA-specific, most of our ideas can be applied to other platforms offering multiple levels of parallelism. We exploit multi-level parallel processing through a hybrid programming paradigm that combines highly tuned CUDA kernels, for the computations performed by each node, with explicit data exchange through the Message Passing Interface (MPI), for the communications among nodes. The results of the numerical experiments show that the performance of our code is comparable to or better than other state-of-the-art solutions. For the BFS, for instance, we reach a peak performance of 200 GTEPS on a single GPU and 5.5 TTEPS on 1024 Pascal GPUs. We release our source codes both for reproducing the results and for facilitating their use as a building block for the implementation of other algorithms.
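The BFS variant that GPU and multi-GPU implementations typically parallelize is the level-synchronous, frontier-based form: each iteration expands the whole current frontier at once. A sequential sketch of that structure (with an illustrative five-vertex graph, not the paper's benchmark inputs):

```python
# Sequential sketch of level-synchronous (frontier-based) BFS, the form
# parallelized on GPUs. The graph below is a small illustrative example.
from collections import defaultdict

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def bfs_levels(source):
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        # On a GPU, every vertex in `frontier` is expanded in parallel;
        # on a cluster, frontier pieces are exchanged between nodes
        # via MPI before the next iteration starts.
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in level:       # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

levels = bfs_levels(0)
```

The TEPS figures quoted in the abstract count traversed edges per second over exactly this kind of traversal, which is why frontier expansion and the inter-node frontier exchange are the optimization targets.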
A novel parallel technique that couples the lattice-Boltzmann method and a finite volume scheme for the prediction of concentration polarisation and pore blocking in an axisymmetric cross-flow membrane separation process is presented. The model uses the lattice-Boltzmann method to solve the incompressible Navier-Stokes equations for the hydrodynamics and the finite volume method to solve the convection-diffusion equation for the solute particles. Concentration polarisation is modelled for micro-particles by defining the diffusion coefficient as a function of particle concentration and shear rate. The model considers the effect of incompressible cake formation. The pore blocking phenomenon in filtration membrane fouling is predicted using the rate of particles arriving at the membrane surface. The simulation code is parallelised in two ways: Compute Unified Device Architecture (CUDA) is used for a cluster of graphics processing units (GPUs), and the Message Passing Interface (MPI) is utilised for a cluster of central processing units (CPUs), with various parallelisation techniques to optimise memory usage for higher performance. The proposed model is validated by comparison to analytical solutions and experimental results.
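The finite-volume half of the coupling solves a convection-diffusion equation for the solute concentration. A minimal one-dimensional sketch of one explicit finite-volume update follows; the paper's model is axisymmetric, takes its velocity field from the lattice-Boltzmann solver, and uses a concentration- and shear-dependent diffusion coefficient, none of which is reproduced here. The scheme below (first-order upwind convection, central diffusion, zero-gradient boundaries) and all values are illustrative.

```python
# Minimal 1-D explicit finite-volume step for dc/dt + u dc/dx = D d2c/dx2,
# written in flux form: c_i^{n+1} = c_i - dt/dx * (F_{i+1/2} - F_{i-1/2}).
# Illustrative only: first-order upwind convection (assumes u > 0),
# central diffusion, zero-gradient boundary cells.

def fvm_step(c, u, D, dx, dt):
    n = len(c)
    new = c[:]
    for i in range(n):
        cl = c[i - 1] if i > 0 else c[i]      # zero-gradient at boundaries
        cr = c[i + 1] if i < n - 1 else c[i]
        # Upwind convective fluxes through the left and right faces.
        f_left = u * cl
        f_right = u * c[i]
        # Central diffusive fluxes through the same faces.
        d_left = -D * (c[i] - cl) / dx
        d_right = -D * (cr - c[i]) / dx
        new[i] = c[i] - dt / dx * ((f_right + d_right) - (f_left + d_left))
    return new

c = [1.0, 0.0, 0.0, 0.0]                      # concentration pulse
c1 = fvm_step(c, u=1.0, D=0.1, dx=1.0, dt=0.1)
```

In the coupled code this update runs on the same grid partitioning as the lattice-Boltzmann step, which is what makes the GPU/MPI decomposition shared between the two solvers.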
Over the past years, frameworks such as MapReduce and Spark have been introduced to ease the task of developing big data programs and applications. However, the jobs in these frameworks are roughly defined and packaged as executable jars without any functionality being exposed or described. This means that deployed jobs are not natively composable and reusable for subsequent development. It also hampers the ability to apply optimizations to the data flow of job sequences and pipelines. In this paper, we present the Hierarchically Distributed Data Matrix (HDM), which is a functional, strongly-typed data representation for writing composable big data applications. Along with HDM, a runtime framework is provided to support the execution, integration and management of HDM applications on distributed infrastructures. Based on the functional data dependency graph of HDM, multiple optimizations are applied to improve the performance of executing HDM jobs. The experimental results show that our optimizations can achieve improvements of 10 to 40 percent in job completion time for different types of applications when compared with the current state of the art, Apache Spark.
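The idea of a composable, lazily evaluated data representation whose dependency graph can be optimized before execution can be shown with a toy sketch. The class and method names are invented for illustration (HDM itself is a strongly-typed Scala/JVM framework); the optimization shown, fusing adjacent map operations into one traversal, is one representative of the kind of data-flow rewriting the abstract refers to.

```python
# Toy sketch: operations are recorded, not executed; compute() first
# optimizes the recorded graph (fusing adjacent maps), then runs it.
# Names are illustrative, not HDM's actual API.

class DataMatrix:
    def __init__(self, data, ops=()):
        self.data = data
        self.ops = list(ops)          # the functional dependency chain

    def map(self, f):
        return DataMatrix(self.data, self.ops + [("map", f)])

    def filter(self, p):
        return DataMatrix(self.data, self.ops + [("filter", p)])

    def compute(self):
        # Optimization pass: fuse runs of adjacent maps into a single
        # composed function, saving intermediate traversals.
        fused = []
        for kind, f in self.ops:
            if kind == "map" and fused and fused[-1][0] == "map":
                g = fused[-1][1]
                fused[-1] = ("map", lambda x, g=g, f=f: f(g(x)))
            else:
                fused.append((kind, f))
        out = self.data
        for kind, f in fused:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

job = (DataMatrix([1, 2, 3, 4])
       .map(lambda x: x + 1)
       .map(lambda x: x * 2)
       .filter(lambda x: x > 5))
result = job.compute()
```

Because each operation returns a new `DataMatrix` describing the job rather than running it, deployed jobs stay composable, which is precisely the property the abstract says packaged executable jars lack.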
ISBN (digital): 9781510628380
ISBN (print): 9781510628380
We introduce a generic analytic simulation and image reconstruction software platform for multi-pinhole (MPH) SPECT systems. The platform is capable of modeling common or sophisticated MPH designs as well as complex data acquisition schemes. Graphics processing unit (GPU) acceleration was utilized to achieve high-performance computing. Herein, we describe the software platform and provide verification studies of the simulation and image reconstruction software.