This article evaluates the scalability and productivity of six parallel programming models for heterogeneous architectures, and finds that task-based models using code and data annotations require the least programming effort while sustaining nearly the best performance. However, achieving this result requires both extending the programming models to control locality and granularity and ensuring proper interoperability with platform-specific optimizations.
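As a concrete illustration of the annotation style such task-based models rely on, here is a minimal sketch of our own (not code from the article), using OpenMP task dependences as a stand-in: the `depend` annotations let the runtime build a task graph and schedule each task as soon as its inputs are ready.

```cpp
#include <cstdio>

int main() {
    int a = 0, b = 0;
    #pragma omp parallel
    #pragma omp single
    {
        // The runtime derives the task graph from the data annotations.
        #pragma omp task depend(out: a)
        a = 1;                       // producer task

        #pragma omp task depend(out: b)
        b = 2;                       // independent task, may run concurrently

        #pragma omp task depend(in: a, b)
        std::printf("%d\n", a + b);  // consumer runs once both producers finish
    }
    return 0;
}
```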
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review the compiler techniques used to improve programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of research areas. We conclude with a discussion of open issues in the area and potential research directions. This article provides both an accessible introduction to the fast-moving area of heterogeneous programming and a detailed bibliography of its main achievements.
ISBN (print): 9780769549712
Parallel machines are becoming more complex, with increasing core counts and more heterogeneous architectures. However, the commonly used parallel programming models, C/C++ with MPI and/or OpenMP, make it difficult to write source code that is easily tuned for many targets. Newer language approaches attempt to ease this burden by providing optimization features such as automatic load balancing, overlap of computation and communication, message-driven execution, and implicit data layout optimizations. In this paper, we compare several implementations of LULESH, a proxy application for shock hydrodynamics, to determine strengths and weaknesses of different programming models for parallel computation. We focus on four traditional (OpenMP, MPI, MPI+OpenMP, CUDA) and four emerging (Chapel, Charm++, Liszt, Loci) programming models. In evaluating these models, we focus on programmer productivity, performance, and ease of applying optimizations.
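For reference, the MPI+OpenMP hybrid pattern the traditional models above embody looks roughly like this (a toy sketch of ours, not LULESH code): MPI ranks split the domain, OpenMP threads split each rank's chunk, and a reduction combines the per-rank results.

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_global = 1 << 20;
    const int n_local  = n_global / size;   // assumes size divides n_global
    std::vector<double> u(n_local, 1.0);

    #pragma omp parallel for                // thread-level parallelism
    for (int i = 0; i < n_local; ++i)
        u[i] *= 2.0;                        // local compute phase

    double local = 0.0, global = 0.0;
    for (int i = 0; i < n_local; ++i) local += u[i];
    // process-level reduction across ranks
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```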
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphics-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data transfer. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both the theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments. This paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
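Memory coalescing, one of the optimizations examined, can be sketched as follows (our own illustration in SYCL-flavored C++, not code from the study; identifiers are hypothetical). Consecutive work-items should touch consecutive addresses so the hardware can merge their loads into few memory transactions.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    constexpr size_t n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    sycl::queue q;
    {
        sycl::buffer<float> buf(host.data(), sycl::range<1>(n));
        q.submit([&](sycl::handler& h) {
            sycl::accessor a{buf, h};   // read-write access to device data
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                // Coalesced: work-item i reads a[i]; neighboring work-items
                // read adjacent floats, so their loads merge into one transaction.
                a[i] = a[i] * 2.0f;
                // A strided pattern such as a[(i * 32) % n] would scatter the
                // same loads across many transactions and waste bandwidth.
            });
        });
    } // buffer destructor copies results back to 'host'
    return 0;
}
```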
In this paper, we introduce a middleware infrastructure that provides software services for developing and deploying high-performance parallel programming models and distributed applications on clusters and networked heterogeneous systems. This middleware infrastructure utilizes distributed agents residing on the participating machines and communicating with one another to perform the required functions. An intensive study of the parallel programming models in Java has helped identify the common requirements for a runtime support environment, which we used to define the middleware functionality. A Java-based prototype, based on this architecture, has been developed along with a Java Object-Passing Interface (JOPI) class library. Since this system is written completely in Java, it is portable and allows executing programs in parallel across multiple heterogeneous platforms. With the middleware infrastructure, users need not deal with the mechanisms of deploying and loading user classes on the heterogeneous system. Moreover, details of scheduling, controlling, monitoring, and executing user jobs are hidden, while the management of system resources is made transparent to the user. Such uniform services are essential for facilitating the development and deployment of scalable high-performance Java applications on clusters and heterogeneous systems. An initial deployment of a parallel Java programming model over a heterogeneous, distributed system shows good performance results. In addition, a framework for the agents' startup mechanism and organization is introduced to provide scalable deployment and communication among the agents.
SYCL is a modern royalty-free heterogeneous programming specification maintained by the Khronos Group. Recently, it has become increasingly prevalent and mature, leading to various assessments of its performance, portability, and programmability. While previous evaluations have mainly focused on x86 CPUs, NVIDIA GPUs, and AMD GPUs, how well SYCL performs on ARM multi-core CPUs is still unknown. In this paper, we evaluate three SYCL implementations (i.e., DPCPP, AdaptiveCPP, and MLIR-SYCL) on ARM multi-core CPUs to uncover performance traps and offer optimization techniques. We use the SYCL-Bench benchmark suite to assess the performance of DPCPP, AdaptiveCPP, and MLIR-SYCL against their OpenMP counterparts. We also assess the compiler and runtime overhead to evaluate the usability and productivity of the SYCL implementations. Our empirical results demonstrate that these SYCL implementations can achieve satisfactory performance on ARM multi-core processors. Additionally, we highlight several key optimizations, such as NUMA management, which must be carefully addressed to enhance performance.
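A minimal sketch of targeting a CPU device from SYCL 2020, the setting the paper evaluates (our own toy example, not SYCL-Bench code; the selector and USM calls are standard SYCL 2020, everything else is illustrative):

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    // Bind the queue to a CPU backend (whichever implementation is installed).
    sycl::queue q{sycl::cpu_selector_v};
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    constexpr size_t n = 1 << 22;
    float* data = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    // On a NUMA machine, first-touch placement and thread pinning (e.g. via
    // numactl or the backend's affinity settings) decide which node holds
    // 'data'; the paper flags NUMA management as a key optimization.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        data[i] *= 2.0f;
    }).wait();

    sycl::free(data, q);
    return 0;
}
```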
Despite rapid growth in workstation and networking technologies, the workstation environment continues to pose challenging problems for shared processing. In this paper, we present a computational model and system for the generation of distributed applications in such an environment. The well-known RPC model is modified by a novel concept known as template attachment. A computation consists of a network of sequential procedures which have been encapsulated in templates. A small selection of templates is available from which a distributed application with the desired communication behavior can be rapidly built. The system generates all the required low-level code for correct synchronization, communication, and scheduling. This results in a system that is easy to use and flexible, and can provide a programmer with the desired amount of control in using idle processing power over a network of workstations. The practical feasibility of the model has been demonstrated by implementing it for Unix-based workstation environments.
Dataflow programming consists of developing a program by describing its sequential stages and the interactions between them. The runtime systems supporting this kind of programming are responsible for exploiting the parallelism by concurrently executing the different stages as soon as their dependencies are met. In this paper we introduce a new parallel programming model and framework based on the dataflow paradigm. It presents a new combination of features that allows programs to be easily mapped to shared or distributed memory, exploiting data locality and affinity to obtain the same performance as optimized coarse-grain MPI programs. These features include: it is a unified one-tier model that supports hybrid shared- and distributed-memory systems with the same abstractions; it can express activities arbitrarily linked, including non-nested cycles; it internally uses a distributed work-stealing mechanism to allow Multiple-Producer/Multiple-Consumer configurations; and it has a runtime mechanism for the reconfiguration of the dependences and communication channels which also allows the creation of task-to-task data affinities. We present an evaluation using examples of different classes of applications. Experimental results show that programs generated using this framework deliver good performance in hybrid distributed- and shared-memory environments, with a development effort similar to that of other dataflow programming models oriented to shared memory.
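The dependency-driven execution rule described above can be sketched with plain C++ futures (our own illustration, not the paper's framework or its API): each stage is an ordinary function, and a stage may start as soon as the values it consumes are ready.

```cpp
#include <future>
#include <vector>
#include <numeric>
#include <iostream>

int main() {
    // Stage 1: produce data (no dependencies, starts immediately).
    std::future<std::vector<int>> produced = std::async(std::launch::async, [] {
        std::vector<int> v(1000);
        std::iota(v.begin(), v.end(), 0);
        return v;
    });

    // Stage 2: transform; it blocks only on the output of stage 1, which is
    // exactly the "run when dependencies are met" rule the paper describes.
    std::future<long long> summed = std::async(std::launch::async,
        [p = std::move(produced)]() mutable {
            std::vector<int> v = p.get();
            return std::accumulate(v.begin(), v.end(), 0LL);
        });

    std::cout << summed.get() << "\n";  // prints 499500
    return 0;
}
```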
MATLAB(R) has emerged as one of the languages most commonly used by scientists and engineers for technical computing, with approximately one million users worldwide. The primary benefits of MATLAB are reduced code development time via high levels of abstraction (e.g., first-class multi-dimensional arrays and thousands of built-in functions), interpretive, interactive programming, and powerful mathematical graphics. The compute-intensive nature of technical computing means that many MATLAB users have codes that can significantly benefit from the increased performance offered by parallel computing. pMatlab provides this capability by implementing parallel global array semantics using standard operator overloading techniques. The core data structure in pMatlab is a distributed numerical array whose distribution onto multiple processors is specified with a "map" construct. Communication operations between distributed arrays are abstracted away from the user, and pMatlab transparently supports redistribution between any block-cyclic-overlapped distributions of up to four dimensions. pMatlab is built on top of the MatlabMPI communication library and runs on any combination of heterogeneous systems that support MATLAB, which includes Windows, Linux, MacOS X, and SunOS. This paper describes the overall design and architecture of the pMatlab implementation. Performance is validated by implementing the HPC Challenge benchmark suite and comparing pMatlab performance with that of the equivalent C+MPI codes. These results indicate that pMatlab can often achieve comparable performance to C+MPI, usually at one-tenth the code size. Finally, we present implementation data collected from a sample of real pMatlab applications drawn from the approximately one hundred users at MIT Lincoln Laboratory. These data indicate that users are typically able to go from a serial code to an efficient pMatlab code in about 3 hours while changing less than 1% of their code.
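The operator-overloading idea behind pMatlab's global array semantics can be sketched in C++ terms (our own toy, with hypothetical types, not pMatlab's actual implementation): a "map" fixes how an array is split across ranks, and overloaded operators keep user code looking serial while each rank works only on its local block.

```cpp
#include <vector>
#include <cstddef>

struct Map {                     // block distribution over 'nprocs' ranks
    int nprocs, rank;
    std::size_t local_len(std::size_t n) const { return n / nprocs; }
};

struct DistArray {
    Map map;
    std::vector<double> local;   // this rank's block only
    DistArray(std::size_t n, Map m) : map(m), local(m.local_len(n), 0.0) {}
};

// a + b: elementwise add on the local block; with identical maps no
// communication is needed, mirroring pMatlab's transparent operators.
DistArray operator+(const DistArray& a, const DistArray& b) {
    DistArray r(a.local.size() * a.map.nprocs, a.map);
    for (std::size_t i = 0; i < r.local.size(); ++i)
        r.local[i] = a.local[i] + b.local[i];
    return r;
}

int main() {
    Map m{4, 0};                 // toy: 4 ranks, this process is rank 0
    DistArray x(1000, m), y(1000, m);
    DistArray z = x + y;         // looks serial; touches only the local block
    return 0;
}
```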
We survey parallel programming models and languages using six criteria to assess their suitability for realistic portable parallel programming. We argue that an ideal model should be easy to program, should have a software development methodology, should be architecture-independent, should be easy to understand, should guarantee performance, and should provide accurate information about the cost of programs. These criteria reflect our belief that developments in parallelism must be driven by a parallel software industry based on portability and efficiency. We consider programming models in six categories, depending on the level of abstraction they provide. Those that are very abstract conceal even the presence of parallelism at the software level. Such models make software easy to build and port, but efficient and predictable performance is usually hard to achieve. At the other end of the spectrum, low-level models make all of the messy issues of parallel programming explicit (how many threads, how to place them, how to express communication, and how to schedule communication), so that software is hard to build and not very portable, but is usually efficient. Most recent models are near the center of this spectrum, exploring the best tradeoffs between expressiveness and performance. A few models have achieved both abstractness and efficiency. Both kinds of models raise the possibility of parallelism as part of the mainstream of computing.