ISBN (Print): 9781479938445
In recent years, multi-core digital signal processors (DSPs) have been widely used to improve execution efficiency in a variety of applications. To fully exploit the parallel processing capacity of DSPs, a well-designed parallel programming model is essential. In this paper, a parallel programming model (PPMA) for a self-designed multi-core audio DSP (MAD) is proposed, based on both shared-memory and message-passing communication mechanisms. PPMA provides a set of application program interfaces (APIs) for efficient inter-core data transmission and synchronization control. To evaluate the performance improvement of audio applications using PPMA, a low bit-rate speech codec was ported to the MAD; with PPMA, task scheduling for the codec can be implemented conveniently. Experimental results also show that the inter-core communication overhead on the MAD is negligible compared to the parallel speedup achieved by PPMA.
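The abstract does not reproduce the PPMA API, so the following C sketch is purely illustrative: the names ppma_send, ppma_recv, and ppma_barrier are assumptions standing in for whatever the real interface provides. It shows how the two mechanisms the paper combines might cooperate, with message passing carrying small control words and shared memory carrying bulk audio frames.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical PPMA-style interface; every name below is an assumption.
 * Stub bodies stand in for what the DSP runtime would provide. */
static int  ppma_send(int dst_core, const void *msg, size_t len)
{ (void)dst_core; (void)msg; (void)len; return 0; }   /* runtime-provided */
static int  ppma_recv(int src_core, void *msg, size_t len)
{ (void)src_core; (void)msg; (void)len; return 0; }   /* runtime-provided */
static void ppma_barrier(void) { }                     /* runtime-provided */

/* Producer core: decode one frame into shared memory, then notify. */
void producer(int16_t *shared_pcm, int consumer_core) {
    (void)shared_pcm;            /* ... fill with one decoded frame ... */
    int frame_ready = 1;
    ppma_send(consumer_core, &frame_ready, sizeof frame_ready);
}

/* Consumer core: wait for the notification, then read shared memory. */
void consumer(const int16_t *shared_pcm, int producer_core) {
    (void)shared_pcm;
    int frame_ready = 0;
    ppma_recv(producer_core, &frame_ready, sizeof frame_ready);
    /* ... post-process the frame in shared_pcm ... */
    ppma_barrier();              /* keep codec pipeline stages in step */
}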
Coprocessors based on the Intel Many Integrated Core (MIC) architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study of the performance and scalability of MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) The native MPI programming model on the MIC processors typically outperforms the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve performance for parallel applications on clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these processes onto as few MIC processors as possible to reduce cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed across both MIC cores and CPU cores, can outperform the native MPI programming model.
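Finding (2), multithreading inside each MPI process, corresponds to the standard hybrid MPI+OpenMP pattern. A minimal, runnable skeleton (illustrative, not the authors' code):

/* One MPI rank per coprocessor, an OpenMP team inside each rank.
 * Launch with e.g.: mpirun -np 2 ./hybrid  (OMP_NUM_THREADS set per rank) */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* FUNNELED: only the main thread of each rank makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}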
A parallelization of the bi-conjugate gradient solver for the pressure equation of the CUPID (component unstructured program for interfacial dynamics) code, which was developed for analyzing the components of a pressurized water-cooled reactor, was studied on a symmetric multi-processing system. The parallel performance was investigated for three typical parallel programming models (MPI, OpenMP, and hybrid) by solving incompressible backward-facing step flow at various grid resolutions. It was confirmed that parallel performance was low when the problem size was small or when the memory requirement of each thread considerably exceeded the cache size. Furthermore, it was shown that MPI outperformed OpenMP regardless of problem size, and that the hybrid model was best when the number of threads was relatively small.
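The kernel that dominates each bi-conjugate gradient iteration is a sparse matrix-vector product. The OpenMP version below is illustrative rather than CUPID's actual code; it also shows where the cache effect noted above bites: once each thread's share of the matrix values exceeds the cache, speedup degrades.

#include <omp.h>

/* y = A*x with A in compressed sparse row (CSR) format */
void spmv_csr(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += val[j] * x[col_idx[j]];  /* irregular access to x */
        y[i] = sum;
    }
}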
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes, and programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot scale beyond a single SMP node, whereas MPI programs can span multiple SMP nodes at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores, and we present a communication model to approximate the overhead of OpenMP loops. Our results hold across a large variety of input data files. We have also developed our own load-balancing and cache-optimization techniques for the message-passing model. Experimental results show that these techniques yield optimal performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
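A common way to expose loop-level parallelism in pairwise alignment is a tiled wavefront (a sketch of the general technique; the paper's exact tiling scheme is not given): tiles on the same anti-diagonal of the dynamic-programming matrix have no mutual dependences, so each wavefront becomes one OpenMP loop, with the tile size trading parallelism against cache reuse.

/* Tiled wavefront over an nti-by-ntj grid of DP tiles. */
void align_wavefront(int nti, int ntj) {
    for (int d = 0; d < nti + ntj - 1; d++) {   /* anti-diagonals */
        #pragma omp parallel for schedule(dynamic)
        for (int ti = 0; ti < nti; ti++) {
            int tj = d - ti;
            if (tj < 0 || tj >= ntj) continue;  /* off this diagonal */
            /* process_tile(ti, tj): fill this tile's DP cells from
             * the already-finished tiles above and to the left */
        }
    }
}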
ISBN (Print): 9781479941162
Coprocessors based on the Intel Many Integrated Core (MIC) architecture have been adopted in many high-performance computer clusters. Typical parallel programming models, such as MPI and OpenMP, are supported on MIC processors to achieve parallelism. In this work, we conduct a detailed study of the performance and scalability of MIC processors under different programming models using the Beacon computer cluster. Our findings are as follows. (1) On the Beacon computer cluster, the native MPI programming model on the MIC processors typically outperforms the offload programming model, which offloads the workload to MIC cores using OpenMP. (2) On top of the native MPI programming model, multithreading inside each MPI process can further improve performance for parallel applications on clusters with MIC coprocessors. (3) Given a fixed number of MPI processes, it is a good strategy to schedule these processes onto as few MIC processors as possible to reduce cross-processor communication overhead. (4) The hybrid MPI programming model, in which data processing is distributed across both MIC cores and CPU cores, can outperform the native MPI programming model.
Thread-level speculation provides not only a simple parallel programming model but also an effective mechanism for exploiting thread-level parallelism. The performance of software speculative parallel models is limited by high global overheads caused by different types of loops, which have different dependence characteristics and require different optimization strategies. In this paper, we propose three complementary optimization techniques that reduce different components of the global overhead, each targeting the requirements of a different loop type: inter-thread fetching reduces the high mis-speculation rate of loops with frequent dependences; out-of-order committing reduces the control overhead of loops with infrequent dependences; and enhanced dynamic task-granularity resizing reduces the control overhead and optimizes the global overhead of loops whose dependence characteristics change over time. All three techniques have been implemented in HEUSPEC, a software TLS system. Experimental results indicate that they satisfy the demands of different groups of benchmarks, and that their combination improves the performance of all benchmarks and reaches a higher average speedup.
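As a concrete illustration of what a software TLS runtime must do (a minimal sketch, not the HEUSPEC API): iterations run speculatively in parallel against a stale snapshot, log their writes, and are committed in program order, with conflicting iterations squashed and re-executed. The mis-speculation rate the paper targets corresponds to how often the squash path is taken.

#include <stdio.h>
#include <stdlib.h>

#define N 1024

/* Loop to parallelize: a[idx[i]] += 1, where idx may repeat. */
int a[N], idx[N];

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = 0; idx[i] = rand() % N; }

    int loc[N], val[N];              /* per-iteration write log */

    /* Speculative phase: each iteration reads the old value of its
     * target and computes the new value privately (no shared writes). */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        loc[i] = idx[i];
        val[i] = a[idx[i]] + 1;      /* speculative read */
    }

    /* In-order commit with conflict detection. */
    char dirty[N] = {0};
    for (int i = 0; i < N; i++) {
        if (dirty[loc[i]])           /* an earlier iteration wrote our */
            val[i] = a[loc[i]] + 1;  /* input: squash and re-execute   */
        a[loc[i]] = val[i];
        dirty[loc[i]] = 1;
    }
    printf("a[0] = %d\n", a[0]);
    return 0;
}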
The advent of the Cloud and the popularization of mobile devices have led to a shift in computing access: users have an interactive display, while heavy computations run remotely on Cloud servers. COMPSs-Mobile is a framework that aims to ease the development of energy-efficient, high-performing applications for this kind of environment. The framework provides an infrastructure-unaware programming model that allows developers to code regular Android applications whose computation is transparently parallelized and partially offloaded to remote resources. This paper gives an overview of the programming model and describes the internal components of the toolkit that supports it, focusing on the offloading and checkpointing mechanisms. It also presents the results of tests conducted to evaluate the behavior of the solution and to measure its potential benefits in Android applications.
OpenACC is a directive-based programming model that allows programmers to write graphics processing unit (GPU) programs by simply annotating parallel loops. However, OpenACC has poor support for irregular nested parallel loops, which are natural choices for expressing nested parallelism. We propose PFACC, a programming model similar to OpenACC. PFACC directives can be used to annotate parallel loops and to guide data movement between different levels of the memory hierarchy. Parallel loops can be arbitrarily nested or placed inside functions that are (possibly recursively) called from other parallel loops. The PFACC translator translates C programs with PFACC directives into CUDA programs by inserting runtime iteration-sharing and memory-allocation routines. The PFACC runtime iteration-sharing routine is a two-level mechanism: thread blocks dynamically organize loop iterations into batches and execute the batches in depth-first order, while different thread blocks share iterations with one another through an iteration-stealing mechanism. PFACC generates CUDA programs with reasonable memory usage because of the depth-first execution order, and the two-level iteration-sharing mechanism is implemented purely in software and fits well with the CUDA thread hierarchy. Experiments show that PFACC outperforms CUDA dynamic parallelism in terms of performance and code size on most benchmarks.
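The kind of irregular nested parallel loop in question can be written in standard OpenACC as below (the abstract does not reproduce PFACC's own directive syntax, so this shows only the problem pattern): the inner trip count varies per row, which plain OpenACC maps poorly and which PFACC's batching and iteration-stealing runtime is designed to handle.

/* Scale the nonzeros of a CSR matrix; inner loop length is irregular. */
void csr_scale(int n, const int *row_ptr, float *val, float s) {
    #pragma acc parallel loop gang
    for (int i = 0; i < n; i++) {
        #pragma acc loop vector
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)  /* irregular */
            val[j] *= s;
    }
}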
The genesis of parallel programming models is considered. It is shown that the parallelism and the hardware support for synchronization inherent in an architecture determine its parallel programming model. Modern VLSI technology requires a shift to programming models based on shared memory.
With the advent of complex modern architectures, the low-level paradigms long considered sufficient to build High Performance Computing (HPC) numerical codes have met their limits. Achieving efficiency and ensuring portability while preserving programming tractability on such hardware has prompted the HPC community to design new, higher-level paradigms that rely on runtime systems to maintain performance. However, the common weakness of these projects is that they deeply tie applications to specific, expert-only runtime system APIs. The OpenMP specification, which aims at providing common parallel programming means for shared-memory platforms, appears to be a good candidate to address this issue thanks to the task-based constructs introduced in its revision 4.0. The goal of this paper is to assess the effectiveness and limits of this support for designing a high-performance numerical library, ScalFMM, implementing the fast multipole method (FMM), which we have deeply redesigned around the most advanced features of OpenMP 4. We show that OpenMP 4 allows for significant performance improvements over previous OpenMP revisions on recent multicore processors, and that extensions to the 4.0 standard further improve performance, bridging the gap with the very high performance so far reserved to expert-only runtime system APIs.
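The OpenMP 4.0 feature in question is the task construct with depend clauses, which lets the runtime order tasks by their data dependences instead of global barriers. A toy two-stage example (not ScalFMM's actual code):

#include <stdio.h>

int main(void) {
    double a = 0.0, b = 0.0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)               /* stage 1, e.g. P2M */
        a = 1.0;
        #pragma omp task depend(in: a) depend(out: b) /* stage 2 consumes a */
        b = a + 1.0;
        #pragma omp taskwait
    }
    printf("b = %f\n", b);  /* always 2.0: the dependences enforce order */
    return 0;
}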