检索结果-内蒙古大学图书馆

IEEE International Workshop/Symposium on Workload Characterization

作者： Xiaochun Zhang Timothy M. Jones Simone Campanoni Huawei University of Cambridge Northwestern University

ISBN: (纸本)9781665441742

Automatic parallelizing compilers are often constrained in their transformations because they must conservatively respect data dependences within the program. Developers, on the other hand, often take advantage of domain-specific knowledge to apply transformations that modify data dependences but respect the application's semantics. This creates a semantic gap between the parallelism extracted automatically by compilers and manually by developers. Although prior work has proposed programming language extensions to close this semantic gap, their relative contribution is unclear and it is uncertain whether compilers can actually achieve the same performance as manually parallelized code when using them. We quantify this semantic gap in a set of sequential and parallel programs and leverage these existing programming-language extensions to empirically measure the impact of closing it for an automatic parallelizing compiler. This lets us achieve an average speedup of 12.6× on an Intel-based 28-core machine, matching the speedup obtained by the manually parallelized code. Further, we apply these extensions to widely used sequential system tools, obtaining 7.1× speedup on the same system.

关键词： Computer languages Program processors Codes parallel programming Semantics Manuals parallel processing

来源：评论

学校读者我要写书评

暂无评论

Introducing a Stream Processing Framework for Assessing parallel programming Interfaces

Introducing a Stream Processing Framework for Assessing Para...

引用

Euromicro Conference on parallel, Distributed and Network-Based Processing

作者： Adriano Marques Garcia Dalvan Griebler Luiz G. L. Fernandes Claudio Schepke School of Technology Pontifical Catholic University of Rio Grande do Sul (PUCRS) Porto Alegre Brazil Laboratory of Advanced Research on Cloud Computing (LARCC) Três de Maio Maio Faculty (SETREM) Três de Maio Brazil Federal University of Pampa (UNIPAMPA) Alegrete Brazil

Stream Processing applications are spread across different sectors of industry and people's daily lives. The increasing data we produce, such as audio, video, image, and text are demanding quickly and efficiently computation. It can be done through Stream parallelism, which is still a challenging task and most reserved for experts. We introduce a Stream Processing framework for assessing parallel programming Interfaces (PPIs). Our framework targets multi-core architectures and C++ stream processing applications, providing an API that abstracts the details of the stream operators of these applications. Therefore, users can easily identify all the basic operators and implement parallelism through different PPIs. In this paper, we present the proposed framework, implement three applications using its API, and show how it works, by using it to parallelize and evaluate the applications with the PPIs Intel TBB, FastFlow, and SPar. The performance results were consistent with the literature.

关键词： Industries parallel programming Distributed databases Computer architecture Streaming media parallel processing Task analysis

来源：评论

学校读者我要写书评

暂无评论

HiPC 2021 Workshop on parallel programming in the Exascale Era (PPEE 2021)

HiPC 2021 Workshop on Parallel Programming in the Exascale E...

引用

IEEE International Conference on High Performance Computing Workshops (HiPCW)

作者： Vivek Kumar Swarnendu Biswas Vishwesh Jatala

ISBN: (纸本)9781665410380

This is the first edition of the PPEE workshop. The upcoming exascale systems will impose new requirements on application developers and programming systems to target platforms with hundreds of homogeneous and heterogeneous cores. The four critical challenges for exascale systems are extreme parallelism, power demand, data movement, and reliability. These systems are aimed to solve problems that were previously out of reach and to improve the parallel performance of applications by a factor of 50x. The power budget for achieving a billion billion (quintillion) floating-point operations per second (exaflops) should be within 20-30 MW. Moving the data on these systems relative to the computation will be challenging due to complex memory hierarchies. It would be essential to keep the CPUs/accelerators busy once they have the data to avoid memory bottlenecks. Failures on these systems are anticipated to occur many times a day, such that the existing approach for resiliency, such as checkpointing and restart, will not work.

关键词： parallel programming power budget memory hierarchy application developer Workshops workshops checkpointing Restarting Heterogeneous Memory Memory Platform floating point arithmetic

来源：评论

学校读者我要写书评

暂无评论

An urban transportation problem solved by parallel programming with hyper-heuristics

引用

ENGINEERING OPTIMIZATION 2019年第11期51卷 1965-1979页

作者： Rodriguez, Diego A. Oteiza, Paola P. Brignole, Nelida B. UNS CONICET Planta Piloto Ingn Quim PLAPIQUI Bahia Blanca Buenos Aires Argentina UNS DIQ Bahia Blanca Buenos Aires Argentina UNS Lab Invest & Desarrollo Comp Cient LIDECC DCIC Bahia Blanca Buenos Aires Argentina Univ Nacl Salta UNSa Dept Informat Fac Ciencias Exactas Salta Argentina

An innovative optimization strategy by means of hyper-heuristics is proposed. It consists of a parallel combination of three metaheuristics. In view of the need both to escape from local optima and to achieve high diversity, the algorithm cooperatively combines simulated annealing with genetic algorithms and ant colony optimization. A location routing problem (LRP), which aims at the design of transport networks, was adopted for the performance evaluation of the proposed algorithm. Information exchanges took place effectively between the metaheuristics and speeded up the search process. Moreover, the parallel implementation was useful since it allowed several metaheuristics to run simultaneously, thus achieving a significant reduction in the computational time. The algorithmic efficiency and effectiveness were ratified for a medium-sized city. The proposed optimization algorithm not only accelerated computations, but also helped to improve solution quality.

关键词： Optimization LRP parallel programming hyper-heuristics transport

来源：评论

学校读者我要写书评

暂无评论

Performance Analysis of Sequential and parallel programming Paradigms on CPU-GPUs Cluster

Performance Analysis of Sequential and Parallel Programming ...

引用

Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), International Conference on

作者： B N Chandrashekhar H A Sanjay Nitte Meenakshi Institute of Technology Bangalore India

ISBN: (数字)9781665419604

ISBN: (纸本)9781665429986

The entire world of parallel computing endured a change when accelerators are gradually embraced in today's high-performance computing cluster. A hybrid CPU-GPU cluster is required to speed up the complex computations by using parallel programming paradigms. This paper deals with performance evaluation of sequential, parallel and hybrid programming paradigms on the hybrid CPU-GPU cluster using the sorting strategies such as quick sort, heap sort and merge sort. In this research work performance comparison of C, MPI, and hybrid [MPI+CUDA] on CPU-GPUs hybrid systems are performed by using the sorting strategies. From the analysis it is observed that, the performance of parallel programming paradigm MPI is better when compared against sequential programming model. Also, research work evaluates the performance of CUDA on GPUs and hybrid programming model [MPI+CUDA] on CPU+GPU cluster using merge sort strategies and noticed that hybrid programming model [MPI+CUDA] has better performance against traditional approach and parallel programming paradigms MPI and CUDA When the overall performance of all three programming paradigms are compared, MPI+CUDA based on CPU+GPU environment gives the best speedup.

关键词： Performance evaluation parallel programming Computational modeling Graphics processing units programming parallel processing Sorting

来源：评论

学校读者我要写书评

暂无评论

Performance evaluation in the reconstruction of 2D images of computed tomography using massively parallel programming CUDA

arXiv

引用

arXiv 2021年

作者： Cordeiro, Alexssandro Ferreira de Paula Filho, Pedro Luiz da Silva, Hamilton Pereira Candido, Arnaldo Casanova, Edresson Spancerski, Jandrei Sartori Department of Computing The Federal Technological University of Parana Brazil Avenue 4232 Medianeira - PR85884-000 Brazil

Analysis of processing time and similarity of images generated between CPU and GPU architectures and sequential and parallel programming. For image processing a computer with AMD FX-8350 processor and an Nvidia GTX 960 Maxwell GPU was used, along with the CUDAFY library and the programming language C# with the IDE Visual studio. The results of the comparisons indicate that the form of sequential programming in a CPU generates reliable images at a high custom of time when compared to the forms of parallel programming in CPU and GPU. While parallel programming generates faster results, but with increased noise in the reconstructed image. For data types float a GPU obtained best result with average time equivalent to 31 of the processor, however the data is of type double the parallel CPU approach obtained the best performance. For the float data type, the GPU had the best average time performance, while for the double data type the best average time performance was for the parallel approach CPU. Regarding image quality, the sequential approach obtained similar outputs, while theparallel approaches generated noise in their outputs. Copyright © 2021, The Authors. All rights reserved.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

Performance Analysis of parallel programming Paradigms on CPU-GPU Clusters

Performance Analysis of Parallel Programming Paradigms on CP...

引用

Artificial Intelligence and Smart Systems (ICAIS), International Conference on

作者： B N Chandrashekhar H A Sanjay Tulasi Srinivas Nitte meenakshi Institute of Technology Benagluru India

ISBN: (数字)9781728195377

ISBN: (纸本)9781728195384

CPU-GPU based cluster computing in today's modern world encompasses the domain of complex and high-intensity computation. To exploit the efficient resource utilization of a cluster, traditional programming paradigm is not sufficient. Therefore, in this article, the performance parallel programming paradigms like OpenMP on CPU cluster and CUDA on GPU cluster using BFS and DFS graph algorithms is analyzed. This article analyzes the time efficiency to traverse the graphs with the given number of nodes in two different processors. Here, CPU with OpenMP platform and GPU with CUDA platform support multi-thread processing to yield results for various nodes. From the experimental results, it is observed that parallelization with the OpenMP programming model using the graph algorithm does not boost the performance of the CPU processors, instead, it decreases the performance by adding overheads like idling time, inter-thread communication, and excess computation. On the other hand, the CUDA parallel programming paradigm on GPU yields better results. The implementation achieves a speed-up of 187 to 240 times over the CPU implementation. This comparative study assists the programmers provocatively and select the optimum choice among OpenMP and CUDA parallel programming paradigms.

关键词： parallel programming Computational modeling Graphics processing units Clustering algorithms Cluster computing Performance analysis Resource management

来源：评论

学校读者我要写书评

暂无评论

Separation of gates in quantum parallel programming

arXiv

引用

arXiv 2021年

作者： He, Kan Liu, Shusen Hou, Jinchuan College of Information and Computer & College of Mathematics Taiyuan University of Technology Shanxi Taiyuan030024 China Institute for Quantum Computing Baidu Research Beijing100193 China Institute for Advanced Study Tsinghua University Beijing100084 China College of Mathematics Taiyuan University of Technology Shanxi Taiyuan030024 China

The number of qubits in current quantum computers is a major restriction on their wider application. To address this issue, Ying conceived of using two or more small-capacity quantum computers to produce a larger-capacity quantum computing system by quantum parallel programming ([M. S. Ying, Morgan-Kaufmann, 2016]). In doing so, the main obstacle is separating the quantum gates in the whole circuit to produce a tensor product of the local gates. In this study, we theoretically analyse the (sufficient and necessary) separability conditions of multipartite quantum gates in finite or infinite dimensional systems. We then conduct separation experiments with n-qubit quantum gates on IBM quantum computers using QSI *** Codes ***, 02.30. Tb Copyright © 2021, The Authors. All rights reserved.

关键词： parallel programming

来源：评论

学校读者我要写书评

暂无评论

Teaching High Productivity and High Performance in an Introductory parallel programming Course

Teaching High Productivity and High Performance in an Introd...

引用

IEEE International Conference on High Performance Computing Workshops (HiPCW)

作者： Vivek Kumar IIIT Delhi India

ISBN: (纸本)9781665410380

Multicore processors are ubiquitous. Several prior research has emphasized the need for high productivity parallel programming models that require minimal changes to the sequential program and can still deliver high performance using runtimes based approaches on various architectures. In this paper, we present the structure and experience of teaching the Foundations of parallel programming course (FPP) at IIIT Delhi using a task-based parallel programming model, Habanero C/C++ Library (HClib). FPP covers a wide breadth of topics in parallel programming but emphasizes both high productivity and high performance. It is being offered at IIIT Delhi in the spring semester for undergraduate and postgraduate students since 2017. We describe our novel approach where the students start the learning process using the traditional parallel programming models, discover the underlying limitations, and build runtime solutions to achieve high performance.

关键词： Productivity Runtime parallel programming Conferences Computational modeling Education Task analysis

来源：评论

学校读者我要写书评

暂无评论

Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors

引用

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS 2019年第3期33卷 534-553页

作者： Szustak, Lukasz Bratek, Pawel Czestochowa Tech Univ Fac Mech Engn & Comp Sci Inst Comp & Informat Sci Dabrowskiego 69 PL-42201 Czestochowa Poland

In this work, we take up the challenge of performance portable programming of heterogeneous stencil computations across a wide range of modern shared-memory systems. An important example of such computations is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), the second major part of the dynamic core of the EULAG geophysical model. For this aim, we develop a set of parametric optimization techniques and four-step procedure for customization of the MPDATA code. Among these techniques are: islands-of-cores strategy, (3+1)D decomposition, exploiting data parallelism and simultaneous multithreading, data flow synchronization, and vectorization. The proposed adaptation methodology helps us to develop the automatic transformation of the MPDATA code to achieve high sustained scalable performance for all tested ccNUMA platforms with Intel processors of last generations. This means that for a given platform, the sustained performance of the new code is kept at a similar level, independently of the problem size. The highest performance utilization rate of about 41-46% of the theoretical peak, measured for all benchmarks, is provided for any of the two-socket servers based on Skylake-SP (SKL-SP), Broadwell, and Haswell CPU architectures. At the same time, the four-socket server with SKL-SP processors achieves the highest sustained performance of around 1.0-1.1 Tflop/s that corresponds to about 33% of the peak.

关键词： parallel programming performance portability shared-memory systems heterogeneous stencils EULAG model MPDATA code parameterization Skylake Knights Landing

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：