ISBN (print): 9781728145693
Nowadays, any personal computer includes a GPU whose parallelism can be used to speed up computations. Unfortunately, taking advantage of such parallel architectures is not a trivial task. In this paper, we present a library of swarm intelligence metaheuristics that provides automatic parallelization. The library is freely available, is implemented in CUDA, and includes parallel versions of metaheuristics for both continuous and discrete domains. We show the basic ideas behind the parallelization of one of the metaheuristics, and we demonstrate its usefulness by presenting empirical results.
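As a rough illustration of why swarm metaheuristics parallelize well, the sketch below implements a plain synchronous particle swarm optimizer in sequential Python. It is not the library's API: the function names, the parameters (`w`, `c1`, `c2`), and the choice of particle swarm optimization as the example metaheuristic are all assumptions made for illustration. The point is that every particle update inside one iteration is independent, so a GPU version could assign one thread per particle and finish each iteration with a reduction.

```python
import random


def sphere(x):
    """Simple test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim, n_particles=20, iters=50, seed=0,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Synchronous PSO sketch (hypothetical helper, not a library function).

    Returns (best position, best value, best value before any iteration).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    initial_best = g_val

    for _ in range(iters):
        # Each particle only reads the shared g_pos, so this loop is the
        # part a GPU implementation could run with one thread per particle.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
        # Global best is updated once per iteration (a reduction on a GPU).
        g = min(range(n_particles), key=best_val.__getitem__)
        if best_val[g] < g_val:
            g_pos, g_val = best_pos[g][:], best_val[g]
    return g_pos, g_val, initial_best
```

Calling `pso(sphere, dim=3)` returns the best position found, its objective value, and the best value of the initial random swarm for comparison.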
ISBN (print): 9781538655559
Peachy Parallel Assignments are a resource for instructors teaching parallel and distributed programming. These are high-quality assignments, previously tested in class, that are readily adoptable. This collection of assignments includes face recognition, finding the electrical potential of a square wire, and heat diffusion. All of these come with sample assignment sheets and the necessary starter code.
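To give a flavor of the heat diffusion exercise, here is a minimal sketch of one explicit finite-difference time step on a 1-D rod with fixed end temperatures. It is not the assignment's starter code; the function name `heat_step` and the 1-D setting are assumptions. Each interior cell depends only on the previous time step, which is what makes the loop a natural target for parallelization.

```python
def heat_step(u, alpha):
    """Return the temperatures after one explicit time step.

    u     -- list of temperatures; u[0] and u[-1] are fixed boundaries
    alpha -- diffusion number (k * dt / dx**2); the explicit 1-D scheme
             is stable for alpha <= 0.5
    """
    new = u[:]
    # Every interior cell reads only old values, so iterations are independent.
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new
```

For example, `heat_step([100.0, 0.0, 0.0, 0.0, 100.0], 0.25)` warms the two cells next to the hot boundaries and leaves the middle cell unchanged for this first step.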
ISBN (print): 9781728116440
Parallel programmers demand high-level parallel programming tools that reduce the effort of efficiently parallelizing their applications. Parallel programming based on parallel patterns has recently received renewed attention thanks to the patterns' clear functional and parallel semantics. In this work, we propose a synergy between the well-known Actor-based programming model and the pattern-based parallelization methodology. We present our preliminary results in that direction, discussing and assessing the implementation of the Map parallel pattern by using an Actor-based software accelerator abstraction that seamlessly integrates within the C++ Actor Framework (CAF). The results obtained on the Intel Xeon Phi KNL platform demonstrate good performance achieved with negligible programming effort.
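The Map pattern itself can be sketched independently of any actor framework. The example below is an assumption-laden illustration in Python using a thread pool, not the paper's CAF-based accelerator: the name `parallel_map` and the chunking strategy are hypothetical. It shows the pattern's contract, which is that a side-effect-free function is applied to every element and the results come back in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(fn, items, workers=4):
    """Apply fn to every item using a pool of workers, preserving order."""
    items = list(items)
    if not items:
        return []
    # Split the input into at most `workers` contiguous chunks.
    chunk = -(-len(items) // workers)  # ceiling division
    chunks = [items[i:i + chunk] for i in range(0, len(items), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields chunk results in submission order.
        parts = pool.map(lambda c: [fn(x) for x in c], chunks)
        return [y for part in parts for y in part]
```

A call such as `parallel_map(lambda x: x * x, range(10), workers=3)` returns the squares in the same order as a sequential `map` would.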
Data races are notorious bugs. They introduce non-determinism into program behavior and complicate program semantics, making parallel programs challenging to debug. To make parallel programming easier, efficient data race detection has been a research topic for decades. However, existing data race detectors either sacrifice precision or incur high overhead, limiting their applicability to real-world applications and scenarios. This dissertation proposes approaches to improve the performance of dynamic data race detection without undermining precision, by identifying and removing metadata redundancy dynamically. This dissertation also explores ways to make it practical to detect data races dynamically for GPU programs, which have a programming and execution model disparate from that of CPU workloads. Further, this dissertation shows how the structured synchronization model in GPU programs can simplify the algorithm design of data race detection for GPUs, and how the unique patterns in GPU workloads enable an efficient implementation of the algorithm, yielding a high-performance dynamic data race detector for GPU programs.
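For readers unfamiliar with the per-access metadata that dynamic detectors maintain, the sketch below shows a textbook happens-before check with vector clocks over a recorded trace. It is not the dissertation's algorithm and does none of its redundancy removal; the trace format and the name `find_races` are assumptions made for illustration.

```python
def find_races(trace, n_threads):
    """Report conflicting accesses that are not ordered by lock operations.

    trace -- list of (thread_id, op, name), op in {"rd", "wr", "acq", "rel"}
    Returns a list of (trace_index, variable_name) for each racy access.
    """
    clocks = [[0] * n_threads for _ in range(n_threads)]
    for t in range(n_threads):
        clocks[t][t] = 1
    lock_clock = {}   # lock name -> vector clock at its last release
    last_write = {}   # variable  -> (thread, clock snapshot)
    last_reads = {}   # variable  -> {thread: clock snapshot}
    races = []

    def ordered(earlier, later):
        return all(a <= b for a, b in zip(earlier, later))

    for idx, (tid, op, name) in enumerate(trace):
        if op == "acq":
            prev = lock_clock.get(name)
            if prev is not None:
                clocks[tid] = [max(a, b) for a, b in zip(clocks[tid], prev)]
        elif op == "rel":
            lock_clock[name] = clocks[tid][:]
            clocks[tid][tid] += 1
        else:
            now = clocks[tid]
            # Any access conflicts with an unordered earlier write.
            w = last_write.get(name)
            racy = w is not None and w[0] != tid and not ordered(w[1], now)
            if op == "wr":
                # A write also conflicts with unordered earlier reads.
                for rt, rc in last_reads.get(name, {}).items():
                    if rt != tid and not ordered(rc, now):
                        racy = True
                last_write[name] = (tid, now[:])
                last_reads[name] = {}
            else:
                last_reads.setdefault(name, {})[tid] = now[:]
            if racy:
                races.append((idx, name))
    return races
```

Two unsynchronized writes to the same variable from different threads are reported, while the same writes wrapped in acquire and release of a common lock are not. The full clock snapshot stored per access is the kind of metadata whose cost motivates work on faster detectors.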
ISBN (digital): 9783030346270
ISBN (print): 9783030346270; 9783030346263
This paper outlines a research and development program to enhance modern compiler technology, and the LLVM compiler infrastructure specifically, to directly optimize parallel-programming-model constructs. The goal is to produce higher-quality code, and moreover, to remove abstraction penalties generally associated with such constructs. We believe that such abstraction penalties are increasing in importance due to C++ parallel-algorithms libraries and other performance-portability-motivated programming methods. In addition, we will discuss when, and more importantly when not, explicit parallelism-awareness is necessary within the compiler in order to enable the desired optimization capabilities.
ISBN (print): 9783030105495; 9783030105488
HPC applications and libraries have frequently moved parallel data from one distribution scheme to another for reasons of performance. Recently, interest in this data redistribution problem has resurged because of the need to relocate data distributed across one Producer grid onto a different distribution scheme across a Consumer grid. In this paper, we study efficient algorithms to perform redistribution, and show that the best methods from the literature still depend on the number of processors in both grids. We describe a new algorithm, ASPEN, that exploits more cyclic patterns and relations in the distribution, does not depend on the total number of processors, and is thus well suited for use in workflow management systems. We describe a preliminary implementation of the algorithm within such a workflow system and show performance results that indicate a significant performance benefit in data redistribution generation.
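The kind of cyclic structure such algorithms can exploit is easy to see in the simplest case. The sketch below is not ASPEN; it assumes a 1-D purely cyclic distribution in which element `i` lives on producer `i % p` and must move to consumer `i % q`, and the function names are hypothetical. Under that assumption the sender/receiver pairs repeat with period lcm(p, q), so the schedule for any element can be read from one period.

```python
from math import gcd


def cyclic_transfer_pattern(p, q):
    """Return the (producer, consumer) pair for each index in one period."""
    period = p * q // gcd(p, q)  # lcm(p, q)
    return [(i % p, i % q) for i in range(period)]


def owner_pair(i, pattern):
    """Look up the (producer, consumer) pair for global index i."""
    return pattern[i % len(pattern)]
```

For `p = 2` and `q = 3` the pattern has six entries, and `owner_pair(7, pattern)` gives the same pair as index 1.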
ISBN (print): 9781728129860
A theoretical and experimental analysis of MPI_Bcast algorithms is presented. The optimal tree degrees and segment sizes for pipelined versions of the algorithms are obtained. The algorithms were investigated as implemented in the Open MPI library. Theoretical results are consistent with experiments on a computer cluster with Gigabit Ethernet and InfiniBand communication networks.
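To show what an optimal segment size means, the sketch below uses the textbook latency-bandwidth (Hockney) cost model for a pipelined chain broadcast. This is an assumption for illustration and not necessarily the model used in the paper: sending a segment of `s` bytes costs `alpha + beta * s`, and a message of `m` bytes split into `m / s` segments crosses a chain of `p` processes in `p - 2 + m / s` steps. Minimizing the resulting time over `s`, with the segment count treated as a real number, gives `s = sqrt(m * alpha / ((p - 2) * beta))`.

```python
from math import sqrt


def chain_bcast_time(m, s, p, alpha, beta):
    """Modeled time of a pipelined chain broadcast (continuous relaxation).

    m -- message size in bytes, s -- segment size in bytes,
    p -- number of processes (p > 2), alpha -- latency in seconds,
    beta -- transfer time per byte in seconds
    """
    n_segments = m / s
    return (p - 2 + n_segments) * (alpha + beta * s)


def optimal_segment_size(m, p, alpha, beta):
    """Segment size minimizing chain_bcast_time, from dT/ds = 0."""
    return sqrt(m * alpha / ((p - 2) * beta))
```

With `m = 4e6`, `p = 6`, `alpha = 1e-4`, and `beta = 1e-8`, the model's optimum is a segment of about 100 kB, and both halving and doubling that size increase the modeled time.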
NASA Technical Reports Server (NTRS) 20050210018: Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems by NASA Technical Reports Server (NTRS); published by
This thesis develops a compiler that converts a program written in the verification-friendly programming language Whiley into an efficient implementation in C. Our compiler uses a mixture of static analysis, run-time monitoring and a code generator to find faster integer types, eliminate unnecessary array copies and de-allocate unused memory without garbage collection, so that Whiley programs can be translated into C code that runs fast and for long periods on general operating systems as well as on limited-resource embedded devices. We also present manual and automatic proofs to verify the memory safety of our implementations, and benchmark them on a variety of test cases for practical use. Our benchmark results show that, in our test suite, our compiler effectively reduces the time complexity to the lowest possible level and stops all memory leaks without causing double-freeing problems. The performance of the implementations can be further improved by choosing proper integer types within the ranges and by exploiting parallelism in the programs.
ISBN (digital): 9781665422871
ISBN (print): 9781665422888
With Exascale machines on our immediate horizon, there is a pressing need for applications to be made ready to best exploit these systems. However, there will be multiple paths to Exascale, with each system relying on processor and accelerator technologies from different vendors. As such, applications will be required to be portable between these different architectures, but it is also critical that they are efficient. This dual requirement for portability and efficiency begets the need for performance portability. In this study we survey the performance portability of different programming models, including the open standards OpenMP and SYCL, across the diverse landscape of Exascale and pre-Exascale processors from Intel, AMD, NVIDIA, Fujitsu, Marvell, and Amazon, together encompassing GPUs and CPUs based on both x86 and Arm architectures. We also take a historical view and analyse how performance portability has changed over the last year.
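One widely used way to put a number on performance portability is the harmonic-mean metric proposed by Pennycook, Sewall, and Lee; the abstract does not say which metric this study uses, so the sketch below is only an illustration of that definition. The function name and the convention of passing `None` for an unsupported platform are assumptions.

```python
def performance_portability(efficiencies):
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    Returns 0.0 if the application fails to run on any platform in the
    set, signalled here by an efficiency of None or a non-positive value.
    """
    if not efficiencies:
        raise ValueError("need at least one platform")
    if any(e is None or e <= 0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)
```

An application reaching 50% efficiency on one platform and 100% on another scores 2/3, which is lower than the arithmetic mean of 0.75 because the harmonic mean penalizes the weak platform more heavily.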