ISBN: 9781665490191 (print)
The proceedings contain 6 papers. The topics discussed include: analysis of validating and verifying OpenACC compilers 3.0 and above; OmpSs-2 and OpenACC interoperation; extending MAGMA portability with OneAPI; KokkACC: enhancing Kokkos with OpenACC; SPEL: software tool for porting E3SM land model with OpenACC in a function unit test framework; and GPU-accelerated sparse matrix vector product based on element-by-element method for unstructured FEM using OpenACC.
ISBN: 9783030977580 (print)
The proceedings contain 7 papers presented at a virtual meeting. The special focus in this conference is on accelerator programming using directives. The topics include: GPU Offloading of a Large-Scale Gyrokinetic Particle-in-Cell Fortran Code on Summit: From OpenACC to OpenMP; Accelerating Quantum Many-Body Configuration Interaction with Directives; Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-Based Offloading; GPU Porting of Scalable Implicit Solver with Green’s Function-Based Neural Networks by OpenACC; and Extending OpenMP for Machine Learning-Driven Adaptation.
ISBN: 9783319748955 (print)
The proceedings contain 9 papers. The special focus in this conference is on accelerator programming using directives. The topics include: Implicit low-order unstructured finite-element multiple simulation enhanced by dense computation using OpenACC; The design and implementation of OpenMP 4.5 and OpenACC backends for the RAJA C++ performance portability layer; Enabling GPU support for the COMPSs-mobile framework; Concurrent parallel processing on graphics and multicore processors with OpenACC and OpenMP; Exploration of supervised machine learning techniques for runtime selection of CPU vs. GPU execution in Java programs; Automatic testing of OpenACC applications; and Evaluation of asynchronous offloading capabilities of accelerator programming models for multiple devices.
ISBN: 9781665490191 (print)
OpenACC is a high-level directive-based parallel programming model that can manage the sophistication of heterogeneity in architectures and abstract it from the users. The portability of the model across CPUs and accelerators has gained it a wide variety of users, which makes it crucial to analyze the reliability of the compilers' implementations. To address this challenge, the OpenACC Validation and Verification team has proposed a validation testsuite to verify the OpenACC implementations across various compilers, together with an infrastructure for more streamlined execution. This paper covers the following aspects: (a) the new developments since the last publication on the testsuite, (b) the use of the infrastructure, (c) tests that highlight our workflow process, (d) the results from executing the testsuite on various systems, and (e) future developments.
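To make the idea of a compiler validation test concrete, here is a minimal sketch in C of the general pattern such a test might follow: run a directive-annotated loop, then compare the result against values computed on the host. It is not taken from the OpenACC V&V testsuite; the function name `test_parallel_loop_copy` and the size `N` are hypothetical. With a compiler that has no OpenACC support the pragmas are ignored and the loop runs serially on the host.

```c
#include <stdlib.h>

#define N 1024  /* illustrative problem size (assumption) */

/* Hypothetical test: returns 0 on pass, the number of mismatches on
 * failure, or -1 if host memory could not be allocated. */
int test_parallel_loop_copy(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }

    for (int i = 0; i < N; ++i) {
        a[i] = (double)i;
        b[i] = 0.0;
    }

    /* Feature under test: a data region with copyin/copyout clauses
     * wrapped around a parallel loop. */
    #pragma acc data copyin(a[0:N]) copyout(b[0:N])
    {
        #pragma acc parallel loop
        for (int i = 0; i < N; ++i) {
            b[i] = 2.0 * a[i];
        }
    }

    /* Verify on the host against the expected values. */
    int errors = 0;
    for (int i = 0; i < N; ++i) {
        if (b[i] != 2.0 * a[i]) {
            ++errors;
        }
    }

    free(a);
    free(b);
    return errors;
}
```

A harness could call one such function per tested feature and report every nonzero return value as a failure for the compiler under test.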
ISBN: 9781665490191 (print)
We propose an interoperation mechanism to enable novel composability across pragma-based programming models. We study and propose a clear separation of duties and implement our approach by augmenting the OmpSs-2 programming model, compiler and runtime system to support OmpSs-2 + OpenACC programming. To validate our proposal we port ZPIC, a kinetic plasma simulator, to leverage our hybrid OmpSs-2 + OpenACC implementation. We compare our approach against OpenACC versions of ZPIC on a multi-GPU HPC system. We show that our approach manages to provide automatic asynchronous and multi-GPU execution, removing significant burden from the application's developer, while also being able to outperform manually programmed versions, thanks to a better utilization of the hardware.
ISBN: 9781665490191 (print)
Template metaprogramming is gaining popularity as a high-level solution for achieving performance portability on heterogeneous computing resources. Kokkos is a representative approach that offers programmers high-level abstractions for generic programming while most of the device-specific code generation and optimizations are delegated to the compiler through template specializations. For this, Kokkos provides a set of device-specific code specializations in multiple back ends, such as CUDA and HIP. Unlike CUDA or HIP, OpenACC is a high-level and directive-based programming model. This descriptive model allows developers to insert hints (pragmas) into their code that help the compiler to parallelize the code. The compiler is responsible for the transformation of the code, which is completely transparent to the programmer. This paper presents an OpenACC back end for Kokkos: KokkACC. As an alternative to Kokkos's existing device-specific back ends, KokkACC is a multi-architecture back end providing a high-productivity programming environment enabled by OpenACC's high-level and descriptive programming model. Moreover, we have observed competitive performance; in some cases, KokkACC is faster (up to 9x) than NVIDIA's CUDA back end and much faster than OpenMP's GPU offloading back end. This work also includes implementation details and a detailed performance study conducted with a set of mini-benchmarks (AXPY and DOT product) and three mini-apps (LULESH, miniFE and SNAP, a LAMMPS proxy mini-app).
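As an illustration of the descriptive, pragma-based style the abstract refers to, the two mini-benchmarks it names (AXPY and DOT product) can be sketched in plain C with OpenACC directives. This is not KokkACC or Kokkos code; the function names `axpy` and `dot` are assumptions made for the example. A compiler without OpenACC support ignores the pragmas and runs the loops serially.

```c
/* Hypothetical AXPY kernel: y = a * x + y. */
void axpy(int n, double a, const double *x, double *y)
{
    /* The data clauses describe what must be moved; how the loop is
     * mapped onto the device is left to the compiler. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical DOT kernel: returns the sum of x[i] * y[i]. */
double dot(int n, const double *x, const double *y)
{
    double sum = 0.0;

    /* The reduction clause tells the compiler that partial sums from
     * different iterations may be combined with '+'. */
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The loops are ordinary C; only the pragma lines differ from a serial version, which is what lets a single source target several architectures.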
ISBN: 9783319748962; 9783319748955 (print)
Accelerator devices are increasingly used to build large supercomputers, and current installations usually include more than one accelerator per system node. To keep all devices busy, kernels have to be executed concurrently, which can be achieved via asynchronous kernel launches. This work compares the performance of an implementation of the Conjugate Gradient method with CUDA, OpenCL, and OpenACC on NVIDIA Pascal GPUs. Furthermore, it takes a look at Intel Xeon Phi coprocessors when programmed with OpenCL and OpenMP. In doing so, it tries to answer the question of whether the higher abstraction level of directive-based models is inferior to lower-level paradigms in terms of performance.
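The asynchronous launches mentioned in the abstract can be sketched with OpenACC's `async` clause and `wait` directive. The example below is a simplified assumption, not the paper's Conjugate Gradient code: it places two independent loops on two async queues of a single device, whereas a multi-device version would additionally select a device before each launch. The function name `scale_two_vectors` is hypothetical.

```c
/* Hypothetical example: scale two independent vectors by a. */
void scale_two_vectors(int n, double a, double *x, double *y)
{
    /* Queue 1: launched asynchronously, so the host continues
     * immediately to the next launch. */
    #pragma acc parallel loop async(1) copy(x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= a;
    }

    /* Queue 2: independent of queue 1, so the two kernels may overlap. */
    #pragma acc parallel loop async(2) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] *= a;
    }

    /* Block until all outstanding asynchronous work has finished. */
    #pragma acc wait
}
```

Without the final `wait`, the host could read `x` or `y` before the device has copied the results back.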
ISBN: 9783030742232; 9783030742249 (print)
Heterogeneous systems are becoming increasingly prevalent. In order to exploit the rich compute resources of such systems, robust programming models are needed for application developers to seamlessly migrate legacy code from today's systems to tomorrow's. Over the past decade and more, directives have been established as one of the promising paths to tackle programmatic challenges on emerging systems. This work focuses on applying and demonstrating OpenMP offloading directives on five proxy applications. We observe that the performance varies widely from one compiler to the other; a crucial aspect of our work is reporting best practices to application developers who use OpenMP offloading compilers. While some issues can be worked around by the developer, there are other issues that must be reported to the compiler vendors. By restructuring OpenMP offloading directives, we gain an 18x speedup for the su3 proxy application on NERSC's Cori system when using the Clang compiler, and a 15.7x speedup by switching max reductions to add reductions in the laplace mini-app when using the Cray-llvm compiler on Cori.
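The abstract does not say how the max reductions were replaced. One possible reading, shown here purely as an assumption, is a convergence check: instead of reducing to the largest pointwise change, count the points whose change exceeds the tolerance, which needs only an add reduction and yields the same converged or not-converged answer. The function names `converged_max` and `converged_add` are hypothetical, and a compiler without OpenMP offloading ignores the pragmas.

```c
/* Variant 1: max reduction over the pointwise change. */
int converged_max(int n, const double *old, const double *cur, double tol)
{
    double err = 0.0;

    #pragma omp target teams distribute parallel for reduction(max:err) \
        map(to: old[0:n], cur[0:n]) map(tofrom: err)
    for (int i = 0; i < n; ++i) {
        double d = cur[i] - old[i];
        if (d < 0.0) {
            d = -d;            /* absolute value without math.h */
        }
        if (d > err) {
            err = d;
        }
    }
    return err <= tol;
}

/* Variant 2 (assumed rewrite): add reduction that counts the points
 * still changing by more than tol. */
int converged_add(int n, const double *old, const double *cur, double tol)
{
    int violations = 0;

    #pragma omp target teams distribute parallel for reduction(+:violations) \
        map(to: old[0:n], cur[0:n]) map(tofrom: violations)
    for (int i = 0; i < n; ++i) {
        double d = cur[i] - old[i];
        if (d < 0.0) {
            d = -d;
        }
        if (d > tol) {
            violations += 1;
        }
    }
    return violations == 0;
}
```

Both functions return 1 exactly when every point has changed by at most `tol`; any performance difference between them depends on the compiler, as the abstract notes.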
ISBN: 9783030742249 (digital)
ISBN: 9783030742232 (print)
This book constitutes the proceedings of the 7th International Workshop on Accelerator Programming Using Directives, WACCPD 2020, which took place on November 20, 2021. The workshop was initially planned to take place in Atlanta, GA, USA, and changed to an online format due to the COVID-19 pandemic.