ISBN: 9781450311601 (print)
We present CaCUDA, a GPGPU kernel abstraction and parallel programming framework for developing highly efficient, large-scale scientific applications that use stencil computations on hybrid CPU/GPU architectures. CaCUDA is built upon the Cactus computational toolkit, an open-source problem-solving environment designed for scientists and engineers. Because the Cactus toolkit is flexible and extensible, adding a GPGPU programming framework required no changes to the Cactus infrastructure, so existing features and modules continue to work without modification. CaCUDA was tested and benchmarked using a 3D CFD code based on a finite-difference discretization of the Navier-Stokes equations.
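To make the term "stencil computation" concrete, here is a minimal sketch in plain Python of a 7-point finite-difference Laplacian on a 3D grid. It is not CaCUDA code and does not use the Cactus API; the function names `flat_index` and `laplacian_7pt`, the flat row-major storage, and the uniform grid spacing `h` are all assumptions made for illustration. On a GPU, each interior point would typically be handled by one thread rather than by nested loops.

```python
def flat_index(i, j, k, ny, nz):
    """Map 3D grid coordinates to an offset in a flat, row-major list."""
    return (i * ny + j) * nz + k


def laplacian_7pt(u, nx, ny, nz, h):
    """Apply a 7-point central-difference Laplacian to the interior points.

    u is a flat list of nx*ny*nz values with uniform grid spacing h
    (hypothetical layout). Boundary points of the result are left at 0.0.
    """
    out = [0.0] * (nx * ny * nz)
    inv_h2 = 1.0 / (h * h)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                c = flat_index(i, j, k, ny, nz)
                # Sum of the six face neighbours of the centre point.
                neighbours = (
                    u[flat_index(i - 1, j, k, ny, nz)]
                    + u[flat_index(i + 1, j, k, ny, nz)]
                    + u[flat_index(i, j - 1, k, ny, nz)]
                    + u[flat_index(i, j + 1, k, ny, nz)]
                    + u[flat_index(i, j, k - 1, ny, nz)]
                    + u[flat_index(i, j, k + 1, ny, nz)]
                )
                out[c] = (neighbours - 6.0 * u[c]) * inv_h2
    return out
```

Every output value depends only on a fixed neighbourhood of the input, which is what makes stencil codes a natural fit for data-parallel hardware.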
ISBN: 9781450319638 (print)
This paper describes a method for estimating the body pose of a human from the point cloud obtained from a depth sensor. It uses Differential Evolution to find the best match between a candidate pose, represented by an instance of a 42-parameter articulated model of a human, and the point cloud. The results, compared with four other state-of-the-art methods on a publicly available dataset, show that the method estimates a person's pose well and can track the person through video sequences. The entire method, from Differential Evolution to fitness computation, runs on NVIDIA GPUs. Thanks to its massively parallel implementation in CUDA-C, it produces pose estimates in real time.
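The idea of fitting a parametric model to a point cloud with Differential Evolution can be sketched on a toy problem. The code below is an illustrative stand-in, not the paper's method: a 3-parameter circle (centre and radius) replaces the 42-parameter articulated body model, the fitness is a sum of squared point-to-circle distances, and the DE variant (rand/1/bin), its settings, and all function names are assumptions. It also runs sequentially on the CPU, whereas the paper evaluates everything on the GPU.

```python
import math
import random


def make_circle_points(cx, cy, r, count):
    """Generate a synthetic 2D 'point cloud' lying exactly on a circle."""
    return [
        (cx + r * math.cos(2.0 * math.pi * t / count),
         cy + r * math.sin(2.0 * math.pi * t / count))
        for t in range(count)
    ]


def circle_fitness(points):
    """Return a fitness function: squared model-to-cloud distance (lower is better)."""
    def fitness(params):
        cx, cy, r = params
        return sum((math.hypot(x - cx, y - cy) - r) ** 2 for x, y in points)
    return fitness


def differential_evolution(fitness, bounds, pop_size=30, generations=200,
                           f_weight=0.7, cr=0.9, seed=0):
    """Minimise fitness over box bounds with a basic DE/rand/1/bin loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f_weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = fitness(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

Because every trial vector in a generation can be scored independently, the fitness evaluations are the natural place to exploit massive parallelism.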
ISBN: 9783031159220; 9783031159213 (print)
Many of the largest supercomputers are based on heterogeneous architectures with multiple general-purpose graphics processing units (GPGPUs) per compute node. While many APIs for GPU programming are vendor-specific, OpenMP offers a portable alternative, which makes OpenMP target offloading advantageous for long-term code sustainability. Furthermore, many applications have already been parallelized with OpenMP, so the amount of work needed to port the code to GPUs may be limited. However, support for the OpenMP 5.x specification is not equally mature across compilers, and the multi-GPU support in the OpenMP 5.x specification itself is limited. We explore what is possible with the Nvidia NVC compiler. We present a case study of solving the Poisson equation on multiple GPGPUs to outline which approaches to multi-target offloading give good results. We find that a task-based multi-GPU implementation leads to better performance than generating deferrable tasks with the nowait clause. We demonstrate that data transfers and computations can be fully overlapped using only the subset of the OpenMP specification that is supported in the 22.3 release of the Nvidia NVC compiler. For compute nodes with multiple Nvidia A100 or V100 GPUs, we obtain close to ideal strong scaling when increasing the number of accelerators.
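The data layout behind a multi-device Poisson solver can be sketched without any GPU at all. The code below is a plain-Python stand-in, not OpenMP and not the paper's implementation: it assumes a 2D problem with zero boundary values, a Jacobi iteration, and a split into contiguous row slabs, where each slab plays the role of one device and neighbouring slabs exchange one halo row per iteration. All function names are hypothetical, and `n_devices` must not exceed the number of interior rows.

```python
def split_rows(n_interior, n_devices):
    """Split interior rows 1..n_interior into contiguous (start, stop) ranges."""
    base, extra = divmod(n_interior, n_devices)
    ranges, start = [], 1
    for d in range(n_devices):
        stop = start + base + (1 if d < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def jacobi_sweep(local, f_local, h):
    """One Jacobi sweep on a slab; its first and last rows are halo/boundary rows."""
    rows, cols = len(local), len(local[0])
    new = [row[:] for row in local]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (local[i - 1][j] + local[i + 1][j]
                                + local[i][j - 1] + local[i][j + 1]
                                + h * h * f_local[i][j])
    return new


def solve_poisson(f, h, iterations, n_devices=1):
    """Jacobi iterations for -laplace(u) = f with u = 0 on the boundary."""
    n = len(f)  # square n x n grid, boundary included
    ranges = split_rows(n - 2, n_devices)
    # Each slab stores its owned rows plus one halo row above and below.
    slabs = [[[0.0] * n for _ in range(stop - start + 2)] for start, stop in ranges]
    f_slabs = [[f[r][:] for r in range(start - 1, stop + 1)] for start, stop in ranges]
    for _ in range(iterations):
        # Each sweep is independent, so real devices could run them concurrently.
        slabs = [jacobi_sweep(s, fs, h) for s, fs in zip(slabs, f_slabs)]
        # Halo exchange between neighbouring slabs.
        for d in range(n_devices - 1):
            slabs[d][-1] = slabs[d + 1][1][:]   # first owned row of the next slab
            slabs[d + 1][0] = slabs[d][-2][:]   # last owned row of this slab
    # Gather the owned rows back into one global grid.
    u = [[0.0] * n for _ in range(n)]
    for (start, stop), slab in zip(ranges, slabs):
        for r in range(start, stop):
            u[r] = slab[r - start + 1][:]
    return u
```

Only the halo rows cross slab boundaries, so in a real multi-GPU setting those are the transfers that would need to be overlapped with the interior computation.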