In recent years, processing and analysing large graphs has become a major need in many research areas. Distributed graph processing programming models and frameworks arose as a natural solution for processing large volumes of linked data, such as data originating from social media. These solutions are distributed by design and help developers perform operations on the graph, sometimes reaching near real-time performance even on huge graphs. Some of the available graph processing frameworks exploit generic data processing models, like MapReduce, while others were built specifically for graph processing, introducing techniques such as vertex or edge partitioning and graph-oriented programming models. In this work, we analyse the properties of recent and widely popular frameworks - from the perspective of the adopted programming model - designed to process large-scale graphs, with the goal of assisting software developers and designers in choosing the most adequate tool.
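The graph-oriented programming models mentioned above are often vertex-centric, in the style of Pregel. The following Python sketch shows the pattern for single-source shortest paths; the function name, graph encoding and superstep loop are simplified illustrations, not the API of any particular framework.

```python
from math import inf

def pregel_sssp(edges, num_vertices, source):
    """Toy vertex-centric (Pregel-style) single-source shortest paths.

    `edges` maps each vertex to a list of (neighbour, weight) pairs.
    In each superstep, active vertices send candidate distances along
    their out-edges; a vertex re-activates only if its value improves.
    """
    dist = [inf] * num_vertices
    dist[source] = 0
    active = {source}
    while active:                     # one superstep per loop iteration
        messages = {}                 # vertex -> best incoming candidate
        for u in active:
            for v, w in edges.get(u, []):
                cand = dist[u] + w
                if cand < messages.get(v, inf):
                    messages[v] = cand
        active = set()
        for v, cand in messages.items():
            if cand < dist[v]:        # value improved: vertex re-activates
                dist[v] = cand
                active.add(v)
    return dist

graph = {0: [(1, 4), (2, 1)], 2: [(1, 2), (3, 5)], 1: [(3, 1)]}
print(pregel_sssp(graph, 4, 0))  # [0, 3, 1, 4]
```

In a real framework the same superstep structure is partitioned across machines, with messages shipped between vertex partitions instead of stored in a local dictionary.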
ISBN (print): 9798400708428
We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG. We test the generated kernel codes for a variety of language-supported programming models, including (1) C++ (e.g., OpenMP [including offload], OpenACC, Kokkos, SyCL, CUDA, and HIP), (2) Fortran (e.g., OpenMP [including offload] and OpenACC), (3) Python (e.g., numpy, Numba, cuPy, and pyCUDA), and (4) Julia (e.g., Threads, ***, ***, and ***). We use the GitHub Copilot capabilities powered by the GPT-based OpenAI Codex available in Visual Studio Code as of April 2023 to generate a large number of implementations given simple + <programming model> + prompt variants. To quantify and compare the results, we propose a proficiency metric around the initial 10 suggestions given for each prompt. Results suggest that the OpenAI Codex outputs for C++ correlate with the adoption and maturity of programming models. For example, OpenMP and CUDA score very high, whereas HIP is still lacking. We found that prompts from either a targeted language such as Fortran or the more general-purpose Python can benefit from adding code keywords, while Julia prompts perform acceptably well for its mature programming models (e.g., Threads and ***). We expect these benchmarks to provide a point of reference for each programming model's community. Overall, understanding the convergence of large language models, AI, and HPC is crucial due to its rapidly evolving nature and how it is redefining human-computer interactions.
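For reference, AXPY (y ← a·x + y) is the simplest kernel in the suite. A hand-written numpy version, one of the Python targets the study prompts for, looks like this; it is an illustrative sketch, not a Copilot output from the paper.

```python
import numpy as np

def axpy(a, x, y):
    """AXPY: y <- a*x + y, the BLAS level-1 kernel from the benchmark suite."""
    y += a * x   # numpy broadcasts the scalar and vectorises the loop
    return y

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
print(axpy(2.0, x, y))  # [ 6.  9. 12.]
```

The study's prompts were essentially short comments naming the kernel and the programming model, leaving the model to produce a loop (or vectorised expression) like the one above.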
One of the most important issues on the path to the convergence of HPC and Big Data is caused by the differences in their software stacks. Despite some research efforts, the interoperability between their programming models and languages is still limited. To deal with this problem, we introduce a new computing framework called IgnisHPC, whose main objective is to unify the execution of Big Data and HPC workloads in the same framework. IgnisHPC has native support for multi-language applications using JVM and non-JVM-based languages. Since MPI was used as its backbone technology, IgnisHPC takes advantage of many communication models and network architectures. Moreover, MPI applications can be directly executed in an efficient way in the framework. The main consequence is that users can combine, in the same multi-language code, HPC tasks (using MPI) with Big Data tasks (using MapReduce operations). The experimental evaluation demonstrates the benefits of our proposal in terms of performance and productivity with respect to other frameworks. IgnisHPC is publicly available for the Big Data and HPC research community. (c) 2022 Elsevier B.V. All rights reserved.
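The Big Data side of the workloads IgnisHPC unifies follows the classic MapReduce pattern, sketched below in plain Python as a word count; this shows the programming pattern only, not IgnisHPC's actual multi-language API.

```python
from collections import Counter
from functools import reduce

def word_count(lines):
    """MapReduce-style word count: map each line to per-line counts,
    then reduce by merging the partial counts."""
    mapped = (Counter(line.split()) for line in lines)        # map phase
    return reduce(lambda acc, c: acc + c, mapped, Counter())  # reduce phase

lines = ["big data meets hpc", "hpc and big data"]
print(word_count(lines))
```

In the framework described by the abstract, stages like these would run distributed, and could sit in the same driver code as an MPI-based HPC task.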
ISBN (digital): 9783031104190
ISBN (print): 9783031104190; 9783031104183
It is common in the HPC community that the performance achievable with CPUs alone is limited for many computational cases. The EuroHPC pre-exascale and the coming exascale systems are mainly focused on accelerators, and some of the largest upcoming supercomputers, such as LUMI and Frontier, will be powered by AMD Instinct™ accelerators. However, these new systems create many challenges for developers who are not familiar with the new ecosystem or with the programming models required for heterogeneous architectures. In this paper, we present some of the better-known programming models for current and future GPU systems. We then measure the performance of each approach using a benchmark and a mini-app, test with various compilers, and tune the codes where necessary. Finally, we compare the performance, where possible, between the NVIDIA Volta (V100) and Ampere (A100) GPUs and the AMD MI100 GPU.
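The kind of benchmark typically used in such comparisons is a STREAM-style triad, which measures sustained memory bandwidth. A minimal numpy sketch is shown below; the array size, repetition count and bandwidth accounting are illustrative choices, and a real GPU measurement would use one of the programming models above rather than numpy.

```python
import time
import numpy as np

def triad_bandwidth(n=1_000_000, scalar=3.0, reps=10):
    """STREAM-style triad a = b + scalar*c; returns (a, best GB/s).

    The accounting assumes 3 arrays of 8-byte doubles are moved per
    sweep (read b, read c, write a); temporaries are ignored.
    """
    b = np.full(n, 1.0)
    c = np.full(n, 2.0)
    a = np.empty(n)
    best = float("inf")
    for _ in range(reps):                 # best-of-reps, as STREAM does
        t0 = time.perf_counter()
        np.add(b, scalar * c, out=a)
        best = min(best, time.perf_counter() - t0)
    gbytes = 3 * n * 8 / 1e9
    return a, gbytes / best

a, gbs = triad_bandwidth()
print(f"triad sustained ~{gbs:.1f} GB/s")
```

Ports of the same kernel to CUDA, HIP, OpenMP offload, etc., keep the arithmetic identical so that any performance difference is attributable to the model and compiler.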
Cloud computing can be considered a disruptive technology that is making life easier for Cloud users. Determining a focus of research in a specific subject area is sometimes challenging. A systematic map enables the synthesis of a scheme for categorizing data in a field of interest. The goal of this research paper is to carry out a systematic mapping study of policy languages and programming models on the cloud. The mapping involved contribution categories such as method, research categories such as evaluation, and major topics extracted from the abstracts of primary studies. The results indicated that there are more publications on evaluation research in terms of security, with 8.9%. More papers were published on validation research, solution proposals and experience research on the topic of paradigms, with 7.53%, 6.85% and 4.11% respectively. Also, there were more publications on philosophical research in terms of privacy, with 4.11%, and more articles on opinion research in terms of surveys, with 4.11%. On the other hand, to the best of the researchers' knowledge there were no articles on metrics in terms of framework, paradigms, accountability and reliability. The outcome of this systematic study will benefit cloud users, researchers, practitioners and providers. (c) 2019 The Authors. Production and hosting by Elsevier B.V. on behalf of King Saud University.
ISBN (print): 9781665411189
Julia is a general-purpose, managed, strongly and dynamically-typed programming language with an emphasis on high-performance scientific computing. Traditionally, HPC software development uses languages such as C, C++ and Fortran, which compile to unmanaged code. This offers the programmer near bare-metal performance at the expense of the safety properties that a managed runtime would otherwise provide. Julia, on the other hand, combines novel programming language design approaches to achieve high levels of productivity without sacrificing performance while using a fully managed runtime. This study provides an evaluation of Julia's suitability for HPC applications from a performance point of view across a diverse range of CPU and GPU platforms. We select representative memory-bandwidth-bound and compute-bound mini-apps, port them to Julia, and conduct benchmarks across a wide range of current HPC CPUs and GPUs from vendors such as Intel®, AMD®, NVIDIA®, Marvell®, and Fujitsu®. We then compare and characterise the results against existing parallel programming frameworks such as OpenMP®, Kokkos, OpenCL™, and first-party frameworks such as CUDA®, HIP™, and oneAPI™ SYCL™. Finally, we show that Julia's performance either matches the competition or is only a short way behind.
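The split between memory-bandwidth-bound and compute-bound mini-apps can be made concrete via arithmetic intensity (flops per byte moved), the quantity at the heart of roofline analysis. The sketch below uses textbook estimates for a STREAM triad and a naive dense GEMM; the numbers are illustrative back-of-the-envelope figures, not measurements from the study.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Flops per byte moved; kernels well below a machine's balance point
    are memory-bandwidth bound, kernels well above it are compute bound."""
    return flops / bytes_moved

# STREAM triad a = b + s*c on n doubles:
# 2 flops per element (mul + add), 24 bytes moved (read b, read c, write a)
n = 1_000_000
triad_ai = arithmetic_intensity(2 * n, 24 * n)

# Dense GEMM on m x m doubles with ideal reuse:
# 2*m^3 flops, roughly 3*m^2 arrays of 8-byte doubles moved
m = 1000
gemm_ai = arithmetic_intensity(2 * m**3, 3 * m**2 * 8)

print(f"triad: {triad_ai:.3f} flops/byte (bandwidth bound)")
print(f"gemm:  {gemm_ai:.1f} flops/byte (compute bound)")
```

A triad-like kernel therefore stresses the memory subsystem regardless of language, which is why such mini-apps are a fair test of whether a managed runtime like Julia's can reach the hardware limit.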
High-level parallel programming models (PMs) are becoming crucial for extracting the computational power of current on-node multi-threaded parallelism. The most popular PMs, such as OpenMP or OmpSs, are directive-based: the complexity of the hardware is hidden by the underlying runtime system, improving coding productivity. Implementations of OpenMP usually rely on POSIX threads (pthreads), offering excellent performance for coarse-grained parallelism and a perfect match with current hardware. OmpSs is a task-oriented PM based on an ad hoc runtime solution called Nanos++; it is the precursor of the tasking parallelism in the OpenMP tasking specification. A recent trend in runtimes and applications points to leveraging massive on-node parallelism in conjunction with fine-grained and dynamic scheduling paradigms. In this paper, we analyze the behavior of the OpenMP and OmpSs PMs on top of the recently emerged Generic Lightweight Threads (GLT) API. GLT exposes a common API for lightweight thread (LWT) libraries that offers the possibility of running the same application over different native LWT solutions. We describe the design details of those high-level PMs implemented on top of GLT and analyze different scenarios in order to assess where the use of LWTs may benefit application performance. Our work reveals the scenarios where LWTs outperform pthread-based solutions and compares the performance of an ad hoc solution against a generic implementation. (C) 2018 Elsevier B.V. All rights reserved.
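The spawn-and-wait task pattern that these runtimes schedule, over pthreads or LWTs, can be sketched in Python with a thread pool; the chunked sum below is a hypothetical stand-in for a fine-grained workload, not GLT or Nanos++ code.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """One fine-grained task: partial sum of squares over a chunk."""
    return sum(x * x for x in chunk)

data = list(range(1000))
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

# Spawn one task per chunk and wait for all of them, mirroring the
# '#pragma omp task' + 'taskwait' pattern the runtimes schedule.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))
total = sum(partials)
print(total)  # 332833500 = sum of k^2 for k in 0..999
```

The question the paper studies is what the scheduling substrate costs: with many small chunks, per-task overhead dominates, which is exactly where lightweight threads are expected to beat pthread-backed workers.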
In this work, we evaluate several emerging parallel programming models: Kokkos, RAJA, OpenACC, and OpenMP 4.0, against the mature CUDA and OpenCL APIs. Each model has been used to port Tealeaf, a miniature proxy application, or mini-app, that solves the heat conduction equation and belongs to the Mantevo Project. We find that the best performance is achieved with architecture-specific implementations but that, in many cases, the performance portable models are able to solve the same problems to within a 5% to 30% performance penalty. While the models expose varying levels of complexity to the developer, they all achieve reasonable performance with this application. As such, if this small performance penalty is permissible for a problem domain, we believe that productivity and development complexity can be considered the major differentiators when choosing a modern parallel programming model to develop applications like Tealeaf.
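Tealeaf solves the heat conduction equation; the computational core of such codes is a stencil sweep over a grid. A minimal numpy sketch of a single Jacobi sweep of the same equation is shown below (Tealeaf's actual solvers, such as CG, and its data layout differ).

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi sweep of the 2-D heat equation: each interior cell
    becomes the average of its four neighbours; boundaries are fixed."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

# Hot boundary along one edge, cold interior
u = np.zeros((5, 5))
u[0, :] = 100.0
u = jacobi_step(u)
print(u[1, 2])  # 25.0: heat diffuses one cell per sweep
```

It is exactly this kind of neighbour-access loop that each evaluated model (Kokkos, RAJA, OpenACC, OpenMP 4.0, CUDA, OpenCL) must express and map onto the target architecture.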
This special issue features a collection of papers that extend the literature in unique ways, improving the state of the art of programming models and systems software for high-end computing systems.