ISBN: 9781665411189 (print)
Julia is a general-purpose, managed, strongly and dynamically-typed programming language with emphasis on high performance scientific computing. Traditionally, HPC software development uses languages such as C, C++ and Fortran, which compile to unmanaged code. This offers the programmer near bare-metal performance at the expense of safety properties that a managed runtime would otherwise provide. Julia, on the other hand, combines novel programming language design approaches to achieve high levels of productivity without sacrificing performance while using a fully managed runtime. This study provides an evaluation of Julia's suitability for HPC applications from a performance point of view across a diverse range of CPU and GPU platforms. We select representative memory-bandwidth bound and compute bound mini-apps, port them to Julia, and conduct benchmarks across a wide range of current HPC CPUs and GPUs from vendors such as Intel (R), AMD (R), NVIDIA (R), Marvell (R), and Fujitsu (R). We then compare and characterise the results against existing parallel programming frameworks such as OpenMP (R), Kokkos, OpenCL (TM), and first-party frameworks such as CUDA (R), HIP (TM), and oneAPI (TM) SYCL (TM). Finally, we show that Julia's performance either matches the competition or is only a short way behind.
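To make "memory-bandwidth bound" concrete, here is a minimal sketch of a STREAM-style triad kernel, a common example of this class of workload. The abstract does not name the mini-apps used, so the kernel choice, the function names `triad` and `triad_bandwidth_gbs`, and the 8-bytes-per-element figure (double precision, as in a compiled kernel) are all assumptions for illustration, and the timing from interpreted Python says nothing about real hardware.

```python
import time


def triad(a, b, c, scalar):
    """STREAM-style triad: a[i] = b[i] + scalar * c[i] (hypothetical example kernel)."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth_gbs(n, seconds, bytes_per_element=8):
    """Effective bandwidth in GB/s, counting two arrays read and one written.

    bytes_per_element=8 assumes double-precision storage as in a compiled kernel.
    """
    return 3 * n * bytes_per_element / seconds / 1e9


n = 100_000
b = [1.0] * n
c = [2.0] * n
a = [0.0] * n

start = time.perf_counter()
triad(a, b, c, 0.5)
elapsed = time.perf_counter() - start

# The figure printed here is illustrative only; interpreted Python is far from bandwidth bound.
print(f"{triad_bandwidth_gbs(n, elapsed):.3f} GB/s")
```

The kernel does one multiply and one add per 24 bytes moved, so on real hardware its runtime is governed by memory traffic rather than arithmetic, which is why bandwidth rather than FLOP/s is the usual metric for it.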
In this work, we evaluate several emerging parallel programming models: Kokkos, RAJA, OpenACC, and OpenMP 4.0, against the mature CUDA and OpenCL APIs. Each model has been used to port Tealeaf, a miniature proxy application, or mini app, that solves the heat conduction equation and belongs to the Mantevo Project. We find that the best performance is achieved with architecture-specific implementations but that, in many cases, the performance portable models are able to solve the same problems to within a 5% to 30% performance penalty. While the models expose varying levels of complexity to the developer, they all achieve reasonable performance with this application. As such, if this small performance penalty is permissible for a problem domain, we believe that productivity and development complexity can be considered the major differentiators when choosing a modern parallel programming model to develop applications like Tealeaf.
ISBN: 9783319321493; 9783319321486 (print)
Transistor size reduction and more aggressive power modes in HPC platforms make chip components more error prone. In this context, HPC applications can have diverse levels of tolerance to memory errors that may change the execution in different ways. As the tolerance to memory errors depends on write frequency and access patterns, different programming models may exhibit different failure rates and may alleviate the performance loss caused by the overhead of fault-tolerance mechanisms. In this paper, we explore how tolerant two main parallel programming models, message passing and shared memory, are to memory errors: we perform a memory vulnerability analysis and also conduct error propagation experiments to observe the effect of memory errors through program flow. Our results show the need for soft error resiliency methods based on the memory behavior of programs, and the evaluation of the tradeoffs between performance and reliability.
ISBN: 9783031104190 (digital)
ISBN: 9783031104190; 9783031104183 (print)
It is widely recognized in the HPC community that the performance achievable with CPUs alone is limited for many computational cases. The EuroHPC pre-exascale and the coming exascale systems are mainly focused on accelerators, and some of the largest upcoming supercomputers such as LUMI and Frontier will be powered by AMD Instinct (TM) accelerators. However, these new systems create many challenges for developers who are not familiar with the new ecosystem or with the programming models required for heterogeneous architectures. In this paper, we present some of the more well-known programming models for current and future GPU systems. We then measure the performance of each approach using a benchmark and a mini-app, test with various compilers, and tune the codes where necessary. Finally, we compare the performance, where possible, between the NVIDIA Volta (V100) and Ampere (A100) GPUs and the AMD MI100 GPU.
ISBN: 9781509000883 (print)
Parallel systems that employ CPUs and GPUs as two heterogeneous computational units have become immensely popular due to their ability to maximize performance under restrictive thermal budgets. However, programming heterogeneous systems via traditional programming models like OpenCL or CUDA involves rewriting large portions of application code. These models also lead to code that is not performance portable across different architectures or even across different generations of the same architecture. In this paper, we evaluate the current state of two emerging parallel programming models: C++ AMP and OpenACC. These emerging programming paradigms require minimal code changes and rely on compilers to interact with the low-level hardware language, thereby producing performance portable code from an application standpoint. We analyze the performance and productivity of the emerging programming models and compare them with OpenCL using a diverse set of applications on two different architectures, a CPU coupled with a discrete GPU and an Accelerated Processing Unit (APU). Our experiments demonstrate that while the emerging programming models improve programmer productivity, they do not yet expose enough flexibility to extract maximum performance as compared to traditional programming models.
On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle the complexity for us, we must identify the abstractions that are general enough to allow us to write applications with reasonable effort, yet specific enough to exploit the vast on-chip memory bandwidth of EMM multi-processors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The first programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework which provides OpenMP-like semantics and the abstraction of a shared address space divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.
Undergraduate computer science students typically have only a limited understanding of their favorite languages and no inkling of other programming paradigms. Yet modern programmers typically work with several languages, and the availability of cheap concurrency is exposing fundamental problems in standard concurrent programming techniques (mutable objects and threads). This situation presents a great opportunity: by exploring nonstandard techniques for gaining intellectual control over concurrent programs, one can motivate and teach important semantic concepts (such as scoping) and important programming concepts (such as functional abstraction). Such a curriculum stimulates student interest in exploring new programming paradigms.
Computational science has benefited in recent years from emerging accelerators that increase the performance of scientific simulations, but using these devices complicates programming. This paper presents AMA: a set of optimization techniques to efficiently manage multi-accelerator systems. AMA maximizes the overlap of computation and communication in a blocking-free way, so that the time otherwise spent waiting for device operations can be used for other work. AMA is implemented on top of a task-based framework, and its experimental evaluation on a quad-GPU node shows that we reach the performance of a hand-tuned native CUDA code, with the advantage of fully hiding the device management. In addition, we obtain speed-ups exceeding 2x in the best case with respect to the original framework implementation.
A recent article described a mathematical programming model and heuristic solution procedure to realign sales territories. This report presents two linear integer programming models for sales territory alignment to maximize profit. Emphasis is placed on the development of models which are easy to implement.
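The abstract does not state the two models. Purely as an illustration of what a profit-maximizing linear integer program for territory alignment can look like, a generic assignment-type formulation is sketched below. Every symbol is an assumption introduced here and not taken from the report: $x_{ij}$ is 1 if account $j$ is assigned to territory $i$ and 0 otherwise, $p_{ij}$ is the estimated profit of that assignment, $w_j$ is the workload of account $j$, and $L_i$, $U_i$ are lower and upper workload limits for territory $i$.

```latex
\begin{align}
\max \quad & \sum_{i \in I} \sum_{j \in J} p_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{i \in I} x_{ij} = 1 && \forall j \in J \\
& L_i \le \sum_{j \in J} w_j\, x_{ij} \le U_i && \forall i \in I \\
& x_{ij} \in \{0, 1\} && \forall i \in I,\ j \in J
\end{align}
```

The first constraint assigns each account to exactly one territory, and the second keeps territory workloads balanced; a model of this shape is linear in the binary variables and can be passed to an off-the-shelf integer programming solver.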