ISBN (digital): 9798350350678
ISBN (print): 9798350350685
Game designers commonly use paper prototyping to evaluate educational effectiveness, enjoyment, flow, and usability, while also reducing costs and exploring alternative implementations. However, creating a paper prototype that yields actionable feedback can be challenging due to the wide range of methods available, from low-fidelity sketches to high-fidelity mockups. Most paper prototypes are static and progress in discrete steps, making it difficult to prototype physics-based games effectively unless the focus is on interfaces, narrative, or underlying systems. This paper details the creation of three prototypes of varying fidelity and metaphor for a physics-based educational game on concurrency and parallel programming. Each prototype undergoes playtesting to assess its construction method and its effectiveness in gathering player feedback.
Unit testing is a standard practice in software engineering and is critical for ensuring software quality. However, for parallel and high-performance computing software, especially scientific computing applications, unit testing is not widely implemented. Compared with typical commercial software, high-performance software usually has a smaller user base, is more diverse, and often involves complex logic. These characteristics create several challenges for unit testing parallel and high-performance software. On one hand, given the size of the user base, it is economically expensive to maintain a dedicated testing team for unit testing. On the other hand, it is hard for a quality engineer without domain knowledge to design effective unit tests. Similarly, existing automated unit testing tools are usually not effective for such software. It is therefore vital to devise an automated method for generating unit test cases for parallel and high-performance software that accounts for the unique features of such software, including complex logic and sophisticated parallel processing techniques. Recently, large language models (LLMs) have attracted increasing attention and are believed to be a powerful tool for coding and testing, but their ability to produce unit tests for parallel and high-performance applications remains uncertain. To fill this gap, we explore the capabilities of two well-known generative models, Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo), in crafting unit test cases for parallel and high-performance software. Specifically, we propose novel ways to utilize LLMs to develop unit test cases for high-performance software written as C++ parallel programs and assess their effectiveness on extensive OpenMP/MPI projects. Our findings indicate that, in the context of parallel programming, LLMs can create unit test cases that are mostly syntactically correct and offer substantial coverage, although they exhibit some limitations.
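To make concrete the kind of artifact this study asks an LLM to produce, the sketch below shows a small OpenMP reduction routine and a unit test of the style a model such as gpt-3.5-turbo might generate for it. The function name parallel_sum and the test values are invented for illustration and are not taken from the paper's OpenMP/MPI benchmark projects.

// Hypothetical function under test: a parallel reduction with OpenMP.
#include <cassert>
#include <cstddef>
#include <vector>

double parallel_sum(const std::vector<double>& v) {
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// A unit test of the style an LLM might be asked to generate: check the
// parallel result against a known value and cover an empty-input edge case.
int main() {
    std::vector<double> data(1000, 0.5);
    assert(parallel_sum(data) == 500.0);  // uniform input, exact in double
    assert(parallel_sum({}) == 0.0);      // empty-vector edge case
    return 0;
}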
We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposition. Our implemen...
With the continuing development of hydropower in China, cascade hydropower systems will account for a growing share of the power grid and may increase grid operation risk under global climate change. This paper presents a parallel chance-constrained dynamic programming model to derive optimal operating policies for a cascade hydropower system in China. The contribution of this paper is twofold. First, the reliabilities of meeting the firm power requirements of the cascade hydropower system and of avoiding extreme system failure under extreme events are explicitly embedded in the model using Lagrangian duality theory and a penalty function. Multiple operating policies are generated by updating the values of the Lagrangian multiplier and the penalty coefficient for system disruption; the best operating rules are then selected based on system performance and evaluated according to simulated reliability, extreme system failure, and maximum benefit. Second, the Fork/Join parallel framework is deployed to parallelize the chance-constrained dynamic programming in a multi-core environment to improve computational efficiency. Two computing platforms with contrasting configurations are employed to illustrate the parallelization performance. Results from a cascade hydropower system operation demonstrate that the proposed method is computationally efficient and can obtain satisfactory operating policies, especially for extreme drought events. (C) 2018 Elsevier Ltd. All rights reserved.
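As a rough illustration of the parallelization idea only (the paper itself uses the Java Fork/Join framework), the following C++/OpenMP sketch parallelizes one backward-recursion stage of a dynamic program over discretized storage states; the stage_value callback standing in for the benefit-plus-penalty evaluation is hypothetical.

// Illustrative sketch, not the paper's implementation: one DP stage in which
// each discretized storage state can be evaluated independently, so the outer
// loop parallelizes cleanly across cores.
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

std::vector<double> solve_stage(
    const std::vector<double>& future_value,               // V_{t+1} over states
    const std::function<double(int, int)>& stage_value) {  // hypothetical benefit + penalty terms
    const int n = static_cast<int>(future_value.size());
    std::vector<double> value(n, -std::numeric_limits<double>::infinity());

    #pragma omp parallel for
    for (int s = 0; s < n; ++s) {
        for (int s_next = 0; s_next < n; ++s_next) {
            value[s] = std::max(value[s],
                                stage_value(s, s_next) + future_value[s_next]);
        }
    }
    return value;
}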
ISBN (digital): 9798350355543
ISBN (print): 9798350355550
We propose a CPU-GPU heterogeneous computing method for solving time-evolution partial differential equation problems many times with guaranteed accuracy, with short time-to-solution and low energy-to-solution. On a single GH200 node, the proposed method improved computation speed by factors of 86.4 and 8.67 compared to the conventional method run only on the CPU and only on the GPU, respectively. Furthermore, the energy-to-solution was reduced by 32.2-fold (from 9944 J to 309 J) and 7.01-fold (from 2163 J to 309 J) when compared to using only the CPU and GPU, respectively. Using the proposed method on the Alps supercomputer, a 51.6-fold and 6.98-fold speedup was attained when compared to using only the CPU and GPU, respectively, and a high weak-scaling efficiency of 94.3% was obtained on up to 1,920 compute nodes. These implementations were realized using directive-based parallel programming models while retaining portability, indicating that directives are highly effective for analyses in heterogeneous computing environments.
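To illustrate what "directive-based" means here, the toy C++ sketch below offloads an explicit time-stepping update with OpenMP target directives; it is not the paper's solver, and the function and array names are assumptions made for this example.

// Minimal directive-based offload sketch: a toy explicit time-step update.
#include <vector>

void time_step(std::vector<double>& u, const std::vector<double>& u_old,
               double dt, double c) {
    const int n = static_cast<int>(u.size());
    const double* uo = u_old.data();
    double* un = u.data();

    // The same loop body can target a GPU (as below) or, with
    // "#pragma omp parallel for", the host CPU, which is what makes a
    // heterogeneous CPU-GPU split portable across vendors.
    #pragma omp target teams distribute parallel for \
        map(to : uo[0:n]) map(tofrom : un[0:n])
    for (int i = 1; i < n - 1; ++i) {
        un[i] = uo[i] + dt * c * (uo[i - 1] - 2.0 * uo[i] + uo[i + 1]);
    }
}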
ISBN (digital): 9798350355543
ISBN (print): 9798350355550
The Common Workflow Language (CWL) is a widely adopted language for defining and sharing computational workflows. It is designed to be independent of the execution engine on which workflows are executed. In this paper, we describe our experiences integrating CWL with Parsl, a Python-based parallel programming library designed to manage execution of workflows across diverse computing environments. We propose a new method that converts CWL CommandLineTool definitions into Parsl apps, enabling Parsl scripts to easily import and use tools represented in CWL. We describe a Parsl runner that is capable of executing a CWL CommandLineTool directly. We also describe a proof-of-concept extension to support inline Python in a CWL workflow definition, enabling seamless use in Parsl's Python ecosystem. We demonstrate the benefits of this integration by presenting example CWL CommandLineTool definitions that show how they can be used in Parsl, and by comparing the performance of an image processing workflow executed with the Parsl integration and with other CWL runners.
ISBN (digital): 9798331509095
ISBN (print): 9798331509101
OpenMP provides a cross-vendor API for GPU offload that can serve as an implementation layer under performance portability frameworks like the Kokkos C++ library. However, recent work identified some impediments to performance with this approach arising from limitations in the API or in the available implementations. Advanced programming concepts such as hierarchical parallelism and use of dynamic shared memory were a particular area of concern. In this paper, we apply recent improvements and extensions in the LLVM/Clang OpenMP compiler and runtime library to the Kokkos backend that targets GPUs via OpenMP offload. We focus on efficient hierarchical parallelism and use of fast GPU scratch memory. We compare the performance of applications written using the Kokkos library with this improved OpenMP backend against the same programs using the CUDA and HIP backends. This evaluation shows progress toward closing the performance gaps between native and OpenMP backends and offers insights that may be useful to users and implementers of other runtime systems and programming frameworks for GPUs.
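For readers unfamiliar with the constructs at issue, the following small, hypothetical Kokkos kernel uses hierarchical parallelism (a TeamPolicy with nested TeamThreadRange loops) and per-team scratch memory, the pattern whose mapping onto OpenMP offload the paper optimizes. The kernel itself is invented for illustration and is not drawn from the evaluated applications.

#include <Kokkos_Core.hpp>

// Sketch: each team stages one row of a 2D view in fast scratch memory,
// then updates it with thread-level parallelism inside the team.
void team_scratch_example(Kokkos::View<double**> data) {
    using team_policy = Kokkos::TeamPolicy<>;
    using member_type = team_policy::member_type;
    using scratch_view =
        Kokkos::View<double*,
                     Kokkos::DefaultExecutionSpace::scratch_memory_space,
                     Kokkos::MemoryTraits<Kokkos::Unmanaged>>;

    const int n_teams = static_cast<int>(data.extent(0));
    const int len     = static_cast<int>(data.extent(1));
    const size_t scratch_bytes = scratch_view::shmem_size(len);

    Kokkos::parallel_for(
        "team_scratch_example",
        team_policy(n_teams, Kokkos::AUTO)
            .set_scratch_size(0, Kokkos::PerTeam(scratch_bytes)),
        KOKKOS_LAMBDA(const member_type& team) {
            scratch_view tmp(team.team_scratch(0), len);
            const int row = team.league_rank();
            // Stage the row in scratch (GPU shared memory on CUDA/HIP backends).
            Kokkos::parallel_for(Kokkos::TeamThreadRange(team, len),
                                 [&](const int i) { tmp(i) = data(row, i); });
            team.team_barrier();
            // Operate on the staged data and write it back.
            Kokkos::parallel_for(Kokkos::TeamThreadRange(team, len),
                                 [&](const int i) { data(row, i) = 2.0 * tmp(i); });
        });
}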
ISBN (digital): 9783982633619
There is an unmet need for static data race checkers that can analyze the incomplete programs typical of early program development stages and are also easy to adapt to different parallel programming models. In this work, we present GORC, a novel race checking approach based on Graph Neural Networks (GNNs) that has these attributes. GORC is trained on PrograML control/data graph representations extracted from OpenMP programs labeled as racy or race-free, and predicts races in unseen OpenMP programs. We provide a detailed evaluation of GORC, demonstrating that our approach can deliver high accuracy while also handling many more programs than existing static race checkers. Despite the scarcity of training data, GORC achieves a higher recall rate than LLOV, a widely cited static race checker for OpenMP. It outperforms state-of-the-art ML-based techniques for OpenMP data race detection on three different datasets. This paper describes GORC's architecture, detailed evaluations, and a novel attribution study that confirms GORC is learning features relevant to producing data race classifications.
ISBN (digital): 9798350355543
ISBN (print): 9798350355550
Hybrid MPI + X models, combining the Message Passing Interface (MPI) with node-level parallel programming models, increase complexity and introduce additional correctness issues. This work addresses the challenges of detecting data races in hybrid CUDA-aware MPI applications that arise from the asynchronous and non-blocking nature of the CUDA and MPI APIs. We introduce CuSan, an LLVM compiler extension and runtime that tracks CUDA-specific concurrency, synchronization, and memory access semantics. We integrate CuSan with MUST, a dynamic MPI correctness tool, and ThreadSanitizer (TSan), a thread-level data race detector. MUST with TSan can already detect concurrency issues in multi-threaded MPI codes. Together with CuSan, these tools allow for comprehensive correctness checking of concurrency issues in CUDA-aware MPI applications. Our evaluation on two mini-apps reveals a runtime overhead for CuSan ranging from 6× to 36×, depending on the amount of memory tracked by TSan, compared to the uninstrumented version. Memory overhead consistently remains under 1.8×. CuSan is available at https://***/tudasc/cusan.
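To illustrate the class of defect CuSan targets, here is a hypothetical CUDA-aware MPI fragment in C++: an asynchronous host-to-device copy is still in flight when MPI_Isend reads the device buffer, because the stream is never synchronized before the send. The function and buffer names are invented for this example.

#include <cuda_runtime.h>
#include <mpi.h>
#include <vector>

void send_field(const std::vector<double>& host, int dest, MPI_Comm comm) {
    double* dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(double));

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(dev, host.data(), host.size() * sizeof(double),
                    cudaMemcpyHostToDevice, stream);

    // BUG: missing cudaStreamSynchronize(stream), so the CUDA-aware MPI send
    // may read the device buffer while the asynchronous copy is still writing it.
    MPI_Request req;
    MPI_Isend(dev, static_cast<int>(host.size()), MPI_DOUBLE, dest, 0, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    cudaStreamDestroy(stream);
    cudaFree(dev);
}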
ISBN (digital): 9798350385625
ISBN (print): 9798350385632
Despite performance limitations due to its interpreted nature, Python remains a dominant language among scientists and engineers. Enhancing its capabilities for parallel programming unlocks significant potential within parallel and cloud computing environments. mpiPython, a Python binding for message-passing interfaces, enables Python to run Single Program Multiple Data (SPMD) executions and thereby perform efficient parallel computations. Additionally, Python's inherent accessibility and versatility foster a growing demand for scaling and parallelizing it in distributed cloud environments. This paper extends mpiPython, bridging its gap in collective operations for parallel computing. The extension builds upon the original mpiPython's class-based structure, emphasizing two core principles: supporting vanilla Python with MPI and focusing on a C-based, CPU-focused implementation. Unlike existing implementations such as mpi4py, mpiPython interacts directly with the Python C API, offering greater control. Two new functions, MPI_Gather and MPI_Reduce, significantly improve efficiency and streamline collective operations between worker nodes. The results demonstrate mpiPython's ability to perform at the level of other libraries while prioritizing a simple implementation accessible to a broad range of users.