ISBN (print): 9798331521042; 9798331521035
This paper presents a Laplace-domain method for simulating electromagnetic transients in power systems on a graphics processing unit (GPU). The proposed technique employs a parallel approach in which all subsystems associated with different complex frequencies are solved simultaneously using the massive number of threads provided by the GPU architecture. Additionally, Numerical Laplace Transform (NLT) algorithms are parallelized using a CUDA library. To reduce computational cost, Kron's reduction method is used to calculate transient responses only at the nodes of interest rather than across the entire network. Two case studies on an 18-node network are presented to demonstrate the performance and accuracy of the proposed method. Simulation results are compared with those of the conventional NLT implementation in MATLAB, as well as with those of the commercial software PSCAD. It is concluded that the proposed technique has potential applications in both real-time and faster-than-real-time simulation.
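The Kron reduction step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's GPU implementation: the function name `kron_reduce` and the three-node example network are hypothetical, and the sketch assumes a nodal admittance matrix whose eliminated nodes carry no current injection. Entries may be complex, as they would be when the matrix is evaluated at one complex frequency.

```python
def kron_reduce(Y, keep):
    """Return the admittance matrix seen from the nodes in `keep`.

    Y    : square nodal admittance matrix as a list of lists (real or complex).
    keep : indices of the nodes of interest, in the desired output order.
    Assumes zero current injection at every eliminated node.
    """
    n = len(Y)
    Y = [row[:] for row in Y]          # work on a copy, leave the input intact
    active = list(range(n))
    for k in [i for i in range(n) if i not in keep]:
        pivot = Y[k][k]
        if pivot == 0:
            raise ZeroDivisionError(f"node {k} has zero self-admittance")
        active.remove(k)
        # Node-by-node elimination: Y_ij <- Y_ij - Y_ik * Y_kj / Y_kk
        for i in active:
            for j in active:
                Y[i][j] -= Y[i][k] * Y[k][j] / pivot
    return [[Y[i][j] for j in keep] for i in keep]


# Hypothetical chain 0 - 1 - 2 with two series branches of admittance 2.
Y_chain = [[2.0, -2.0, 0.0],
           [-2.0, 4.0, -2.0],
           [0.0, -2.0, 2.0]]
Y_ends = kron_reduce(Y_chain, keep=[0, 2])
```

In this example the two series branches collapse into a single equivalent branch of admittance 1 between nodes 0 and 2. In a frequency-domain solver the same reduction would be repeated independently for each complex frequency, which is what makes it a natural candidate for parallel execution.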
ISBN (print): 9798400711008
This paper compares OpenCL, CUDA, and HIP as compilation targets for Futhark, a functional array language. We compare the performance of OpenCL versus CUDA, and OpenCL versus HIP, on the code generated by the Futhark compiler on a collection of 48 application benchmarks on two different GPUs. Despite the generated code in most cases being equivalent, we observe significant performance differences on the same hardware, ranging from 0.42x to 1.72x in the most extreme cases. We identify the root causes of most of these differences, many of which are due to relatively superficial details such as inconsistent defaults regarding compiler optimisation and numerical accuracy, although a few remain mysterious.
ISBN (print): 9798350387117; 9798350387124
Future Exascale systems will feature massive parallelism, many-core processors and heterogeneous architectures. In this scenario, it is increasingly difficult for HPC applications to fully and efficiently utilize the resources in system nodes. Moreover, the increased parallelism exacerbates the effects of existing inefficiencies in current applications. Research has shown that co-scheduling applications to share system nodes, instead of executing each application exclusively, can increase resource utilization and efficiency. Nevertheless, the current oversubscription and co-location techniques for sharing nodes have several drawbacks that limit their applicability and make them very application-dependent. This paper presents co-execution through system-wide scheduling. Co-execution is a novel fine-grained technique to execute multiple HPC applications simultaneously on the same node, outperforming current state-of-the-art approaches. We implement this technique in nOS-V, a lightweight tasking library that supports co-execution through system-wide task scheduling. Moreover, nOS-V can be easily integrated with existing programming models, requiring no changes to user applications. We showcase how co-execution with nOS-V significantly reduces schedule makespan for several applications in different scenarios, outperforming prior node-sharing techniques.
ISBN (print): 9798350377521; 9798350377514
We are witnessing a tremendous expansion in computational power, leading to monumental technological advances with various practical applications. The need for parallelism originated from the limitations of single-processor performance. Increasing the transistor density on integrated circuits was long the way forward, yielding faster and more complex monolithic processors. However, these circuits approached their limits of hardware expansion, so the single-processor model has become unreliable and insufficient overall. This is how multicore architectures were born and, with them, the notion of parallel computing. Our research background in the Drop Computing paradigm and our interest in studying parallel programming paradigms encouraged us to propose a new approach to parallel processing in mobile, ad-hoc, opportunistic networks. This paper adapts one fundamental message-passing standard for parallel architectures to the Drop Computing context. We define a new library, DroMPI. Besides the challenges of parallel programming, the solution has to address the challenges imposed by hardware constraints, limited resources, and the decentralized model specific to Drop Computing.
ISBN (print): 9783031506833; 9783031506840
Stream processing plays a vital role in applications that require continuous, low-latency data processing. Thanks to their extensive parallel processing capabilities and relatively low cost, GPUs are well-suited to scenarios where such applications require substantial computational resources. However, efficient GPU computation within stream processing systems requires micro-batching, and finding batch sizes that maintain an adequate level of service is often challenging, particularly when applications experience fluctuations in input rate and workload. Addressing this challenge requires adjusting the batch size at runtime. This study proposes a methodology for evaluating different self-adaptive micro-batching strategies in a real-world complex streaming application used as a benchmark.
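To make the idea of runtime batch-size adjustment concrete, here is a minimal sketch of one possible self-adaptive rule. It is an assumption for illustration only and is not one of the strategies evaluated in the study: the function `next_batch_size`, its parameters, and the additive-increase/halving policy are all hypothetical.

```python
def next_batch_size(current, latency_ms, target_ms,
                    step=16, min_size=1, max_size=4096, tolerance=0.1):
    """Propose the batch size for the next micro-batch.

    Hypothetical rule: halve the batch when the measured latency exceeds the
    target by more than `tolerance`, grow it by `step` items when latency is
    comfortably below the target, and otherwise keep it unchanged.
    """
    if latency_ms > target_ms * (1 + tolerance):
        proposed = current // 2        # back off quickly when the target is missed
    elif latency_ms < target_ms * (1 - tolerance):
        proposed = current + step      # probe for more throughput slowly
    else:
        proposed = current             # inside the tolerance band: hold steady
    return max(min_size, min(max_size, proposed))


# Hypothetical trace of measured latencies (ms) against a 20 ms target.
size = 256
for measured in (30.0, 21.0, 12.0, 12.0):
    size = next_batch_size(size, measured, target_ms=20.0)
```

The tolerance band keeps the controller from oscillating when the measured latency hovers around the target. A real system would also have to decide how latency is measured and smoothed, which this sketch leaves open.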
ISBN (print): 9783031562075; 9783031562082
In this work, the suitability of Genetic Algorithms (GA) for solving the Mountain Car problem is investigated. Two variants are considered: pure policies and slightly extended mixed policies. Experimental results obtained with a CPU-based parallel implementation are presented. They highlight challenges of the GA approach related to the high computational cost required for accurate evaluation of the fitness function.
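The following sketch shows how a GA over pure policies for Mountain Car might be set up, and why fitness evaluation dominates the cost. Everything specific here is an assumption rather than the authors' design: the dynamics follow the commonly used Mountain Car formulation, the policy is a lookup table over a hypothetical 10 x 10 state grid, and the selection, crossover, and mutation operators are generic choices.

```python
import math
import random

N_POS, N_VEL, MAX_STEPS = 10, 10, 200   # hypothetical grid size and step limit


def state_index(x, v):
    """Map a continuous state to a cell of the policy table."""
    i = min(N_POS - 1, int((x + 1.2) / 1.8 * N_POS))
    j = min(N_VEL - 1, int((v + 0.07) / 0.14 * N_VEL))
    return i * N_VEL + j


def episode_steps(policy, x0=-0.5):
    """Simulate one episode; return the steps needed to reach x >= 0.5."""
    x, v = x0, 0.0
    for step in range(1, MAX_STEPS + 1):
        a = policy[state_index(x, v)]            # 0 = left, 1 = coast, 2 = right
        v += (a - 1) * 0.001 - 0.0025 * math.cos(3 * x)
        v = max(-0.07, min(0.07, v))
        x += v
        if x < -1.2:                             # inelastic left wall
            x, v = -1.2, 0.0
        if x >= 0.5:
            return step
    return MAX_STEPS


def fitness(policy, starts=(-0.6, -0.5, -0.4)):
    """Negative mean episode length; each call simulates len(starts) episodes."""
    return -sum(episode_steps(policy, x0) for x0 in starts) / len(starts)


def evolve(pop_size=30, generations=20, mutation_rate=0.02, seed=0):
    """Truncation selection, one-point crossover, per-gene mutation."""
    rng = random.Random(seed)
    n = N_POS * N_VEL
    pop = [[rng.randrange(3) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [rng.randrange(3) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

Every fitness call runs full simulated episodes, and a more accurate estimate needs more start states per call, so the evaluation cost grows with population size, generations, and episodes per policy. Because individual evaluations are independent, they are the natural part to distribute across CPU cores.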
ISBN (print): 9798400703300
Educational games have emerged as potent tools for helping students understand complex concepts and are now ubiquitous in global classrooms, amassing vast data. However, there is a notable gap in research concerning the effective visualization of this data to serve two key functions: (a) guiding students in reflecting upon their game-based learning and (b) aiding them in analyzing peer strategies. In this paper, we engage educators, students, and researchers as essential stakeholders. Taking a Design-Based Research (DBR) approach, we incorporate UX design methods to develop an innovative visualization system that helps players learn through gaining insights from their own and peers' gameplay and strategies.
ISBN (print): 9798350364613; 9798350364606
This paper introduces Speedcode, an online programming platform that aims to improve the accessibility of software performance-engineering education. At its core, Speedcode provides a platform that lets users gain hands-on experience in software performance engineering and parallel programming by completing short programming exercises. Speedcode challenges users to develop fast multicore solutions for short programming problems and evaluates their code's performance and scalability in a quiesced cloud environment. Speedcode supports parallel programming using OpenCilk, a task-parallel computing platform that is open-source and easy to program, teach, and use for research. Speedcode aims to reduce barriers to learning and teaching software performance engineering. It allows users to run and evaluate their code on modern multicore machines from their own computer without installing any software. This gives users an easy introduction to the topic and enables teachers to more easily incorporate lessons on software performance engineering into their courses without incurring the onerous overhead of setting up computing environments for their students.
ISBN (print): 9798350387568; 9798350387575
Model-based systems engineering (MBSE) is a methodology that entails creating and utilizing models across the entire system development lifecycle. Based on the Unified Modeling Language (UML), the Systems Modeling Language (SysML) was developed to facilitate the behavioral description and design of intricate industrial systems. Open Computing Language (OpenCL) has emerged as a pivotal tool for conceptualizing intricate device functionalities. It has been introduced into FPGA design to overcome the inefficiencies of traditional HDL design methodologies and the inability of methodologies based on high-level behavioral description in C/C++ to design the circuits. The study aims to streamline the transformation process from high-level SysML specifications to executable OpenCL code, thereby facilitating the implementation of complex systems. The paper introduces a data-pipelining and task-parallelism approach for mapping high-level SysML specifications onto an OpenCL platform model. A detailed case study is presented to demonstrate the effectiveness of the proposed approach in the context of a real-time three-dimensional particle tracking velocimetry (3D PTV) system. The proposed parallel programming approach converts the comprehensive SysML model of the PTV system into executable OpenCL code. This research applies to multiple applications using the open-source modeling and formal verification tool TTool.
ISBN (print): 9798350356045; 9798350356038
Over the last two decades, parallelism has become the primary method for speeding up computer programs. When writing parallel code, it is often necessary to use synchronization primitives (e.g., atomics, barriers, or critical sections) to enforce correctness. However, the performance of synchronization primitives depends on a variety of complex factors that non-experts may be unaware of. Since multiple primitives can typically be used to complete the same task, choosing the best is often non-trivial. In this paper, we study the performance impact of these factors by measuring the throughput of OpenMP and CUDA synchronization primitives along multiple dimensions. We highlight interesting and non-intuitive behavior that software developers should be aware of when writing parallel programs.
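The observation that several primitives can complete the same task can be illustrated with a small analogy. The sketch below uses Python threads rather than OpenMP or CUDA, so it says nothing about the throughput figures measured in the paper; the function names are hypothetical. It counts the same number of events in two ways: with a critical section around every update, and with thread-private partial counts combined once at the end.

```python
import threading


def count_with_lock(n_threads, n_iters):
    """Every increment of the shared counter happens inside a critical section."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_iters):
            with lock:                 # one synchronization per update
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_partials(n_threads, n_iters):
    """Each thread counts privately; results are combined after the join."""
    partial = [0] * n_threads

    def worker(tid):
        local = 0
        for _ in range(n_iters):
            local += 1                 # no synchronization in the hot loop
        partial[tid] = local           # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                       # the join acts as the final barrier
    return sum(partial)
```

Both functions return the same count, but they synchronize at very different frequencies. Which variant is faster on a given platform is exactly the kind of question that has to be answered by measurement.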