ISBN (digital): 9789532330991
ISBN (print): 9781728153391
The Forest Fire Weather Index allows the assessment of fire danger using weather variables in order to increase preparedness to prevent or halt the spread of wildfires. It often needs to be computed over large areas, taking weather data from hundreds of thousands of stations. CUDA parallel programming can be used to do this more efficiently. This paper presents a CPU and a GPU version as a solution to this problem, using historic datasets of wildfires and weather in the US to measure performance.
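As a rough illustration (not the paper's code), the per-station computation is embarrassingly parallel: a CPU version can loop over stations, while a GPU version maps one CUDA thread to one station. The sketch below uses an OpenMP loop for the CPU side, and the fire-danger formula is a made-up placeholder for the real FWI equations; the struct and function names are assumptions.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical per-station weather observation; field names are illustrative only.
struct StationObs {
    float temperature_c;
    float relative_humidity;
    float wind_kmh;
    float rain_mm;
};

// Placeholder score standing in for the real FWI component equations
// (FFMC/DMC/DC/ISI/BUI/FWI), which are far more involved than this.
inline float toy_fire_danger_score(const StationObs& o) {
    return 0.05f * o.temperature_c * o.wind_kmh
         / (1.0f + 0.1f * o.relative_humidity + o.rain_mm);
}

int main() {
    std::vector<StationObs> stations(100000,
        StationObs{30.0f, 20.0f, 25.0f, 0.0f});
    std::vector<float> index(stations.size());

    // CPU version: one station per loop iteration. A CUDA kernel would instead
    // assign one thread per station (i = blockIdx.x * blockDim.x + threadIdx.x).
    #pragma omp parallel for
    for (long i = 0; i < (long)stations.size(); ++i)
        index[i] = toy_fire_danger_score(stations[i]);

    std::printf("station 0 danger score: %f\n", index[0]);
    return 0;
}
```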
Parallel computing is one of the top priorities in computer science. The main means of parallel information processing is a distributed computing system (CS), a composition of elementary machines that interact through ...
ISBN (print): 9781665422925; 9781665446501
With the growing constraints on power budget and increasing hardware failure rates, the operation of future exascale systems faces several challenges. Towards this, resource awareness and adaptivity enabled by malleable jobs have been actively researched in the HPC community. Malleable jobs can change their computing resources at runtime and can significantly improve HPC system performance. However, due to the rigid nature of popular parallel programming paradigms such as MPI, and the lack of support for dynamic resource management in batch systems, malleable jobs have remained largely unrealized. In this paper, we extend the SLURM batch system to support the execution and batch scheduling of malleable jobs. The malleable applications are written using a new adaptive parallel paradigm called Invasive MPI, which extends the MPI standard to support resource adaptivity at runtime. We propose two malleable job scheduling strategies to support performance-aware and power-aware dynamic reconfiguration decisions at runtime. We implement the strategies in SLURM and evaluate them on a production HPC system. Results for our performance-aware scheduling strategy show improvements in makespan, average system utilization, and average response and waiting times compared to other scheduling strategies. Moreover, we demonstrate dynamic power corridor management using our power-aware strategy.
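To picture what a power-corridor decision might look like, here is a deliberately toy sketch: if total system power leaves a target corridor, jobs are shrunk or expanded by one node. Every name and number in it is hypothetical, and it is not the strategy implemented in the paper, where the corresponding decisions live inside the extended SLURM scheduler.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical view of a running malleable job.
struct MalleableJob {
    int id;
    int nodes;             // nodes currently allocated
    double watts_per_node; // assumed average power draw per node
};

// Illustrative power-corridor policy: keep total power inside [lo, hi]
// by shrinking or expanding malleable jobs one node at a time.
void enforce_power_corridor(std::vector<MalleableJob>& jobs, double lo, double hi) {
    double total = 0.0;
    for (const auto& j : jobs) total += j.nodes * j.watts_per_node;

    for (auto& j : jobs) {
        if (total > hi && j.nodes > 1) {           // over budget: shed power
            total -= j.watts_per_node;
            --j.nodes;
            std::printf("shrink job %d to %d nodes\n", j.id, j.nodes);
        } else if (total < lo) {                   // headroom: grow a job
            total += j.watts_per_node;
            ++j.nodes;
            std::printf("expand job %d to %d nodes\n", j.id, j.nodes);
        }
    }
}

int main() {
    std::vector<MalleableJob> jobs = {{1, 8, 300.0}, {2, 4, 250.0}};
    enforce_power_corridor(jobs, 2000.0, 3000.0);  // watts, made-up corridor
    return 0;
}
```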
Bot computing using the power of Wiki collaboration and an experimental implementation of the bot running environment are discussed. While botnets are usually created for malicious purposes, the bot computing in this ...
Due to the challenges in providing adequate memory access to many cores on a single processor, Multi-Die and Multi-Socket based multicore systems are becoming mainstream. These systems offer cache-coherent Non-Uniform Memory Access (NUMA) across several memory banks and a cache hierarchy to increase memory capacity and bandwidth. Random work-stealing is a widely used technique for dynamic load balancing of tasks on multicore processors. However, it scales poorly on such NUMA systems for memory-bound applications due to cache misses and remote memory access latency. The Hierarchical Place Tree (HPT) [1] is a popular approach for improving the locality of a task-based parallel programming model, although it requires the programmer to map the dynamically unfolding tasks evenly over a NUMA system. Specifying data-affinity hints provides a more natural way to map the tasks than HPT; still, a scalable work-stealing implementation for such hints is mostly unexplored on modern NUMA systems. This paper presents PufferFish, a new async-finish parallel programming model and work-stealing runtime for NUMA systems that closely couples the data-affinity hints provided for an asynchronous task with the HPTs in the Habanero C/C++ library (HClib). PufferFish introduces Hierarchical Elastic Tasks (HETs), which improve locality by shrinking to run on a single worker inside a place or puffing up across multiple workers, depending on the work imbalance at a particular place in an HPT. We use a set of widely used memory-bound benchmarks exhibiting regular and irregular execution graphs to evaluate PufferFish. On these benchmarks, we show that PufferFish achieves a geometric mean speedup of 1.5× and 1.9× over the HPT implementation in HClib and random work-stealing in CilkPlus, respectively, on a 32-core NUMA AMD EPYC processor.
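To make the affinity idea concrete, below is a small, self-contained sketch of place-local task queues with stealing as a fallback: tasks carry a place hint, workers drain their home place first and steal from other places only when idle. It is not PufferFish, HClib, or HET (and it uses plain mutex-guarded deques rather than lock-free work-stealing deques); all names are illustrative.

```cpp
#include <atomic>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// One "place" (think: NUMA node) owns a task deque.
struct Place {
    std::mutex m;
    std::deque<std::function<void()>> tasks;
};

class AffinityPool {
public:
    explicit AffinityPool(int places) : places_(places) {}

    // Submit a task with a data-affinity hint naming the preferred place.
    void submit(int place_hint, std::function<void()> fn) {
        std::lock_guard<std::mutex> lk(places_[place_hint].m);
        places_[place_hint].tasks.push_back(std::move(fn));
        ++pending_;
    }

    // Spawn workers bound to home places and run until all tasks finish.
    void run(int workers_per_place) {
        std::vector<std::thread> workers;
        for (int p = 0; p < (int)places_.size(); ++p)
            for (int w = 0; w < workers_per_place; ++w)
                workers.emplace_back([this, p] { worker_loop(p); });
        for (auto& t : workers) t.join();
    }

private:
    bool try_pop(int place, std::function<void()>& out) {
        std::lock_guard<std::mutex> lk(places_[place].m);
        if (places_[place].tasks.empty()) return false;
        out = std::move(places_[place].tasks.front());
        places_[place].tasks.pop_front();
        return true;
    }

    // Prefer the home place; steal from sibling places only when it is empty.
    void worker_loop(int home) {
        std::function<void()> task;
        while (pending_.load() > 0) {
            bool got = try_pop(home, task);
            for (int d = 1; !got && d < (int)places_.size(); ++d)
                got = try_pop((home + d) % (int)places_.size(), task);
            if (got) { task(); --pending_; }
            else std::this_thread::yield();
        }
    }

    std::vector<Place> places_;
    std::atomic<long> pending_{0};
};

int main() {
    AffinityPool pool(2);                    // pretend there are 2 NUMA places
    for (int i = 0; i < 8; ++i)
        pool.submit(i % 2, [i] { std::printf("task %d, place hint %d\n", i, i % 2); });
    pool.run(2);                             // 2 workers per place
    return 0;
}
```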
This paper investigates using compiler technology to automatically convert sequential C++ data abstractions, e.g., queues, stacks, maps, and trees, to concurrent lock-free implementations. By automatically tailoring a...
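For context on what a lock-free target shape looks like, here is a textbook-style Treiber stack whose push and pop retry a compare-and-swap on the head pointer. It only illustrates the general technique, not the compiler technology the paper describes, and it deliberately sidesteps safe memory reclamation, which real implementations must handle.

```cpp
#include <atomic>
#include <cstdio>
#include <optional>

// Classic Treiber stack: a minimal lock-free LIFO built on CAS.
template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
    std::atomic<Node*> head_{nullptr};
public:
    void push(T v) {
        Node* n = new Node{std::move(v), head_.load(std::memory_order_relaxed)};
        // Retry until head_ swings from n->next (the observed head) to n.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {}
    }
    std::optional<T> pop() {
        Node* n = head_.load(std::memory_order_acquire);
        while (n && !head_.compare_exchange_weak(n, n->next,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed)) {}
        if (!n) return std::nullopt;
        T v = std::move(n->value);
        delete n;  // NOTE: unsafe under concurrent pops (ABA, use-after-free);
                   // production code needs hazard pointers or epoch reclamation.
        return v;
    }
};

int main() {
    LockFreeStack<int> s;
    s.push(1); s.push(2);
    if (auto v = s.pop()) std::printf("popped %d\n", *v);
    return 0;
}
```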
In recent years, multicore shared memory architectures have become more and more powerful. To effectively use such machines, many frameworks are available, including OpenMP and Intel Threading Building Blocks (TBB). S...
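As a quick side-by-side reminder of the two frameworks, the same loop can be expressed as an OpenMP directive or as a TBB library call, roughly as below (a generic example, not taken from the paper).

```cpp
#include <cstdio>
#include <vector>
#include <tbb/parallel_for.h>   // Intel oneTBB

int main() {
    const std::size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    // OpenMP: a compiler directive on an ordinary loop (compile with -fopenmp).
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        c[i] = a[i] + b[i];

    // TBB: a library call taking the index range and a lambda body (link with -ltbb).
    tbb::parallel_for(std::size_t(0), n, [&](std::size_t i) {
        c[i] = a[i] + b[i];
    });

    std::printf("c[0] = %f\n", c[0]);
    return 0;
}
```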
ISBN (digital): 9781728192192
ISBN (print): 9781728192208
HPC systems with attached accelerators are the new normal. However, programming these accelerators for good performance is complex and tedious. Hence, directive-based programming models such as OpenMP and OpenACC are gaining wide popularity for parallel programming: they simplify the programming experience by abstracting the low-level complexities from the user. In this paper, we have done an extensive comparison of OpenMP 4.5 and OpenACC for GPU programming. The performance of the two APIs has also been compared on NVIDIA Tesla GPUs, namely the P100 and V100. Data transfer times, kernel execution times, total execution times, and performance portability are the criteria for comparison. The challenges faced while parallelizing the applications using the directives, which can lead to incorrect outputs, have also been noted.
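The flavour of the comparison can be seen on a simple SAXPY-style loop: the same kernel offloaded with OpenMP 4.5 target directives and with OpenACC. This is a generic illustration, not one of the paper's benchmark applications; the map/copy clauses are what govern the data-transfer times measured in such comparisons.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 1.0f);
    float* xp = x.data();
    float* yp = y.data();

    // OpenMP 4.5 offload: map() clauses control host<->device transfers.
    #pragma omp target teams distribute parallel for \
            map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];

    // OpenACC equivalent: copyin/copy clauses play the same role.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];

    std::printf("y[0] = %f\n", yp[0]);   // expect 5.0 after both passes
    return 0;
}
```

Compilers without offload support simply run both loops on the host, which is also how the transfer-time versus kernel-time split disappears in a CPU-only build.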
Parallel programming is a computing model in which the computations are run on multiple processors simultaneously. In this work, a parallel computing system is implemented through the network connection of a set of Ra...
ISBN (digital): 9781728173863
ISBN (print): 9781728173870
The article explores the possibility of parallel data compression using cubic splines, taking the digital processing of seismic signals as the example workload to parallelize. The main performance indicators of the parallel algorithms are compared with their sequential counterparts. Spline methods are a versatile signal processing tool: they are more accurate than many other mathematical methods, faster, and much cheaper to maintain. On the other hand, the equipment used in such systems must also meet high performance requirements. To achieve high speed, parallel algorithms were developed using OpenMP and MPI and implemented on multi-core processor architectures. A mathematical method for the parallel calculation of cubic spline coefficients has been developed, and a parallel signal processing algorithm has been built on its basis, again using seismic signal processing as the example computation. The main efficiency and speedup indicators of the parallel algorithm were compared with the sequential algorithm. The article explains the relevance of parallel numerical systems, describes the main approaches to process distribution and data processing methods, describes the principles of parallel programming technology, and studies the basic parameters of parallel algorithms for the initial computation of the numerical value of a cubic spline. The parallel algorithm considered for constructing a cubic spline of defect 1 as p → n leads to the construction of a local cubic spline on each grid interval ω.
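Because a local spline on each grid interval depends only on a few neighbouring samples, the per-interval coefficient computation parallelizes directly. The OpenMP sketch below builds a simple local cubic (a Hermite cubic with finite-difference slopes on a unit grid); it illustrates the idea only and is not the article's defect-1 construction, and the signal and variable names are made up.

```cpp
#include <cstdio>
#include <vector>

// Coefficients of s(t) = c0 + c1*t + c2*t^2 + c3*t^3 on one unit-spaced interval.
struct Cubic { double c0, c1, c2, c3; };

int main() {
    const int n = 1 << 16;                            // number of samples (unit grid)
    std::vector<double> y(n);
    for (int i = 0; i < n; ++i) y[i] = 0.001 * i * i; // toy "signal"

    std::vector<double> d(n);                         // nodal slopes
    std::vector<Cubic> seg(n - 1);                    // one cubic per interval

    // Central-difference slopes: each d[i] uses only y[i-1] and y[i+1].
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double prev = y[i > 0 ? i - 1 : i];
        double next = y[i < n - 1 ? i + 1 : i];
        d[i] = 0.5 * (next - prev);
    }

    // Each interval's Hermite cubic depends only on (y[i], y[i+1], d[i], d[i+1]),
    // so the coefficient computation is embarrassingly parallel.
    #pragma omp parallel for
    for (int i = 0; i < n - 1; ++i) {
        double dy = y[i + 1] - y[i];
        seg[i] = {y[i], d[i],
                  3.0 * dy - 2.0 * d[i] - d[i + 1],
                  -2.0 * dy + d[i] + d[i + 1]};
    }

    std::printf("interval 0: c0=%g c1=%g c2=%g c3=%g\n",
                seg[0].c0, seg[0].c1, seg[0].c2, seg[0].c3);
    return 0;
}
```

A global (non-local) cubic spline instead couples all intervals through a tridiagonal system, which is why local constructions such as the one above are the natural fit for per-interval parallelism.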