ISBN (Print): 9783319321516
The proceedings contain 56 papers. The special focus in this conference is on models, algorithms, energy aspects of computation, scheduling for parallel computing, and language-based parallel programming models. The topics include: virtualizing CUDA-enabled GPGPUs on ARM clusters; a distributed hash table for shared memory; a mathematical approach to the performance evaluation of a matrix multiply algorithm; a scalable numerical algorithm for solving Tikhonov regularization problems; energy performance modeling with TIA and EML; considerations of computational efficiency in volunteer and cluster computing; parallel program scheduling with architecturally supported regions; adaptive multi-level workflow scheduling with uncertain task estimates; divisible loads scheduling in hierarchical memory systems with time and energy constraints; extending Gustafson-Barsis's law for dual-architecture computing; free scheduling of tiles based on the transitive closure of dependence graphs; multi-threaded construction of neighbour lists for particle systems in OpenMP; high productivity and high performance; parallel ant brood graph partitioning in Julia; a scalability model based on the concept of granularity; performance and power-aware modeling of MPI applications for cluster computing; running time prediction for web search queries; performance analysis of a parallel, multi-node pipeline for DNA sequencing; parallelising the computation of minimal absent words; modeling and simulations of edge-emitting broad-area semiconductor lasers and amplifiers; application of the parallel INMOST platform to subsurface flow and transport modelling; a genetic algorithm and exact diagonalization approach for molecular nanomagnet modelling; and parallel Monte Carlo simulations for spin models with distributed lattices.
ISBN (Print): 9783030856656; 9783030856649
The ever-increasing gap between processor and main-memory speeds requires careful utilization of the limited memory link, especially for memory-bound applications. Prioritizing memory requests in the memory controller is one approach to improving the performance of such codes. However, current designs do not consider high-level information about parallel applications. In this paper, we propose a holistic approach to this problem in which runtime-system-level knowledge is made available in hardware. The processor exploits this information to better prioritize memory requests while introducing negligible hardware cost. Our design is based on the notion of the critical path in the execution of a parallel code. Critical tasks are accelerated by prioritizing their memory requests within the on-chip memory hierarchy. As a result, we reduce the critical path and improve overall performance by up to 1.19x compared to the baseline systems.
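A runtime system could identify critical tasks by computing the longest path through the task dependence graph before tagging their memory requests for priority. A minimal sketch of that longest-path step, assuming a simple DAG of task costs (the data layout and function name are illustrative, not the paper's runtime interface):

```python
from collections import defaultdict

def critical_tasks(tasks, deps):
    """Return the set of tasks on a longest ('critical') path of a DAG.

    tasks: dict task_id -> execution cost
    deps:  list of (u, v) edges meaning u must finish before v starts.
    A runtime could mark these tasks' memory requests as high priority.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in deps:
        succ[u].append(v)
        indeg[v] += 1
    # Topological order via Kahn's algorithm.
    order = []
    stack = [t for t in tasks if indeg[t] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    # Longest finish time ending at each task, with back-pointers.
    finish = {t: tasks[t] for t in tasks}
    pred = {}
    for u in order:
        for v in succ[u]:
            if finish[u] + tasks[v] > finish[v]:
                finish[v] = finish[u] + tasks[v]
                pred[v] = u
    # Walk back from the latest-finishing task to recover the path.
    t = max(finish, key=finish.get)
    path = {t}
    while t in pred:
        t = pred[t]
        path.add(t)
    return path
```

On a diamond-shaped graph the heavier branch is selected, which is exactly the set of tasks whose memory requests would be prioritized.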
We consider the problem of partitioning applications that operate on a regular grid but have irregular boundaries for a cache-coherent multiprocessor. Domain decomposition techniques such as RSB have commonly been used to reduce interprocessor communication in message-passing multiprocessors. We apply these partitioning algorithms on cache-coherent multiprocessors to reduce cache-coherency traffic. We find that the actual cache-coherency traffic is approximately double the estimated true coherency traffic, primarily due to false sharing and the consequent false coherency traffic. We devise two techniques that eliminate false-sharing traffic in partitions produced using common domain decomposition algorithms. In our compensation algorithm, we modify the partition produced by the domain decomposition to ensure that all the nodes on a cache line are assigned to the same processor. In our coalescing algorithm, nodes belonging to the same cache line are coalesced into a single node, and the weights on nodes and arcs are adjusted to represent the overall computation and communication costs of the coalesced nodes. This coalesced graph is partitioned using a domain decomposition algorithm, and the coalesced nodes in the resulting partition are then expanded. Our experimental results, using an Indian Ocean circulation application on the KSR1 multiprocessor, demonstrate that compensation reduces coherency traffic by as much as 55% and execution time by up to 18%, and that graph coalescing reduces coherency traffic by up to 74%.
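The coalescing step described above — collapsing all grid nodes that share a cache line into one weighted super-node before partitioning — can be sketched as follows. The data layout and function name are illustrative, not the paper's implementation; intra-line edges vanish because accesses within one cache line cause no coherency traffic:

```python
from collections import defaultdict

def coalesce_by_cache_line(node_weights, edges, line_of):
    """Collapse nodes sharing a cache line into one super-node.

    node_weights: dict node -> computation weight
    edges:        dict (u, v) -> communication weight
    line_of:      dict node -> cache-line id
    Returns (super-node weights, inter-line edge weights); the result
    is what a domain decomposition algorithm would then partition.
    """
    w = defaultdict(int)
    for n, wt in node_weights.items():
        w[line_of[n]] += wt          # sum computation cost per line
    e = defaultdict(int)
    for (u, v), wt in edges.items():
        lu, lv = line_of[u], line_of[v]
        if lu != lv:                  # intra-line edges: no coherency traffic
            e[tuple(sorted((lu, lv)))] += wt
    return dict(w), dict(e)
```

After partitioning the coalesced graph, each super-node expands back to its member grid nodes, guaranteeing a whole cache line never straddles two processors.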
ISBN (Print): 9798350375145; 9798350375138
The large-scale penetration of distributed photovoltaic (PV) power generation systems has brought new challenges to topology identification and detection in traditional distribution networks. This article studies topology identification technology (TIT) for distributed PV low-voltage (LV) distribution network lines, aiming to design a topology identification method that adapts to the dynamic changes of the power grid, scales to large networks, improves accuracy under the same conditions, and speeds up data processing. The article first constructs a system model comprising a node model, an edge model, and a parameter model; represents the topology of the power grid mathematically using graph theory; and designs a topology recognition algorithm based on optimization techniques and state estimation. The algorithm addresses the distributed nature of grid topology recognition, the difficulty of data collection, the dynamic diversity of grid structure, the uncertainty of equipment parameters, the high computational complexity of data processing, and communication constraints. It is implemented with a modern programming language and a parallel computing framework, making an efficient implementation straightforward. Results on the simulation platform show that the highest recall rate across 22 test cases is 93.8%, with response times between 425 ms and 980 ms, providing fast response to the information space of the grid.
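A minimal sketch of the node/edge graph representation mentioned above, assuming the common case that an LV feeder is radial (a spanning tree). The class and method names are illustrative and not taken from the paper; a real identification algorithm would add the parameter model and state estimation on top of this structure:

```python
class GridTopology:
    """Illustrative node/edge model of an LV feeder as an undirected graph."""

    def __init__(self, n_nodes):
        self.n = n_nodes
        self.edges = set()

    def add_line(self, i, j):
        # Store each line once, with endpoints in canonical order.
        self.edges.add((min(i, j), max(i, j)))

    def is_radial(self):
        """Check the radial-feeder property: connected and acyclic,
        i.e. a spanning tree with exactly n - 1 lines."""
        parent = list(range(self.n))

        def find(x):                    # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for i, j in self.edges:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
        roots = {find(i) for i in range(self.n)}
        return len(roots) == 1 and len(self.edges) == self.n - 1
```

A candidate topology produced by the identification step can be validated cheaply with such a check before running heavier state estimation.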
The paper introduces an innovative time-dependent finite impulse response (FIR) filter designed for utilization in time-mode signal processing systems like mobile phones and digital transmitters-receivers. The filter&...
ISBN (Print): 9798400718021
Deep Learning Neural Networks (DLNNs) require an immense amount of computation, especially in the training phase, when multiple layers of intermediate neurons need to be built. The situation is even more demanding today with the proliferation of applications with intelligence at the edge, not just in the cloud. To meet the requirements of edge computing, it is therefore imperative to accelerate the execution phases of neural networks as much as possible. In this paper, we focus on the algorithm known as Particle Swarm Optimization (PSO), a bio-inspired, stochastic optimization approach that iteratively improves the solution to a given (usually complex) problem by attempting to approximate a given objective. Using PSO in an edge computing environment has the potential to allow DLNN training at the edge without transferring resource-intensive tasks to the cloud. However, implementing an efficient PSO is not straightforward, due to the complexity of the computations performed on the swarm of particles and the iterative execution until a near-target solution with minimal error is achieved. In the present work, two parallelizations of the PSO algorithm have been implemented, both designed for a distributed execution environment (Apache Spark). The first PSO parallelization follows a synchronous scheme; i.e., the best global position found by the particles is globally updated before the execution of the next iteration of the algorithm. This implementation proved more efficient for medium-sized datasets (<40,000 data points). In contrast, the second implementation is an asynchronous parallel variant of the PSO algorithm, which showed lower execution time for large datasets (>170,000 data points) compared to the first one. Additionally, it exhibits better scalability and elasticity with respect to increasing dataset size. Both variants of the PSO have been implemented to distribute the computational load (particle fit
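A minimal sequential sketch of the synchronous PSO scheme: the global best is refreshed only after every particle has moved, which is the per-iteration barrier a Spark implementation would place between iterations (e.g. by mapping the per-particle update over an RDD and reducing to the new global best). All parameter values and names here are illustrative, not the paper's configuration:

```python
import random

def pso_minimize(f, dim, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Synchronous PSO sketch minimizing f over R^dim."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        # The inner loop is the part a distributed variant parallelizes.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest_val[i], pbest[i] = v, pos[i][:]
        # Synchronous barrier: global best updated after all particles move.
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest_val, gbest = pbest_val[g], pbest[g][:]
    return gbest, gbest_val
```

The asynchronous variant would instead let each particle read and update the global best as soon as its own evaluation finishes, trading exact iteration semantics for less idle time.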
The rapid development of distributed systems has triggered the emergence of many new applications, such as Cloud applications. Satisfaction with these systems with regard to their services is an important indicator that refl...
This paper presents an efficient distributed algorithm for line-of-sight (LoS) computation using directed breadth-first search (DBFS). LoS computation could be achieved between objects located at different points of a...
This paper reports a prototype implementation of 3D Fluid-Structure Simulation. In this implementation, heartbeat fluid flows inside a rectangular tube where the operator can interactively change the width of the tube...
The proceedings contain 21 papers from the conference on Commercial Applications for High-Performance Computing. Topics discussed include: a resource allocation framework for interactive television environments; resource management in peer-to-peer and grid computing; automatic parallelization of large-scale computational applications; Finite Difference Time Domain (FDTD) techniques on supercomputers; Monte Carlo simulation of multilattice thin film growth; data mining; search engines; industrial applications of high-performance computing for phylogeny reconstruction; and a parallel Fast Fourier Transform (FFT) approach for derivative pricing.