The staggered-grid finite-difference (SGFD) method is one of the most popular approaches in seismic numerical modelling because of its computational efficiency and ease of implementation. However, it will lead to ser...
ISBN: (Print) 9781728168760
Resistive random access memory (ReRAM) addresses the high memory bandwidth requirement of graph analytics by integrating computing logic into the memory. Due to the matrix-structured crossbar architecture, existing ReRAM-based accelerators, when handling real-world graphs that often have skewed degree distributions, suffer from a severe sparsity problem arising from zero fillings and activation nondeterminism, incurring substantial ineffectual computation. In this paper, we observe that the sparsity sources lie in the consecutive mapping of source and destination vertex indices onto the wordlines and bitlines of a crossbar. Although exhaustive graph reordering mitigates the sparsity-induced inefficiency, its totally random mapping of both source and destination vertices incurs expensive overheads. This work exploits the insight of a mid-point vertex mapping with random wordlines and consecutive bitlines. A cost-effective preprocessing step is proposed that rapidly explores crossbar-fit vertex reorderings, although it ignores the sparsity arising from activation dynamics. We therefore present a novel ReRAM-based graph analytics accelerator, named Spara, which dynamically maximizes the workload density of crossbars using a further-proposed tightly coupled bank-parallel architecture. Results on real-world and synthesized graphs show that Spara outperforms GraphR and GraphSAR by 8.21x and 5.01x in terms of performance, and by 8.97x and 5.68x in terms of energy savings (on average), while incurring a reasonable (<9.98%) preprocessing overhead.
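As a rough, hedged illustration of that mapping insight (not Spara's actual preprocessing), the Python sketch below maps an edge list onto fixed-size crossbar tiles with randomized wordlines and consecutive bitlines, then reports each occupied tile's workload density; the 4x4 tile size and all function names are illustrative assumptions.

```python
import random
from collections import defaultdict

XB = 4  # assumed crossbar dimension (4x4 tiles), chosen only for illustration

def crossbar_density(edges, num_vertices, randomize_rows=True):
    """Map an edge list onto XB x XB crossbar tiles and report, for each
    occupied tile, the fraction of cells holding a real edge (higher
    density means less zero filling). Hypothetical helper, not Spara's API."""
    row_order = list(range(num_vertices))
    if randomize_rows:
        random.shuffle(row_order)            # random wordline (row) mapping
    row_pos = {v: i for i, v in enumerate(row_order)}
    tiles = defaultdict(int)                 # (tile_row, tile_col) -> nonzeros
    for src, dst in edges:
        r, c = row_pos[src], dst             # consecutive bitline (col) mapping
        tiles[(r // XB, c // XB)] += 1
    return {t: n / (XB * XB) for t, n in tiles.items()}

# Toy skewed graph: vertex 0 has a high out-degree, as in power-law graphs.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (5, 6), (7, 6)]
print(crossbar_density(edges, num_vertices=8))
```

Comparing the reported densities with and without randomized rows gives a feel for how the mapping choice changes how much real work each crossbar tile holds.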
The microgrid is an important form of distributed generation; it can either connect to the large power grid through the distribution network and run jointly with it, forming joint operation of the microgrid and the large power grid ...
Integration of solar photovoltaic (PV) with the AC grid is gaining popularity in distributed generation. In the future, the DC grid is likely to play a major role in the distribution system. With this in view, the present invest...
ISBN: (Print) 9783030410322; 9783030410315
With the increased usage of deep neural networks, their structures have naturally evolved, growing in size and complexity. With currently used networks often containing millions of parameters and hundreds of layers, there have been many attempts to leverage the capabilities of various high-performance computing architectures. Most approaches focus on either using parameter servers or a fixed communication network, or on exploiting particular capabilities of specific computational resources. However, few experiments have been made under relaxed communication-consistency requirements using a dynamic, adaptive way of exchanging information. Gossip communication is a peer-to-peer communication approach that can minimize the overall data traffic between computational agents by providing a weaker guarantee on data consistency: eventual consistency. In this paper, we present a framework for gossip-based communication, suitable for heterogeneous computing resources, and apply it to the problem of parallel deep learning using artificial neural networks. We present different approaches to gossip-based communication in a heterogeneous computing environment consisting of CPUs and MIC-based co-processors, and implement gossiping via both shared and distributed memory. We also provide a simple approach to load balancing in a heterogeneous computing environment that proves efficient for parallel deep neural network training. Further, we explore several approaches to communication exchange and resource allocation when considering parallel deep learning on heterogeneous computing resources, and evaluate their effect on the convergence of the distributed neural network.
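To make the gossip pattern concrete, here is a minimal sketch of pairwise gossip averaging between simulated workers, assuming a toy flat-vector model; it illustrates only the eventual-consistency exchange the abstract refers to, not the paper's actual framework.

```python
import random
import numpy as np

def gossip_step(params, i, j, mix=0.5):
    """One gossip exchange: peers i and j average their parameter vectors."""
    avg = mix * params[i] + (1.0 - mix) * params[j]
    params[i], params[j] = avg.copy(), avg.copy()

# Four simulated workers, each starting from divergent parameters.
rng = np.random.default_rng(0)
params = [rng.normal(size=4) for _ in range(4)]
for _ in range(20):                          # repeated random pairwise gossip
    i, j = random.sample(range(4), 2)
    gossip_step(params, i, j)
print(np.std(np.stack(params), axis=0))      # spread shrinks toward consensus
```

Under repeated random pairings the per-coordinate spread contracts toward zero; that gradual convergence is the weaker, eventual-consistency guarantee gossip trades for lower traffic.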
With the increased popularity of paralleling voltage sources and inverters, there is a need for simpler yet effective ways of controlling such a system. In this regard, a new control scheme using Virtual Oscilla...
A computational grid is a distributed network of heterogeneous resources dedicated to executing user-defined jobs more efficiently. Therefore, a load-balancing scheme is always required in a grid computing...
ISBN: (Print) 9781450397339
Federated Learning (FL) has been a promising paradigm in distributed machine learning that enables in-situ model training and global model aggregation. While it can preserve end users' private data well, applying it efficiently on IoT devices still suffers from their inherent limitations: their available computing resources are typically constrained, heterogeneous, and dynamically changing. Existing works deploy FL on IoT devices by pruning a sparse model or adopting a tiny counterpart, which alleviates the workload but may negatively impact model accuracy. To address these issues, we propose Eco-FL, a novel Edge Collaborative pipeline-based Federated Learning framework. On the client side, each IoT device collaborates with trusted available devices in proximity to perform pipeline training, enabling local training acceleration with efficient augmented resource orchestration. On the server side, Eco-FL adopts a novel grouping-based hierarchical architecture that combines synchronous intra-group aggregation and asynchronous inter-group aggregation, with a heterogeneity-aware dynamic grouping strategy that jointly considers response latency and data distribution. To tackle resource fluctuation at runtime, Eco-FL further applies an adaptive scheduling policy to judiciously adjust workload allocation and client grouping at different levels. Extensive experimental results using both a prototype and simulation show that, compared to state-of-the-art methods, Eco-FL can improve training accuracy by up to 26.3%, reduce local training time by up to 61.5%, and improve local training throughput by up to 2.6x.
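A hedged sketch of the hierarchical aggregation described above: clients within a group are averaged synchronously (FedAvg-style), and each group's result is folded into the global model asynchronously with a staleness-discounted weight. The function names and the staleness rule are illustrative assumptions, not Eco-FL's exact policy.

```python
import numpy as np

def intra_group_aggregate(client_updates, sizes):
    """Synchronous weighted average over one group's client updates."""
    w = np.asarray(sizes, dtype=float) / sum(sizes)
    return sum(wi * u for wi, u in zip(w, client_updates))

def inter_group_merge(global_model, group_model, staleness, base_lr=0.5):
    """Asynchronous merge: staler group results receive a smaller weight."""
    alpha = base_lr / (1.0 + staleness)
    return (1.0 - alpha) * global_model + alpha * group_model

# Toy run: two groups report at different times with different data sizes.
global_model = np.zeros(3)
g1 = intra_group_aggregate([np.ones(3), 3 * np.ones(3)], sizes=[100, 300])
global_model = inter_group_merge(global_model, g1, staleness=0)
g2 = intra_group_aggregate([2 * np.ones(3)], sizes=[50])
global_model = inter_group_merge(global_model, g2, staleness=2)
print(global_model)
```

The staleness discount is what lets slow groups contribute without letting their outdated updates dominate the global model.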
To deal with the deep penetration of renewable energy in the power grid and energy coupling in the distribution and consumption parts, improve energy efficiency, and construct a regional integrated energy system (IES) with multiple self-sufficient cells, the energy cell has attracted widespread interest. However, different management systems, diversified energy features, and information barriers increase the difficulty of unified energy-flow calculation for the IES. Based on this theory, this paper proposes a parallel distributed computing method for the multi-region IES, with the electricity network considered as the dividing surface. First, the basic principle of the energy cell is proposed, and the large-scale IES is divided into several energy cells according to energy-balance capability and geographic location. Then, virtual nodes are added at the boundaries of the energy cells and adjustment factors are introduced to ensure convergence. Moreover, an extended Jacobian matrix is constructed to calculate the multi-energy flow (MEF) within a single energy cell, and the decoupled energy cells can be computed in parallel through limited information exchange at the virtual nodes. Finally, two cases are used to verify the performance of the proposed method. Compared with the integrated method and the decomposed method, the proposed method achieves better calculation efficiency and accuracy.
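The decomposition can be sketched as follows. This toy Python example replaces the extended-Jacobian MEF computation with an assumed one-variable per-cell solve; it only illustrates how decoupled cells run in parallel while exchanging a shared virtual-node boundary value between outer iterations until it converges.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_cell(local_injection, boundary_value):
    """Stand-in local solve: returns this cell's proposed boundary value.
    A real cell would run a Newton solve with its extended Jacobian here."""
    return 0.5 * (local_injection + boundary_value)

def parallel_mef(injections, tol=1e-8, max_iter=100):
    boundary = 0.0                               # shared virtual-node value
    for _ in range(max_iter):
        with ThreadPoolExecutor() as pool:       # decoupled cells in parallel
            results = list(pool.map(
                lambda inj: solve_cell(inj, boundary), injections))
        new_boundary = sum(results) / len(results)   # limited info exchange
        if abs(new_boundary - boundary) < tol:
            break
        boundary = new_boundary
    return boundary

print(parallel_mef([1.0, 3.0]))   # converges to the consistent boundary value
```

Only the boundary value crosses cell borders between iterations, mirroring the limited information exchange at the virtual nodes that the paper relies on.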
ISBN: (Print) 9789811500299; 9789811500282
Spark is widely used as a distributed computing framework for in-memory parallel processing. It implements distributed computing by splitting jobs into tasks and deploying them on executors on the nodes of a cluster. Executors are JVMs with dedicated allocations of CPU cores and memory. The number of tasks depends on the number of partitions of the input data. Depending on the number of CPU cores allocated to an executor, one or more cores are allocated to each task, and tasks run as independent threads on the executor. One or more executors are deployed on each node of the cluster depending on resource availability. The performance advantage provided by distributed computing on the Spark framework depends on the parallelism configured at three levels: node level, executor level, and task level. The parallelism at each of these levels should be configured to fully utilize the available computing resources. This paper recommends optimum parallelism configurations for the Apache Spark framework deployed on a Hadoop YARN cluster. The recommendations are based on the results of experiments conducted to evaluate how the parallelism at each of these levels affects the performance of Spark applications. For the evaluation, a CPU-intensive job and an I/O-intensive job are used, and performance is measured while varying the parallelism at each of the three levels. The results presented in this paper help Spark users select optimum parallelism at each of these levels to achieve maximum performance for Spark jobs through maximum resource utilization.
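As a starting point, the PySpark sketch below sets the three parallelism levels the paper studies using standard Spark-on-YARN configuration options; the concrete numbers (5 executors, 4 cores each, 2 partitions per core) are illustrative assumptions, not the paper's recommended values.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parallelism-demo")
         # Node/executor level: how many executors, and how big each one is.
         .config("spark.executor.instances", "5")
         .config("spark.executor.cores", "4")      # cores per executor
         .config("spark.executor.memory", "8g")
         # Task level: default partition (task) count for wide operations.
         .config("spark.default.parallelism", str(5 * 4 * 2))
         .getOrCreate())

# Task-level parallelism can also be set per RDD via explicit partitioning.
rdd = spark.sparkContext.parallelize(range(10**6), numSlices=5 * 4 * 2)
print(rdd.map(lambda x: x * x).sum())
```

A common rule of thumb is to size the default parallelism at a small multiple of the total allocated cores (here 5 x 4 x 2), so every core stays busy without excessive task-scheduling overhead.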