ISBN:
(Digital) 9783030546236
ISBN:
(Print) 9783030546229;9783030546236
Recently, graph edge partitioning has shown better partitioning quality than vertex partitioning for the skewed degree distributions of real-world graph data. Edge partitioning approaches can be classified as streaming or offline. Streaming edge partitioning supports big graph partitioning; however, it yields lower partitioning quality, is sensitive to stream order, and takes more time to partition than offline edge partitioning. Conversely, offline edge partitioning achieves better partitioning quality than streaming partitioning; however, it does not support big graphs. In this study, we propose OffStreamNG, a partial-stream hybrid graph edge partitioner that leverages the advantages of both the offline and streaming approaches by interconnecting them through a saved partition-state layer. OffStreamNG holds vertex and load states as the partition state while the offline component partitions using the neighborhood expansion heuristic, then transfers this state to the online component running the Greedy heuristic; both algorithms need only minor modifications. Experimental results show that OffStreamNG achieves attractive results in terms of replication factor, load balance, and total partitioning time.
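As an illustration of the online half of such a hybrid scheme, the sketch below shows a PowerGraph-style Greedy stream heuristic that can be seeded with a saved partition state (per-vertex replica sets and per-partition loads) produced by an offline pass. The function and variable names are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict

def greedy_stream_partition(edges, num_parts, vertex_parts=None, loads=None):
    """PowerGraph-style Greedy stream edge partitioning.

    vertex_parts (a defaultdict(set): vertex -> partitions holding a
    replica) and loads (edge count per partition) may be pre-seeded
    with the state saved by an offline pass, which is the hybrid idea
    sketched here.
    """
    vertex_parts = vertex_parts if vertex_parts is not None else defaultdict(set)
    loads = loads if loads is not None else [0] * num_parts
    assignment = {}
    for u, v in edges:
        pu, pv = vertex_parts[u], vertex_parts[v]
        if pu & pv:
            candidates = pu & pv                     # endpoints already co-located
        elif pu or pv:
            candidates = pu | pv                     # reuse an existing replica
        else:
            candidates = set(range(num_parts))       # two fresh vertices
        p = min(candidates, key=lambda i: loads[i])  # least-loaded tiebreak
        assignment[(u, v)] = p
        loads[p] += 1
        vertex_parts[u].add(p)
        vertex_parts[v].add(p)
    return assignment, vertex_parts, loads
```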
The tremendous growth of data being generated today makes storage and computing a mammoth task. With its distributed processing capability, Hadoop offers an efficient solution for such large data. However, Hadoop's default data placement strategy places data blocks randomly across the nodes without considering execution parameters, which leads to several drawbacks such as increased execution time and query latency. Moreover, most of the data required for a task may not be locally available, creating a data-locality problem. Hence, we propose a data placement strategy based on the dependency of data blocks across the nodes. Our strategy dynamically analyzes the history log and establishes the relationship between tasks and the blocks each task requires through a Block Dependency Graph (BDG). Our CORE algorithm then re-organizes the HDFS layout by redistributing the data blocks to obtain an optimal data placement, improving performance for big data sets in a distributed environment. The strategy was tested on a 20-node cluster with different real-world MapReduce applications. The results show that the proposed strategy reduces query execution time by 23% and improves data locality by 50.7% compared with the default placement.
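The following sketch illustrates the two steps the abstract describes: building a Block Dependency Graph from a task history log, then greedily co-locating strongly dependent blocks. All names and the greedy co-location rule are hypothetical stand-ins; the paper's CORE algorithm additionally rewrites the actual HDFS layout.

```python
from collections import defaultdict
from itertools import combinations

def build_bdg(history):
    """Block Dependency Graph: edge weight = number of tasks that read
    both blocks. history: iterable of (task_id, [block_ids])."""
    bdg = defaultdict(int)
    for _task, blocks in history:
        for a, b in combinations(sorted(set(blocks)), 2):
            bdg[(a, b)] += 1
    return bdg

def colocate(bdg, all_blocks, nodes, capacity):
    """Greedily place strongly dependent block pairs on the same node,
    heaviest dependency first, respecting a per-node block capacity."""
    placement, used = {}, defaultdict(int)
    for (a, b), _w in sorted(bdg.items(), key=lambda kv: -kv[1]):
        target = placement.get(a, placement.get(b))
        if target is None:
            target = min(nodes, key=lambda n: used[n])
        for blk in (a, b):
            if blk not in placement and used[target] < capacity:
                placement[blk] = target
                used[target] += 1
    for blk in all_blocks:                  # blocks never co-accessed
        if blk not in placement:
            node = min(nodes, key=lambda n: used[n])
            placement[blk] = node
            used[node] += 1
    return placement
```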
ISBN:
(Print) 9781538665282
In this work, we propose a robust and efficient resource allocation scheme for UAV-enabled cellular networks that aid disaster communications. To recover the network within a disaster area, we propose a fast user clustering model based on the K-means procedure and distributed control of power coefficients; it can be embedded in a real system that uses UAV-assisted relaying to recover and maintain the network in real time during and after disasters. Algorithms with low computational complexity and fast convergence are also proposed. Numerical examples demonstrate the benefit of the proposed computational approach.
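A minimal sketch of the K-means user clustering step, where each centroid becomes a candidate UAV hover position; this is plain K-means over ground coordinates, without the paper's power-coefficient control.

```python
import numpy as np

def kmeans_uav_placement(user_xy, k, iters=50, seed=0):
    """Plain K-means over user ground coordinates; each centroid is a
    candidate UAV hover position serving the users in its cluster."""
    rng = np.random.default_rng(seed)
    centroids = user_xy[rng.choice(len(user_xy), size=k, replace=False)]
    labels = np.zeros(len(user_xy), dtype=int)
    for _ in range(iters):
        # assign each user to the nearest candidate UAV position
        dists = np.linalg.norm(user_xy[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each UAV to the mean of its assigned users
        new = np.array([user_xy[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```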
ISBN:
(Print) 9781728125190
Deep learning is a popular machine learning technique and has been applied to many real-world problems, ranging from computer vision to natural language processing. However, training a deep neural network is very time-consuming, especially on big data, and it has become difficult for a single machine to train a large model over large datasets. A popular solution is to distribute and parallelize training across multiple machines using the parameter server framework. In this paper, we present a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP), which improves on the state-of-the-art Stale Synchronous Parallel (SSP) paradigm by dynamically determining the staleness threshold at run time. Conventionally, to run distributed training with SSP, the user must specify a staleness threshold as a hyper-parameter. However, users usually do not know how to set the threshold and often find a value through trial and error, which is time-consuming. Based on workers' recent processing times, DSSP adaptively adjusts the threshold per iteration at run time to reduce the time faster workers wait for synchronization of the globally shared parameters (the weights of the model), and consequently increases the frequency of parameter updates (higher iteration throughput), which speeds up convergence. We compare DSSP with other paradigms, such as Bulk Synchronous Parallel (BSP), Asynchronous Parallel (ASP), and SSP, by running deep neural network (DNN) models over GPU clusters in both homogeneous and heterogeneous environments. The results show that in a heterogeneous environment, where the cluster consists of mixed models of GPUs, DSSP converges to a higher accuracy much earlier than SSP and BSP and performs similarly to ASP. In a homogeneous distributed cluster, DSSP has more stable and slightly better performance than SSP and ASP, and converges much faster than BSP.
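The core of DSSP is choosing the staleness threshold from workers' recent processing times instead of a fixed hyper-parameter. A minimal sketch of one plausible rule follows; the ratio rule and the s_min/s_max bounds are assumptions, not the paper's exact algorithm.

```python
def dynamic_staleness(recent_iter_times, s_min=1, s_max=16):
    """Pick a staleness threshold from the spread of workers' recent
    per-iteration times: the slower the stragglers relative to the
    fastest worker, the more slack fast workers receive.

    recent_iter_times: {worker_id: average seconds per iteration}.
    The rule and bounds are illustrative; DSSP proper selects the
    threshold from a user-given range at run time.
    """
    fastest = min(recent_iter_times.values())
    slowest = max(recent_iter_times.values())
    # extra iterations the fastest worker finishes while the slowest does one
    slack = int(slowest / fastest) - 1
    return max(s_min, min(s_max, slack))

# a worker may keep computing only while within the threshold:
# proceed = (my_clock - min_worker_clock) <= dynamic_staleness(times)
```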
ISBN:
(Digital) 9781728173436
ISBN:
(Print) 9781728173443
The present paper describes a new parallel time-domain simulation algorithm using a high-performance computing environment, Julia, for the analysis of power system dynamics in large networks. The parallel algorithm adapts a parallel-in-space decomposition scheme to a previously sequential algorithm in order to develop a new parallelizable numerical solution of the power system equations. The parallel-in-space decomposition is based on the block bordered diagonal form, which reformulates the network admittance matrix into sub-blocks that can be solved in parallel. For the optimal spatial decomposition of the network, a new extended graph partitioning strategy is developed to balance load and minimize communication between subnetworks. The new parallel simulation algorithm is tested on standard test networks of varying complexity. The simulation results are compared with those from a sequential implementation to validate solution accuracy and to determine the performance improvement in terms of computational speedup. Test simulations conducted on the ForHLR II supercomputing cluster show a large potential speedup as network complexity increases.
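The block bordered diagonal form admits a textbook Schur-complement solve in which each subnetwork block is factorized independently and only a small boundary system is solved serially. The sketch below shows this structure with dense NumPy blocks and a thread pool standing in for cluster-level parallelism; it is illustrative, not the paper's solver.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def solve_bbd(blocks, D, b_c):
    """Schur-complement solve of a block-bordered-diagonal system.

    blocks: list of (A_i, B_i, C_i, b_i), one per subnetwork, where
    A_i is the subnetwork block, B_i/C_i couple it to the boundary,
    and b_i is its right-hand side. D, b_c: boundary block and RHS.
    Dense and illustrative only; a real solver would use sparse LU.
    """
    def local(part):
        A, B, C, b = part
        Y = np.linalg.solve(A, np.column_stack([b, B]))  # one factorization, many RHS
        y, W = Y[:, 0], Y[:, 1:]
        return y, W, C @ W, C @ y

    # each subnetwork is solved independently (here: threads)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(local, blocks))

    S = D - sum(p[2] for p in parts)                     # Schur complement
    x_c = np.linalg.solve(S, b_c - sum(p[3] for p in parts))
    x_i = [y - W @ x_c for y, W, _, _ in parts]          # per-block back-substitution
    return x_i, x_c
```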
ISBN:
(Print) 9781665425582
High-power grid-tied inverters have attracted increasing attention in energy conversion and electrical drive applications. A parallel configuration is a promising solution for enlarging the total power rating. However, a zero-sequence loop arises in paralleled grid-tied inverters, introducing a zero-sequence circulating current that decreases the effective power capacity and causes additional power losses. An effective solution based on a predictive control framework is proposed in this work, and it successfully eliminates the circulating current. The proposed method decomposes the voltage vector into two parts: the former tracks the reference current, and the latter suppresses the circulating current. The method is straightforward to implement on low-cost processors and flexible in balancing control performance against bus voltage utilization. We validate the controller on a lab-constructed test bench, and the experimental results confirm the effectiveness and robustness of the proposed solution.
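A heavily simplified sketch of the two-part voltage command: a deadbeat tracking term plus a zero-sequence offset that nulls the measured circulating current in one control period. All symbols and the one-step deadbeat rule are illustrative assumptions, not the paper's controller.

```python
def voltage_command(i_ref_ab, i_meas_ab, i_z, L, Lz, Ts, v_grid_ab):
    """Two-part voltage command for one of two paralleled inverters.

    Part 1 (tracking): deadbeat alpha-beta voltage driving the phase
    current to its reference within one control period Ts.
    Part 2 (suppression): a zero-sequence offset chosen to null the
    measured circulating current i_z across the zero-sequence loop
    inductance Lz in one step.
    Hypothetical symbols; the paper's predictive controller also
    trades these terms off against bus-voltage utilization.
    """
    v_track = [L * (ir - im) / Ts + vg      # L*di/dt + grid back-EMF
               for ir, im, vg in zip(i_ref_ab, i_meas_ab, v_grid_ab)]
    v_zero = -Lz * i_z / Ts                 # drive i_z -> 0
    return v_track, v_zero
```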
This paper investigates the optimal power allocation and load balancing problem encountered by heterogeneous, distributed embedded systems with mixed tasks. Given that each node has real and distinct urgent tasks in most practical heterogeneous embedded systems, three priority disciplines are considered: dedicated jobs without priority, prioritized dedicated jobs without preemption, and prioritized dedicated jobs with preemption. A model is established for heterogeneous embedded processors with dedicated-task-dependent dynamic power and load-balancing management; each processor is modeled as an M/M/1 queueing sub-model with mixed generic and dedicated tasks. The processors have different levels of power consumption, and each can employ any of the three disciplines. The objective of this study is to find an optimal load-balancing (for generic tasks) and power-allocation strategy for heterogeneous processors preloaded with different amounts of dedicated tasks, such that the average response time of generic tasks is minimized. Since this is a multi-constraint, multi-variable optimization problem for which a closed-form solution is unlikely, we propose an optimal power allocation and load balancing scheme that employs the Lagrange method and binary search, completed by two new rules established by observing the numerical variation of the parameters. Several numerical examples demonstrate the effectiveness of our solution. To the best of our knowledge, this is the first analytical study that combines load balancing, energy efficiency, and task priority in heterogeneous and distributed embedded systems. (C) 2019 Elsevier Inc. All rights reserved.
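For the simplest case (no priorities, each node a plain M/M/1 queue), stationarity of the Lagrangian yields a closed form for each node's share as a function of the multiplier, which can then be found by binary search. The sketch below works under these simplifying assumptions only; the paper's scheme additionally handles power allocation and the three priority disciplines.

```python
from math import sqrt

def balance_load(mu, lam_total, tol=1e-9):
    """Split a Poisson stream of generic tasks across heterogeneous
    M/M/1 servers to minimize mean response time.

    mu: effective service rates remaining after each node's dedicated
    load (a stand-in for the paper's priority disciplines).
    Minimizing sum(lam_i / (mu_i - lam_i)) s.t. sum(lam_i) = lam_total
    gives lam_i = max(0, mu_i - sqrt(mu_i / alpha)); binary-search the
    multiplier alpha until the shares add up to lam_total.
    """
    assert lam_total < sum(mu), "system must be stable"
    split = lambda a: [max(0.0, m - sqrt(m / a)) for m in mu]
    lo, hi = min(1.0 / m for m in mu), 1e12   # sum(split(lo)) == 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(split(mid)) < lam_total:
            lo = mid
        else:
            hi = mid
    return split(hi)
```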
Frequent itemset mining is a fundamental model in data mining. It supports a vast range of application fields and can be employed as a key computation phase in many other mining models, such as association rules, correlations, and classifications. Many distributed parallel algorithms have been introduced to cope with the very large-scale datasets of big data. However, running time and memory scalability still lack adequate solutions for very large and hard-to-mine datasets. In this paper, we propose a distributed parallel algorithm named DP3 (Distributed PrePostPlus), which parallelizes the state-of-the-art algorithm PrePost+ and operates in a master-slaves model. Slave machines mine local frequent itemsets and send them, with their support counts, to the master for aggregation. When tremendous numbers of itemsets are transferred between the slaves and the master, the computational load at the master would be extremely heavy without the support of our complete FPO tree (Frequent Patterns Organization), which provides optimal compactness for light data transfers and highly efficient aggregation with pruning. The processing phases of the slaves and master are designed for memory scalability and shared-memory parallelism in a work-pool model so as to utilize the computational power of multi-core CPUs. We conducted experiments on both synthetic and real datasets, and the empirical results show that our algorithm far outperforms the well-known PFP and three recent high-performance algorithms: Dist-Eclat, BigFIM, and MapFIM. (C) 2019 Elsevier Ltd. All rights reserved.
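The master-side aggregation can be pictured as merging per-slave support counts and pruning by global minimum support. The sketch below uses a flat Counter where the paper uses the FPO tree for compact transfer and aggregation; it also assumes every slave reports all candidate itemsets.

```python
from collections import Counter

def master_aggregate(slave_results, total_txns, min_sup):
    """Master-side aggregation for a master/slaves FIM scheme.

    slave_results: one dict per slave mapping frozenset(itemset) ->
    local support count. A flat Counter stands in for the FPO tree;
    the pruning rule (global support >= minsup) is the same.
    """
    totals = Counter()
    for local in slave_results:
        totals.update(local)
    threshold = min_sup * total_txns
    return {iset: cnt for iset, cnt in totals.items() if cnt >= threshold}

# example: two slaves over 10 transactions, minsup = 0.3
slaves = [{frozenset({"a"}): 3, frozenset({"a", "b"}): 2},
          {frozenset({"a"}): 2, frozenset({"a", "b"}): 1}]
print(master_aggregate(slaves, total_txns=10, min_sup=0.3))
# -> {frozenset({'a'}): 5, frozenset({'a', 'b'}): 3}
```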