Big graphs are part of the "Not Only SQL" (NoSQL) database movement, which focuses on the relationships between data rather than the values themselves. The data is stored in vertices, while the edges model the interactions or relationships between these data. Such graphs offer flexibility in handling strongly connected data. Analyzing a big graph generally involves exploring all of its vertices, an operation that is costly in time and resources because big graphs are typically composed of millions of vertices connected through billions of edges. Consequently, graph algorithms are expensive relative to the size of the big graph and are therefore ineffective for data exploration. Partitioning the graph thus stands out as an efficient and less expensive alternative: the technique consists of dividing the graph into a set of k subgraphs in order to reduce the complexity of queries. Nevertheless, it presents many challenges because graph partitioning is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices), an efficient parallel and distributed heuristic for large-scale graph partitioning. An application to real-world graphs demonstrates the feasibility and reliability of our method. Experiments carried out on a 10-node Spark cluster show that the proposed method achieves significant gains in execution time and outperforms JA-BE-JA, Greedy, and DFEP.
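The abstract does not detail DPHV's placement rule, but the Greedy baseline it compares against follows a well-known streaming scheme: place each vertex on the partition that already holds most of its neighbors, subject to a capacity cap. A minimal stdlib Python sketch of that baseline (function and parameter names are illustrative, not DPHV's API):

```python
def greedy_partition(n, edges, k, capacity):
    """Streaming greedy placement: each vertex joins the partition that
    already holds most of its neighbors, unless that partition is full."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    part, sizes = [-1] * n, [0] * k
    for v in range(n):
        scores = [0] * k
        for u in adj[v]:
            if part[u] != -1:
                scores[part[u]] += 1
        # full partitions are disqualified; ties broken by lightest load
        best = max(range(k),
                   key=lambda p: (scores[p] if sizes[p] < capacity else -1,
                                  -sizes[p]))
        part[v] = best
        sizes[best] += 1
    return part

# Two disjoint triangles: the heuristic keeps each triangle on one partition.
print(greedy_partition(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)], 2, 3))
# → [0, 0, 0, 1, 1, 1]
```

The capacity cap is what keeps the placement balanced; without it, a greedy rule of this kind tends to pile every vertex onto one partition.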
ISBN: (Print) 9781665433655
Graph partitioning algorithms have been used to execute complex applications when there is not enough space to run the whole application at once, as with limited reconfigurable computing resources. It can be shown that if an "optimal" clustering of a data set is found, optimal partitioning can be achieved. K-means-based algorithms are widely used to partition subjects when there is no prior information about the number of clusters. A vital issue in such methods is how to define a good centroid, which plays the principal role in "good" clustering. In this paper, we introduce a new way to determine purposive centroids, based on the Binomial Distribution to reduce the risk of random seed selection, the Elbow Diagram to find the optimum number of clusters, and finally Bin Packing to assign nodes to the defined clusters while considering a Utilization Factor (UF) due to the limited area of the Run Space. The proposed algorithm, called Binomial Distribution based K-means (BDK), is compared with common graph partitioning algorithms such as Simulated Annealing (SA), Density K-means (DK), and a link-elimination partitioner, under different scenarios including simple and complex applications. The results show that the proposed algorithm decreases the partitioning error by 24% compared to the other clustering techniques, while the Quality Factor (QF) is increased by 41%. Execution Time (EX.T) to reach the required number of clusters is also reduced significantly.
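As a baseline for the seeding step BDK improves on, plain Lloyd's k-means with a naive deterministic seeding can be sketched in stdlib Python. This is not BDK itself: the commented seeding line is exactly where BDK's binomial-distribution-based centroid choice would plug in, and all names are illustrative.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on 2-D points with naive deterministic seeding."""
    # Naive seeding: pick points spread across the input order. BDK's
    # binomial-distribution-based centroid selection would replace this line.
    centroids = [points[(i * (len(points) - 1)) // max(k - 1, 1)]
                 for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                          + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
cents, _ = kmeans(pts, 2)
print(cents)  # → [(0.5, 0.5), (10.5, 10.5)]
```

With bad seeds (both in one blob), Lloyd's can converge to a poor local optimum, which is precisely the risk the abstract says purposive centroids are meant to reduce.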
The problem of partitioning a graph such that the number of edges incident to vertices in different partitions is minimized arises in many contexts. Examples include its recursive application for minimizing fill-in in matrix factorizations and load balancing for parallel algorithms. Spectral graph partitioning algorithms partition a graph using the eigenvector associated with the second-smallest eigenvalue of a matrix called the graph Laplacian. The focus of this paper is the use of graph theory to compute this eigenvector more quickly.
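The spectral recipe referred to here, splitting on the sign pattern of the Fiedler vector (the eigenvector of the Laplacian's second-smallest eigenvalue), can be illustrated with a plain power-iteration sketch in stdlib Python. This is a toy illustration, not the paper's accelerated computation; names are illustrative.

```python
import math

def fiedler_partition(n, edges, iters=2000):
    """Bisect a graph by the sign of (an approximation of) its Fiedler vector."""
    # Build the graph Laplacian L = D - A as a dense matrix.
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    c = 2.0 * max(L[i][i] for i in range(n))   # shift so B = cI - L is PSD
    x = [math.sin(i + 1.0) for i in range(n)]  # arbitrary start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]            # project out the constant mode
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    # The dominant remaining eigenvector of B corresponds to the second-
    # smallest eigenvalue of L; its sign pattern yields the bipartition.
    return [1 if xi >= 0 else 0 for xi in x]

# A 4-node path 0-1-2-3 is split into {0, 1} and {2, 3}.
print(fiedler_partition(4, [(0, 1), (1, 2), (2, 3)]))
```

The projection against the all-ones vector removes the trivial eigenvector (eigenvalue 0 of L), which is why the iteration converges to the Fiedler direction rather than the constant mode.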
Dealing with large-scale graphs requires an efficient graph partitioner that produces balanced partitions with few cut edges/vertices in a reasonable amount of time. Despite the several algorithms that have been proposed, existing solutions remain insufficient: even as graph volume grows continuously, they do not take volume into account during graph partitioning and therefore generate imbalanced workloads. We propose a graph partitioning algorithm, VSCT, based essentially on four key metrics (Volume, Size, Cuts, and Time) to maintain high-quality graph partitioning. Using real-world datasets, we show that VSCT achieves efficient partitioning quality compared with existing graph partitioning algorithms.
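The abstract does not define VSCT's scoring, but two of the standard quality measures it alludes to, edge cut and load imbalance, can be computed directly (illustrative stdlib sketch, not VSCT's actual metrics):

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints fall in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest partition size relative to the ideal even share (1.0 = perfect)."""
    sizes = [0] * k
    for p in part:
        sizes[p] += 1
    return max(sizes) / (len(part) / k)

# A 4-vertex path split 3/1: one cut edge, 1.5x imbalance.
print(edge_cut([(0, 1), (1, 2), (2, 3)], [0, 0, 0, 1]))  # → 1
print(imbalance([0, 0, 0, 1], 2))                        # → 1.5
```

A volume-aware partitioner like the one described would weight each vertex by its data volume when computing the imbalance, rather than counting vertices uniformly.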
In this paper, a partitioning approach for large-scale systems based on graph theory is presented. The algorithm starts by translating the system model into a graph representation. Once the system graph is obtained, the graph partitioning problem is solved. The resulting partition consists of a set of non-overlapping subgraphs whose numbers of vertices are as similar as possible while the number of interconnecting edges between them is minimal. To achieve this goal, the proposed algorithm applies a set of procedures based on identifying highly connected subgraphs with a balanced number of internal and external connections. To illustrate the use and application of the proposed partitioning approach, it is applied to decompose a dynamical model of the Barcelona drinking water network (DWN). Moreover, a hierarchical-like DMPC strategy is designed and applied over the resulting set of partitions in order to assess the closed-loop performance. Results obtained over several simulation scenarios show the effectiveness of both the partitioning approach and the DMPC strategy in terms of reduced computational burden and, at the same time, an admissible loss of performance compared with a centralised MPC strategy. (C) 2010 Elsevier Ltd. All rights reserved.
This paper proposes a new cost function, the cut ratio, for segmenting images using graph-based methods. The cut ratio is defined as the ratio of the corresponding sums of two different weights of the edges along the cut boundary, and it models the mean affinity between the segments separated by the boundary per unit boundary length. This new cost function allows the image perimeter to be segmented, guarantees that the segments produced by bipartitioning are connected, and does not introduce a size, shape, smoothness, or boundary-length bias. The latter property allows it to produce segmentations whose boundaries are aligned with image edges. Furthermore, the cut-ratio cost function allows efficient iterated region-based segmentation as well as pixel-based segmentation. These properties may be useful for some image-segmentation applications. While the problem of finding a minimum ratio cut in an arbitrary graph is NP-hard, a minimum ratio cut can be found in polynomial time in the connected planar graphs that arise during image segmentation. While the cut ratio alone is not sufficient as a baseline method for image segmentation, it forms a good basis for an extended method when combined with a small number of standard techniques. We present an implemented algorithm for finding a minimum ratio cut, prove its correctness, discuss its application to image segmentation, and present the results of segmenting a number of medical and natural images using our techniques.
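Per the definition above, evaluating the cut ratio for a given boundary is just a quotient of two weight sums over the same cut edges. A minimal stdlib sketch; the weight names (affinity and boundary-length weights) are illustrative interpretations of the paper's two edge weights, not its exact formulation:

```python
def cut_ratio(cut_edges, affinity_w, length_w):
    """Ratio of two weight sums taken over the same set of boundary edges."""
    return (sum(affinity_w[e] for e in cut_edges) /
            sum(length_w[e] for e in cut_edges))

# Two boundary edges: total affinity 0.75 over total boundary length 3.0.
r = cut_ratio([(0, 1), (2, 3)],
              {(0, 1): 0.25, (2, 3): 0.5},
              {(0, 1): 1.0, (2, 3): 2.0})
print(r)  # → 0.25
```

Because the denominator scales with boundary length, longer boundaries are not penalized per se, which is the source of the bias-free properties claimed in the abstract.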
The increasing number of Internet-of-Things (IoT) devices will generate unprecedented amounts of data in the coming years. Fog computing may prevent saturation of the network infrastructure by processing data at the edge or within these devices. Consequently, machine intelligence built almost exclusively in the cloud can be scattered to edge devices. While deep learning techniques can adequately process massive IoT data volumes, their resource-demanding nature poses a trade-off for execution on resource-constrained devices. This paper proposes and evaluates the performance of Partitioning Networks for COnstrained DEvices (PANCODE), a novel algorithm that employs a multilevel approach to partition large convolutional neural networks for distributed execution on constrained IoT devices. Experimental results with the LeNet and AlexNet models show that our algorithm can produce partitionings that achieve up to 2173.53 times more inferences per second than the Best Fit algorithm and up to 1.37 times less communication than the second-best approach. We also show that the state-of-the-art METIS framework produces only invalid partitionings in more constrained setups. The results indicate that our algorithm achieves higher inference rates and lower communication costs for convolutional neural networks distributed among constrained and very constrained devices.
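The Best Fit baseline mentioned above is the classic bin-packing rule: place each item on the feasible device that will be left with the least slack. A stdlib sketch (device capacities and layer sizes are made-up numbers, and real CNN partitioning must also account for inter-layer communication, which this rule ignores):

```python
def best_fit(layer_sizes, capacities):
    """Best Fit: place each item on the feasible device with least slack left."""
    remaining = list(capacities)
    assignment = []
    for size in layer_sizes:
        feasible = [i for i, r in enumerate(remaining) if r >= size]
        if not feasible:
            return None  # no valid placement under these constraints
        dev = min(feasible, key=lambda i: remaining[i] - size)
        remaining[dev] -= size
        assignment.append(dev)
    return assignment

print(best_fit([5, 4, 3, 2], [6, 6, 6]))  # → [0, 1, 2, 1]
```

The `None` return mirrors the situation the abstract describes for METIS: under tight enough constraints, a partitioner may simply fail to produce a valid placement.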
ISBN: (Print) 9783642251054
Service identification, as the first step of service-oriented modeling, carries the main emphasis of the modeling process and has a broad influence on system development. Selecting an appropriate service identification method is essential for the success of any service-oriented architecture project. Automation, use of a middle-out strategy, and quality assessment of services are three important criteria in evaluating service identification methods. Existing methods mostly ignore automation principles; meanwhile, the few automated and semi-automated methods use a top-down strategy to identify services and ignore the existing assets of the enterprise. Moreover, these methods do not take all quality metrics into account. This paper proposes a novel semi-automated method called 2PSIM (Two-Phase Service Identification Method), which uses a graph partitioning algorithm to identify services based on enterprise business processes as well as business entity models. 2PSIM follows a middle-out strategy and tries to identify reusable services with proper granularity and acceptable levels of cohesion and coupling.
ISBN: (Print) 9781479927289
We investigate the problem of partitioning two-dimensional finite difference meshes among the processors of a parallel computer. The objective is to achieve a perfect load balance while minimizing the communication cost. There are well-known graph-, hypergraph-, and geometry-based partitioning algorithms for this problem. The known geometric algorithms have linear running time but obtain the best results only for very special mesh sizes and processor counts. We propose another geometric algorithm. The proposed algorithm is linear; is applicable to many more cases than some well-known alternatives; obtains better results than the graph partitioning algorithms; and obtains better results than the hypergraph partitioning algorithms almost always. Our algorithm also obtains better results than a known asymptotically optimal algorithm for some small numbers of processors. We also catalog related theoretical results.
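The simplest geometric scheme of the kind discussed is a one-dimensional block (strip) decomposition: split the mesh rows as evenly as possible so processor loads differ by at most one row. An illustrative stdlib sketch (this is the textbook baseline, not the paper's proposed algorithm):

```python
def row_blocks(n_rows, p):
    """Split mesh rows into p contiguous strips whose sizes differ by at most one."""
    base, rem = divmod(n_rows, p)
    blocks, start = [], 0
    for i in range(p):
        size = base + (1 if i < rem else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

# 10 rows over 3 processors: strips of 4, 3, and 3 rows.
print([len(b) for b in row_blocks(10, 3)])  # → [4, 3, 3]
```

For a 5-point stencil on an n-column mesh, each interior strip exchanges two halo rows per iteration, so the communication volume scales with the mesh width regardless of strip height; the more refined geometric algorithms the paper discusses reduce exactly this boundary cost.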
ISBN: (Print) 9798350305487
Fault-tolerant systems rely on recovery techniques to enhance system resilience. In this regard, checkpointing procedures periodically take snapshots of the system state during failure-free operation, enabling recovery processes to resume from a previously saved, consistent state. Saving checkpoints, however, is costly, as it must synchronize snapshots with the processing of incoming requests to avoid inconsistency. One way to speed up checkpointing is to partition the service state, allowing a parallel checkpoint procedure to operate independently on each partition. State partitioning can also improve throughput by increasing parallelism in request processing. However, variations in the data access pattern over time can result in unbalanced partitions, posing a challenge to achieving optimal performance. In this paper, aiming to improve both checkpointing and overall system performance, we combine parallel checkpointing with a dynamic graph-based repartitioning algorithm. This work formalizes the optimization problem and presents a detailed performance assessment of the proposed approach. The experimental evaluation highlights the benefits of parallel checkpointing and emphasizes the performance gains achieved with repartitioning under realistic workloads. Comparing a cost-effective round-robin partitioning approach with our dynamic method, we examine the degree of execution parallelism achieved by checkpointing threads and the influence of repartitioning strategies on checkpoint performance. Although the rebalancing of state partitions incurs a cost, it comes for free in our technique since it takes advantage of processing idleness during the snapshot-taking process.
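The two placement strategies being compared can be sketched minimally: round-robin assigns state keys in arrival order and ignores access frequency, while a greedy weight-aware placement approximates what a repartitioner aims for by balancing access load across partitions. This stdlib sketch is a simplification under assumed names, not the paper's graph-based repartitioning algorithm:

```python
def round_robin(keys, k):
    """Assign keys to k partitions in arrival order, ignoring access frequency."""
    parts = [[] for _ in range(k)]
    for i, key in enumerate(keys):
        parts[i % k].append(key)
    return parts

def rebalance(access_counts, k):
    """Greedy weight-aware placement: hottest keys go to the least-loaded partition."""
    parts = [[] for _ in range(k)]
    load = [0] * k
    for key, cnt in sorted(access_counts.items(), key=lambda kv: -kv[1]):
        dest = min(range(k), key=lambda j: load[j])
        parts[dest].append(key)
        load[dest] += cnt
    return parts, load

# A skewed workload: greedy placement isolates the hot key 'a', so the two
# partitions carry comparable access load and checkpoint threads stay busy.
parts, load = rebalance({'a': 10, 'b': 1, 'c': 1, 'd': 1,
                         'e': 1, 'f': 1, 'g': 1}, 2)
print(load)  # → [10, 6]
```

Balanced access load is what lets parallel checkpoint threads finish at roughly the same time, which is the performance effect the evaluation above measures.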