Arising user-centric graph applications such as route planning and personalized social network analysis have initiated a shift of paradigms in modern graphprocessing systems towards multi-query analysis, i.e., proces...
详细信息
ISBN:
(纸本)9781450356954
Arising user-centric graph applications such as route planning and personalized social network analysis have initiated a shift of paradigms in modern graphprocessing systems towards multi-query analysis, i.e., processing multiple graph queries in parallel on a shared graph. These applications generate a dynamic number of localized queries around query hotspots such as popular urban areas. However, existing graphprocessing systems are not yet tailored towards these properties: The employed methods for graph partitioning and synchronization management disregard query locality and dynamism which leads to high query latency. To this end, we propose the system Q-graph for multi-query graph analysis that considers query locality on three levels. (i) The query-aware graph partitioning algorithm Q-cut maximizes query locality to reduce communication overhead. (ii) The method for synchronization management, called hybrid barrier synchronization, allows for full exploitation of local queries spanning only a subset of partitions. (iii) Both methods adapt at runtime to changing query workloads in order to maintain and exploit locality. Our experiments show that Q-cut reduces average query latency by up to 57 percent compared to static query-agnostic partitioning algorithms.
Recently, distributedprocessing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many d...
详细信息
Recently, distributedprocessing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graphprocessing systems have been proposed, such as Pregel, Powergraph, graphLab, and Trinity. However, these systems deal only with static graphs and do not consider the issue of processing evolving and dynamic graphs. In this paper, we are considering the issues of scale and dynamism in the case of graphprocessing systems. We present BLADYG, a graphprocessing framework that addresses the issue of dynamism in large-scale graphs. We present an implementation of BLADYG on top of AKKA framework. We experimentally evaluate the performance of the proposed framework by applying it to problems such as distributed k-core decomposition and partitioning of large dynamic graphs. The experimental results show that the performance and scalability of BLADYG are satisfying for large-scale dynamic graphs. (C) 2017 Elsevier Inc. All rights reserved.
The increasing popularity and ubiquity of various large graph datasets has caused renewed interest for graph partitioning. Existing graph partitioners either scale poorly against large graphs or disregard the impact o...
详细信息
ISBN:
(纸本)9781467390057
The increasing popularity and ubiquity of various large graph datasets has caused renewed interest for graph partitioning. Existing graph partitioners either scale poorly against large graphs or disregard the impact of the underlying hardware topology. A few solutions have shown that the nonuniform network communication costs may affect the performance greatly. However, none of them considers the impact of resource contention on the memory subsystems (e.g., LLC and Memory Controller) of modern multicore clusters. They all neglect the fact that the bandwidth of modern high-speed networks (e.g., Infiniband) has become comparable to that of the memory subsystems. In this paper, we provide an in-depth analysis, both theoretically and experimentally, on the contention issue for distributed workloads. We found that the slowdown caused by the contention can be as high as 11x. We then design an architecture-aware graph partitioner, ARGO, to allow the full use of all cores of multicore machines without suffering from either the contention or the communication heterogeneity issue. Our experimental study showed (1) the effectiveness of ARGO, achieving up to 12x speedups on three classic workloads: Breadth First Search, Single Source Shortest Path, and PageRank;and (2) the scalability of ARGO in terms of both graph size and the number of partitions on two billion-edge real-world graphs.
Vertex-centric graphprocessing systems such as Pregel, Powergraph, or graphX recently gained popularity due to their superior performance of data analytics on graph-structured data. These systems exploit the graph st...
详细信息
ISBN:
(纸本)9781509014828
Vertex-centric graphprocessing systems such as Pregel, Powergraph, or graphX recently gained popularity due to their superior performance of data analytics on graph-structured data. These systems exploit the graph structure to improve data access locality during computation, making use of specialized graph partitioning algorithms. Recent partitioning techniques assume a uniform and constant amount of data exchanged between graph vertices (i.e., uniform vertex traffic) and homogeneous underlying network costs. However, in real-world scenarios vertex traffic and network costs are heterogeneous. This leads to suboptimal partitioning decisions and inefficient graphprocessing. To this end, we designed graph, the first graphprocessing system using vertex-cut graph partitioning that considers both, diverse vertex traffic and heterogeneous network, to minimize overall communication costs. The main idea is to avoid frequent communication over expensive network links using an adaptive edge migration strategy. Our evaluations show an improvement of 60% in communication costs compared to state-of-the-art partitioning approaches.
Distance join queries have recently been recognized as a particularly useful operation over graph data, since they capture graph similarity in a meaningful way. Consequently, they have been studied extensively in rece...
详细信息
Distance join queries have recently been recognized as a particularly useful operation over graph data, since they capture graph similarity in a meaningful way. Consequently, they have been studied extensively in recent years [1], [2]. However, current methods are designed for centralized systems, and rely on the graph embedding for effective pruning and indexing. As graph sizes become very large and graph data must be deployed in the distributed environment, these techniques become impractical. In this work, we propose a solution for efficient parallel processing of distance join queries over distributed large graphs. There have been emerging efforts devoted to managing large graphs in distributed and parallel systems. Programming models like Pregel [3] and iterative computing framework like HaLoop [4] have been proposed to handle queries over distributedgraphs. However, they are designed in the perspective of functionality instead of the query efficiency. In this work, we define an optimization problem: combining the iterative join and the graph exploration method to minimize the evaluation time of distance join queries. Without sacrificing a system's scalability, our technique exploits a light-weight vertex centric encoding schema built on a distance-aware partition of the entire graph. Extensive experiments over both real and synthetic large graphs show that, by employing an adaptive query plan generation and scheduling method, we can effectively reduce the redundant message passing and I/O costs. Compared to simply using iterative join or graph exploration method, our solution achieves as many as one order of magnitude of time saving for the query evaluation.
In recent years, the proliferation of highly dynamic graph-structured data streams fueled the demand for real-time data analytics. For instance, detecting recent trends in social networks enables new applications in a...
详细信息
ISBN:
(纸本)9781450340212
In recent years, the proliferation of highly dynamic graph-structured data streams fueled the demand for real-time data analytics. For instance, detecting recent trends in social networks enables new applications in areas such as disaster detection, business analytics or health-care. Parallel Complex Event processing has evolved as the paradigm of choice to analyze data streams in a timely manner, where the incoming data streams are split and processed independently by parallel operator instances. However, the degree of parallelism is limited by the feasibility of splitting the data streams into independent parts such that correctness of event processing is still ensured. In this paper, we overcome this limitation for graph-structured data by further parallelizing individual operator instances using modern graphprocessing systems. These systems partition the graph data and execute graph algorithms in a highly parallel fashion, for instance using cloud resources. To this end, we propose a novel graph-based Complex Event processing system graphCEP and evaluate its performance in the setting of two case studies from the DEBS Grand Challenge 2016.
These days, large-scale graphprocessing becomes more and more important. Pregel, inspired by Bulk Synchronous Parallel, is one of the highly used systems to process large-scale graph problems. In Pregel, each vertex ...
详细信息
These days, large-scale graphprocessing becomes more and more important. Pregel, inspired by Bulk Synchronous Parallel, is one of the highly used systems to process large-scale graph problems. In Pregel, each vertex executes a function and waits for a superstep to communicate its data to other vertices. Superstep is a very time-consuming operation, used by Pregel, to synchronize distributed computations in a cluster of computers. However, it may become a bottleneck when the number of communications increases in a graph with million vertices. Superstep works like a barrier in Pregel that increases the side effect of skew problem in distributed computing environment. ExPregel is a Pregel-like model that is designed to reduce the number of communication messages between two vertices resided on two different computational nodes. We have proven that ExPregel reduces the number of exchanged messages as well as the number of supersteps for all graph topologies. Enhancing parallelism in our new computational model is another important feature that manifolds the speed of graph analysis programs. More interestingly, ExPregel uses the same model of programming as Pregel. Our experiments on large-scale real-world graphs show that ExPregel can reduce network traffic as well as number of supersteps from 45% to 96%. Runtime speed up in the proposed model varies from 1.2x to 30x. Copyright (c) 2015 John Wiley & Sons, Ltd.
The importance of graphs as the fundamental structure underpinning many real world applications is no longer to be proved. Large graphs have emerged in various fields such as biological, social and transportation netw...
详细信息
ISBN:
(纸本)9781479912926;9781479912933
The importance of graphs as the fundamental structure underpinning many real world applications is no longer to be proved. Large graphs have emerged in various fields such as biological, social and transportation networks. The sheer volume of these networks poses challenges to traditional techniques for storage and analysis of graph data. In particular, OLAP analysis requires access to large portions of data to extract key information and to feed strategic decision making. OLAP provides multilevel, multiperspective views of the data. Most of the current techniques are optimized for centralized graphprocessing. A distributed approach providing horizontal scalability is required in order to handle the analysis workload. In this paper, we focus on applying OLAP analysis on large, distributedgraph data. We describe distributedgraph Cube, our distributed framework for graph-based OLAP cubes computation and aggregation. Experimental results on large, real-world datasets demonstrate that our method significantly outperforms its centralized counterparts. We also evaluate the performance of both Hadoop and Spark for distributed cubes computations.
In this paper, we investigate the challenge of increasing the size of graphs for finding gamma-quasi-cliques. We propose an algorithm based on MapReduce programming model. In the proposed solution, we use some known t...
详细信息
ISBN:
(纸本)9783642273360;9783642273377
In this paper, we investigate the challenge of increasing the size of graphs for finding gamma-quasi-cliques. We propose an algorithm based on MapReduce programming model. In the proposed solution, we use some known techniques to prune unnecessary and inefficient parts of search space and divides the massive input graph into smaller parts. Then the data for processing each part is sent to a single computer. The evaluation shows that we can substantially reduce the time for large graphs and besides there is no limit for graph size in our algorithm.
暂无评论