Degeneracy is an important concept to measure the sparsity of a graph which has been widely used in many network analysis applications. Many network analysis algorithms, such as clique enumeration and truss decomposit...
详细信息
Degeneracy is an important concept to measure the sparsity of a graph which has been widely used in many network analysis applications. Many network analysis algorithms, such as clique enumeration and truss decomposition, perform very well in graphs having small degeneracies. in this paper, we propose an i/o-efficient algorithm to compute the degeneracy of the massive graph that cannot be fully kept in the main memory. The proposed algorithmonly uses o(n) memory, where n denotes the number of nodes of the graph. We also develop an i/o-efficient algorithm toincrementally maintain the degeneracy on dynamic graphs. Extensive experiments show that our algorithms significantly outperform the state-of-the-art degeneracy computation algorithms in terms of both running time and i/o costs. The results also demonstrate high scalability of the proposed algorithms. For example, in a real-world web graph with 930 million nodes and 13.3 billion edges, the proposed algorithm takes only 633 seconds and uses less than 4.5GB memory to compute the degeneracy.
Butterfly (a cyclic graph motif) counting is a fundamental task with many applications in graph analysis, which aims at computing the number of butterflies in a large graph. With the rapid growth of graph data, it is ...
详细信息
Butterfly (a cyclic graph motif) counting is a fundamental task with many applications in graph analysis, which aims at computing the number of butterflies in a large graph. With the rapid growth of graph data, it is more and more challenging to do butterfly counting due to the super-linear time complexity and large memory consumption. in this paper, we study i/o-efficient algorithms for doing butterfly counting on hierarchical memory. Existing algorithms of the kind cannot guarantee i/ooptimality. observing that in order to count butterflies, it suffices to "witness" a subgraph instead of the whole structure, a new class of algorithms called semi-witnessing algorithmis proposed. We prove that a semi-witnessing algorithmis not restricted by the lower bound Ømega(|E|2/MB) of a witnessing algorithm, and give a new bound of Ømega(min(|E|2/MB, |E|/|V| √M B)). We further develop the ioBufs algorithm that manages to approach the i/o lower bound, and thus claim its optimality. Finally, we make efforts to parallelize ioBufs to further improve the performance and scalability. We show in the experiment that ioBufs significantly outperforms the state-of-the-art algorithms EMRC and BFC-EM. in addition, ioBufs can scale to conducting butterfly counting on the Clueweb graph with 37 billion edges and quintillions (10^18 ) of butterflies.
in recent years, the size of graph data has increased significantly, but most existing graph clustering algorithms do not consider the case where the size of main memory is not sufficient to handle large amount of gra...
详细信息
in recent years, the size of graph data has increased significantly, but most existing graph clustering algorithms do not consider the case where the size of main memory is not sufficient to handle large amount of graph data. Exploring entire region of graph for clustering causes too many random disk accesses to use data that are not loaded into memory, resulting in excessive disk i/o and thrashing. To address this problem, we propose an i/o-efficient algorithm for structural clustering of a graph, called pm-SCAN. in the proposed method, if memory is insufficient, an input graph is partitioned into several subgraphs smaller than memory, and clustering is first performed for each subgraph. And then clusters from the subgraphs are merged based on connectivity between clusters so that global results can be obtained in the point of view of an original input graph. Not only does pm SCAN produce scalable performance even for very large graphs, i.e., significant shortage of available memory, but also the result of pm-SCAN is the same as that of the original structural clustering algorithm SCAN. We also propose a cluster maintenance method for large-scale dynamic graphs that change over time. instead of reclustering with a whole graph, only a small set of nodes whose structural connectivities are subject to change by a given update operation is first identified, and we access only those nodes in disk and update their clusters to reduce maintenance costs. This dynamic graph handling mechanism shows significant performance improvement compared to the existing method and the baseline that performs clustering from scratch.
Most existing algoritluns for graph clustering, including SCAN, are not designed to cope with large volumes of data that cannot fit in main memory. When there is not enough memory, those algorithms will incur thrashin...
详细信息
ISBN:
(纸本)9781450349185
Most existing algoritluns for graph clustering, including SCAN, are not designed to cope with large volumes of data that cannot fit in main memory. When there is not enough memory, those algorithms will incur thrashing, i.e. result in huge i/o costs. We propose an i/o-efficient algorithm for structural clustering, pm-SCAN. The main idea of our scheme is to partition a large graph into several subgraphs that can fit into main memory. We first find clusters in each subgraph, and then merge them to produce final clustering of the input graph. Experimental results show that while other existing algorithms are riot scalable to the graph size, our proposed method produces scalable performance for limited memory space.
Community search is a problem of finding densely connected subgraphs that satisfy the query conditions in a network, which has attracted much attention in recent years. However, all the previous studies on community s...
详细信息
Community search is a problem of finding densely connected subgraphs that satisfy the query conditions in a network, which has attracted much attention in recent years. However, all the previous studies on community search do not consider the influence of a community. in this paper, we introduce a novel community model called k-influential community based on the concept of k-core to capture the influence of a community. Based on this community model, we propose a linear time online search algorithm to find the top-r k-influential communities in a network. To further speed up the influential community search algorithm, we devise a linear space data structure which supports efficient search of the top-r k-influential communities in optimal time. We also propose an efficientalgorithm to maintain the data structure when the network is frequently updated. Additionally, we propose a novel i/o-efficient algorithm to find the top-r k-influential communities in a disk-resident graph under the assumption of , where and n denote the size of the main memory and the number of nodes, respectively. Finally, we conduct extensive experiments on six real-world massive networks, and the results demonstrate the efficiency and effectiveness of the proposed methods.
Triangle listing is a basic operator when dealing with many graph problems. However, in memory algorithms do not work well with recently developed massive graphs such as social networks because these graphs cannot be ...
详细信息
Triangle listing is a basic operator when dealing with many graph problems. However, in memory algorithms do not work well with recently developed massive graphs such as social networks because these graphs cannot be accommodated in the memory. Thus, external memory-based algorithms have been proposed recently, but these approaches still require frequent multiple scans of the whole graph on the disk and large volumes of calculations are performed that involve the whole graph during every iteration. in this study, we propose a novel index-based method for listing triangles in massive graphs. First, we present new notions for the vertex range index and potential cone vertex index. Next, we propose an index join-based triangle listing algorithm. our method accesses the indexed data asynchronously and joins them to list triangles using a multi-threaded parallel processing technique. Based on experiments, we demonstrate that our algorithmoutperforms the state-of-the-art solution methods by three to eight times in terms of the wall clock time. (C) 2015 Elsevier inc. All rights reserved.
This paper presents the first distributed triangle listing algorithm with provable CPU, i/o, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectiv...
详细信息
ISBN:
(纸本)9781467375887
This paper presents the first distributed triangle listing algorithm with provable CPU, i/o, Memory, and Network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics, but the majority of existing algorithms for massive graphs are sequential, while distributed versions of algorithms do not guarantee their CPU, i/o, Memory, or Network requirements. our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. it works by performing efficientorientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.'s Massive Graph Triangulation algorithm. PDTL suits a variety of computational environments, from single-core machines to high-end clusters, and computes the exact triangle count on graphs of over 6B edges and 1B vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, oPT, and PATRiC by 2x to 4x. our approach thus highlights the importance of i/oin a distributed environment.
暂无评论