ISBN: (Print) 9781728174457
Graphs are commonly used to model the relationships between various entities. These graphs can be enormously large, and thus scalable graph analysis has been the subject of many research efforts. To enable scalable analytics, many researchers have focused on generating realistic graphs that support controlled experiments for understanding how algorithms perform under changing graph features. Significant progress has been made on scalable graph generators that preserve some important graph properties (e.g., degree distribution, clustering coefficients). In this paper, we study how to sample a graph from the space of graphs with a given shell distribution. The shell distribution is related to the k-core, which is the largest subgraph in which each vertex is connected to at least k other vertices. A k-shell is the subset of vertices that are in the k-core but not the (k + 1)-core, and the shell distribution comprises the sizes of these shells. Core decompositions are widely used to extract information from graphs and to assist other computations. We present a scalable shared- and distributed-memory graph generator that, given a shell decomposition, generates a random graph that conforms to it. Our extensive experimental results show the efficiency and scalability of our methods. Our algorithm generates 2^33 vertices and 2^37 edges in less than 50 seconds on 384 cores.
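To make the k-shell terminology concrete, the following minimal Python sketch computes core numbers with networkx on a small illustrative random graph and derives the shell distribution from them; it is not the paper's generator, and the graph and parameter choices are purely hypothetical.

    # Illustrative sketch: compute core numbers and the shell distribution
    # of a small random graph. This is NOT the paper's generator; it only
    # demonstrates what a "shell distribution" is.
    from collections import Counter

    import networkx as nx

    # Any undirected simple graph works; here a small random graph.
    G = nx.gnm_random_graph(1000, 5000, seed=42)

    # core_number(G) maps each vertex to its core number k: the largest k
    # such that the vertex belongs to the k-core.
    core = nx.core_number(G)

    # The k-shell is the set of vertices with core number exactly k, so
    # the shell distribution is simply a histogram of core numbers.
    shell_sizes = Counter(core.values())

    for k in sorted(shell_sizes):
        print(f"{k}-shell: {shell_sizes[k]} vertices")

The paper's problem is the inverse direction: given such a histogram of shell sizes, sample a random graph whose core decomposition reproduces it.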
Network science methodology is increasingly applied to a large variety of real-world phenomena, often leading to big network data sets. Thus, networks (or graphs) with millions or billions of edges are more and more common. To process and analyze these data, we need appropriate graph processing systems and fast algorithms. Yet, many analysis algorithms were pioneered on small networks when speed was not the highest concern. Developing an analysis toolkit for large-scale networks thus often requires faster variants, both from an algorithmic and an implementation perspective. In this paper we focus on computational aspects of vertex centrality measures. Such measures indicate the (relative) importance of a vertex based on the position of the vertex in the network. We describe several common (and some recent and thus less established) measures, optimization problems in their context, as well as algorithms for efficiently solving the resulting problems. Our focus is on (not necessarily exact) performance-oriented algorithmic techniques that enable significantly faster processing than the previous state of the art, often allowing massive data sets to be processed quickly and without resorting to distributed graph processing systems.
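As a small illustration of the kind of centrality computations discussed above, the sketch below uses networkx (not the authors' toolkit) to compute degree centrality exactly and betweenness centrality approximately via pivot sampling, in the spirit of the not-necessarily-exact, performance-oriented techniques mentioned; the graph and parameters are illustrative assumptions.

    # Illustrative sketch only: two common vertex centrality measures on a
    # synthetic graph, with sampled (approximate) betweenness.
    import networkx as nx

    G = nx.barabasi_albert_graph(10_000, 3, seed=1)

    # Degree centrality: fraction of other vertices each vertex touches.
    deg = nx.degree_centrality(G)

    # Approximate betweenness centrality: sample k pivot vertices instead
    # of running single-source shortest paths from every vertex.
    btw = nx.betweenness_centrality(G, k=256, seed=1)

    top = sorted(btw, key=btw.get, reverse=True)[:5]
    print("Top-5 vertices by (approximate) betweenness:", top)

Sampling trades a controllable amount of accuracy for a large reduction in running time, which is exactly the kind of trade-off such performance-oriented variants exploit.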
Efficiently processing large graphs is challenging, since parallel graph algorithms suffer from poor scalability and performance due to many factors, including heavy communication and load imbalance. Furthermore, it is difficult to express graph algorithms, as users need to understand and effectively utilize the underlying execution of the algorithm on the distributed system. The performance of graph algorithms depends not only on the characteristics of the system (such as latency, available RAM, etc.), but also on the characteristics of the input graph (small-world scale-free, mesh, long-diameter, etc.) and the characteristics of the algorithm (sparse computation vs. dense communication). The best execution strategy, therefore, often heavily depends on the combination of input graph, system, and algorithm.

Fine-grained expression exposes maximum parallelism in the algorithm and allows the user to concentrate on a single vertex, making it easier to express parallel graph algorithms. However, this often loses information about the machine, making it difficult to extract performance and scalability from fine-grained algorithms.

To address these issues, we present a model for expressing parallel graph algorithms using a fine-grained expression. Our model decouples the algorithm writer from the underlying details of the system, the graph, and the execution and tuning of the algorithm. We also present various graph paradigms that optimize the execution of graph algorithms for various types of input graphs and systems. We show that our model is general enough to allow graph algorithms to use the various graph paradigms for the best/fastest execution, and we demonstrate good performance and scalability for various graphs, algorithms, and systems on up to 100,000+ cores.
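The fine-grained, vertex-centric style described above can be illustrated with a toy single-node Python sketch of breadth-first search, in which the user writes only the per-vertex operator and a stand-in "runtime" loop decides how to apply it; this is an assumed illustration, not the authors' model or system.

    # Toy sketch of a fine-grained (vertex-centric) BFS. The user supplies
    # only the per-vertex logic in visit(); the surrounding loop stands in
    # for the runtime, which on a real system would choose the execution
    # strategy (push vs. pull, data distribution, load balancing) itself.
    from collections import defaultdict

    graph = defaultdict(list)          # adjacency list: vertex -> neighbors
    for u, v in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]:
        graph[u].append(v)
        graph[v].append(u)

    dist = {0: 0}                      # per-vertex state (BFS level)
    frontier = [0]

    def visit(u):
        """Per-vertex operator: relax all unvisited neighbors of u."""
        nxt = []
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                nxt.append(w)
        return nxt

    while frontier:
        # The "runtime" applies the operator to every active vertex; a
        # distributed implementation would partition and parallelize this.
        frontier = [w for u in frontier for w in visit(u)]

    print(dist)

The point of the decoupling is that the visit() logic would stay the same while the runtime swaps in different execution paradigms depending on the graph, the system, and the algorithm's communication pattern.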