GraphX is a graph computing library built on Spark, where fault tolerance is a necessary guarantee of high availability. However, existing fault tolerance methods are mostly implemented pessimistically and are aimed at general computing tasks. Considering the characteristics of iterative computation, this paper presents a method that combines optimistic fault tolerance with checkpointing to recover data under different failure conditions. First, for single-node failures, we propose an optimistic fault tolerance mechanism based on a compensation function. It adds no fault tolerance measures in advance and incurs no additional cost when no failure occurs. Second, for multi-node failures, we propose an automatic checkpoint management strategy based on RDD importance, which jointly considers an RDD's lineage length, dependency relationships, and computation time to select appropriate RDDs as checkpoints. Finally, we implement our proposals in GraphX on Spark-3.5.1 and evaluate performance with representative iterative graph algorithms on a high-performance computing cluster. The results verify that the mechanism preserves the correctness of iteration results and show that, when recovering RDD partitions, the mechanism and strategy substantially reduce job execution time.
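The importance-based checkpoint selection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual formula: the field names, weights, and linear scoring are assumptions made for the example; the paper only states that lineage length, dependency relationships, and computation time are considered together.

```python
from dataclasses import dataclass

@dataclass
class RDDInfo:
    """Metadata tracked for one RDD in the lineage graph (illustrative fields)."""
    name: str
    lineage_length: int   # ancestors that would have to be replayed if this RDD is lost
    dependents: int       # downstream RDDs that reuse this one
    compute_secs: float   # measured time to (re)compute this RDD

def importance(r: RDDInfo, w_lineage=0.4, w_deps=0.3, w_time=0.3) -> float:
    """Weighted importance score; higher means a better checkpoint candidate.
    The weights here are arbitrary placeholders."""
    return w_lineage * r.lineage_length + w_deps * r.dependents + w_time * r.compute_secs

def pick_checkpoints(rdds, budget: int):
    """Choose up to `budget` RDDs to checkpoint, most important first."""
    return sorted(rdds, key=importance, reverse=True)[:budget]
```

In a real Spark job the chosen RDDs would then be marked with `RDD.checkpoint()` before their action runs; the scoring here stands in for whatever weighting the strategy actually learns or configures.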
ISBN:
(print) 9781450399159
In concurrent graph processing, different queries are evaluated on the same graph simultaneously, sharing graph accesses via the memory hierarchy. However, different queries may traverse the graph differently, especially those starting from different source vertices. When these graph traversals are "misaligned", the benefits of graph access sharing can be seriously compromised, and the issue tends to worsen as more concurrent queries are added to the evaluation batch. To address this issue, this work introduces Glign, a runtime system that automatically aligns the graph traversals of concurrent queries. Glign provides three levels of graph traversal alignment for iterative evaluation of concurrent queries. First, it synchronizes the accesses of different queries to the active parts of the graph within each iteration of the evaluation (intra-iteration alignment). On top of that, Glign leverages a key insight regarding the "heavy iterations" in query evaluation to achieve inter-iteration alignment and alignment-aware batching. The former aligns the iterations of different queries to increase graph access sharing, while the latter groups queries with better graph access sharing into the same evaluation batch. Together, these alignment techniques substantially boost the data locality of concurrent query evaluation. In our experiments, Glign outperforms the state-of-the-art concurrent graph processing systems Krill and GraphM by 3.6x and 4.7x on average, respectively.
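The alignment-aware batching idea above can be sketched with a toy greedy grouper. Everything here is an assumption for illustration: we pretend each query has a precomputed "signature" (e.g., the set of vertices its heavy iterations touch) and batch queries whose signatures overlap most, measured by Jaccard similarity. Glign's actual batching criterion is not specified at this level of detail.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two traversal signatures; empty signatures count as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0

def batch_queries(signatures: dict, batch_size: int):
    """Greedy alignment-aware batching: seed a batch with an unassigned query,
    then repeatedly add the remaining query whose heavy-iteration signature
    overlaps the batch's combined signature the most."""
    remaining = dict(signatures)
    batches = []
    while remaining:
        qid, sig = remaining.popitem()
        batch = [qid]
        while remaining and len(batch) < batch_size:
            best = max(remaining, key=lambda q: jaccard(sig, remaining[q]))
            batch.append(best)
            sig = sig | remaining.pop(best)  # grow the batch's combined footprint
        batches.append(sorted(batch))
    return batches
```

Queries landing in the same batch then share cache lines when their traversals visit the same vertices in the same iterations, which is the locality effect the system exploits.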
Graph analytics is widely used for analyzing large-scale networks, with ever-growing graphs representing relationships and entities. Real-world graphs are often large and constantly evolving over time. The former requires parallel distributed processing across multiple multicore machines; the latter requires repetitive processing over successive snapshots of the graph. The combined memories of multiple machines can hold large graphs, and the large number of cores made available by multiple machines raises the degree of parallelism, delivering scalability. A significant drawback of distributed platforms is the long-latency operations they impose due to high inter-machine communication latency, especially when the computation load of a graph query is small (e.g., finding the shortest path in a graph). This research first introduces the MultiLyra distributed batching system, which amortizes communication and computation costs across multiple queries by simultaneously evaluating batches of hundreds of iterative graph queries, improving throughput while maintaining the scalability of distributed processing. Via a unified frontier for all queries in a batch, the overhead of distributed evaluation is amortized across queries. MultiLyra yields maximum speedups over the single-query baseline ranging from 3.08x to 5.55x across different batch sizes, algorithms, and input graphs, which are further improved to speedups of 7.35x to 11.86x by employing a fine-grained query tracking technique and value reuse ***, it introduces the ExpressWay technique for faster convergence, based on differential treatment of important edges in the graph, to further improve the efficiency of the batching system. Each machine in the cluster loads a small portion of the edges of the graph, i.e., the important edges contributing the most to delivering the final results to the vertices, and runs the graph queries…
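The unified-frontier idea can be sketched in miniature with a batched BFS: each vertex on the shared frontier carries a bitmask of the queries that just reached it, so a vertex's adjacency list is scanned once per round for the whole batch instead of once per query. This single-machine sketch is only an analogy for MultiLyra's distributed scheme; the data layout and function names are assumptions.

```python
def batched_bfs(adj, sources):
    """Evaluate a batch of BFS queries over one shared frontier.
    dist[v] maps query index -> hop count from sources[query index]."""
    dist = {v: {} for v in adj}
    active = {}                          # vertex -> bitmask of queries on the frontier
    for q, s in enumerate(sources):
        dist[s][q] = 0
        active[s] = active.get(s, 0) | (1 << q)
    hop = 0
    while active:
        hop += 1
        nxt = {}
        for v, mask in active.items():   # one scan of adj[v] serves every query in mask
            for u in adj[v]:
                new, m, q = 0, mask, 0
                while m:
                    if m & 1 and q not in dist[u]:
                        dist[u][q] = hop
                        new |= 1 << q
                    m >>= 1
                    q += 1
                if new:
                    nxt[u] = nxt.get(u, 0) | new
        active = nxt
    return dist
```

The amortization shows up in the inner loop: the edge list of `v` is read once per round regardless of how many queries are active at `v`, which is precisely the cost that per-query evaluation would multiply by the batch size.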
ISBN:
(print) 9781728143286
Due to rapid changes in graph data sets, mined information quickly becomes obsolete, so the entire data set needs to be recomputed from scratch, wasting computing time and resources. To reduce the cost of such computations, this paper proposes a model called i²Graph to support incremental iterative computation on dynamic graphs. Unlike traditional iteration, i²Graph executes the graph algorithm by reusing the results from the previous graph and performs computation only on the parts of the graph that have changed. i²Graph contains two components: (1) an incremental iterative computation model that improves the execution efficiency of iterative graph algorithms; and (2) an incremental update method that accelerates the iterative process within an iterative graph algorithm. It is implemented on Spark GraphX, a popular parallel and distributed computing framework for large-scale graph processing. Experimental results verify the performance advantages of the i²Graph model when running iterative graph algorithms on dynamic graphs, compared with traditional iteration.
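A minimal sketch of the incremental idea, using single-source shortest paths as the iterative algorithm: after new edges arrive, the previous distance map is kept and only vertices whose values can actually improve are re-relaxed, rather than rerunning the computation on the whole graph. This is a generic textbook pattern, not i²Graph's actual implementation, and it covers only edge insertions; deletions or weight increases would need extra machinery to invalidate stale results.

```python
import heapq

def incremental_sssp(adj, dist, new_edges):
    """Reuse the previous result `dist` after inserting `new_edges` (u, v, w);
    only the affected region of the graph is re-relaxed."""
    heap = []
    for (u, v, w) in new_edges:
        adj.setdefault(u, []).append((v, w))
        if dist.get(u, float("inf")) + w < dist.get(v, float("inf")):
            dist[v] = dist[u] + w          # the new edge shortens v's distance
            heapq.heappush(heap, (dist[v], v))
    while heap:                            # propagate improvements outward only
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                       # stale heap entry
        for (u, w) in adj.get(v, []):
            if d + w < dist.get(u, float("inf")):
                dist[u] = d + w
                heapq.heappush(heap, (dist[u], u))
    return dist
```

If no inserted edge improves any distance, the while-loop never runs, which is the best case the abstract alludes to: unchanged parts of the graph cost nothing.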
Most graph algorithms are iterative in nature. They can be processed efficiently in memory by distributed systems in an asynchronous manner. However, recovering from failures in such systems is challenging, because traditional checkpoint-based fault-tolerant frameworks incur expensive barrier costs that usually offset the gains brought by asynchronous computation. Worse, surviving data are rolled back, leading to costly re-computation. This paper first proposes to leverage surviving data for failure recovery in an asynchronous system. Our framework guarantees the correctness of algorithms and avoids rolling back surviving data. Additionally, a novel asynchronous checkpointing solution is introduced to accelerate recovery at nearly zero overhead. Optimization strategies such as message pruning, non-blocking recovery, and load balancing are also designed to further boost performance. We have conducted extensive experiments showing the effectiveness of our proposals on real-world graphs.
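The recovery-without-rollback idea can be illustrated for a self-correcting algorithm like shortest paths: reset only the lost partition's vertices to the algorithm's identity value (infinity) while surviving vertices keep their computed values, then simply continue iterating to the fixpoint. This toy synchronous sketch assumes a monotonic, min-based computation; the function names and the `lost` set are hypothetical, and the paper's system is asynchronous and distributed.

```python
def sssp_rounds(adj, dist):
    """Run synchronous relaxation rounds until no distance changes."""
    changed = True
    while changed:
        changed = False
        for v, nbrs in adj.items():
            for (u, w) in nbrs:
                if dist[v] + w < dist[u]:
                    dist[u] = dist[v] + w
                    changed = True
    return dist

def recover_without_rollback(adj, dist, lost, source):
    """Reinitialize only the failed partition's vertices and keep all surviving
    values, then continue iterating; for self-correcting algorithms the fixpoint
    reached equals that of a from-scratch run, with no global rollback."""
    for v in lost:
        dist[v] = 0 if v == source else float("inf")
    return sssp_rounds(adj, dist)
```

Surviving distances act as upper bounds that never have to be recomputed, which is why skipping the rollback is safe for this class of algorithms; non-monotonic computations would need the compensation-style treatment the related work describes.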