ISBN: 9781728125831 (print)
Recently, algorithms for counting local topological structures, such as triangles, have been widely used in social network analysis, recommendation systems, user profiling, and other fields. One-pass streaming algorithms for counting global and local triangles have been widely studied, but most research focuses on single-machine streaming algorithms in an 'offline + batch processing' mode. Research on distributed online algorithms running on multiple machines is still in its infancy and has not been thoroughly explored. In this paper, we investigate the triangle counting problem in large-scale simple undirected graphs whose edges arrive as a stream. We propose two distributed online streaming algorithms to estimate the global number of triangles, both based on the best-performing sampling-based streaming algorithm currently available. Our main contribution is a reasonable partition of the graph stream, so that each worker independently estimates the number of triangles in a subgraph of the graph stream. Experimental results show that our algorithms reduce the estimation error and are several times more accurate than state-of-the-art streaming algorithms.
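The abstract does not say how the graph stream is partitioned, so the following is only a minimal sketch of one well-known option, random vertex coloring, and not the authors' method. It assumes each vertex gets a uniformly random color in `[0, num_workers)`, an edge is forwarded to a worker only when both endpoints share that worker's color, and each worker counts triangles exactly instead of sampling. Under these assumptions a triangle survives with probability 1/c² for c workers, so scaling the summed local counts by c² gives an unbiased global estimate. All function names are hypothetical.

```python
import random
from collections import defaultdict


def count_triangles_streaming(edges):
    """Exactly count triangles in one pass over an edge stream.

    Each new edge (u, v) closes one triangle per common neighbor
    of u and v among the edges seen so far.
    """
    adj = defaultdict(set)
    total = 0
    for u, v in edges:
        if u == v or v in adj[u]:
            continue  # skip self-loops and repeated edges (simple graph)
        total += len(adj[u] & adj[v])
        adj[u].add(v)
        adj[v].add(u)
    return total


def estimate_global_triangles(edges, num_workers, seed=0):
    """Estimate the global triangle count from per-worker subgraphs.

    Assumption (not from the paper): vertices are colored uniformly at
    random and worker i only receives edges whose endpoints both have
    color i.
    """
    rng = random.Random(seed)
    color = {}
    buckets = [[] for _ in range(num_workers)]
    for u, v in edges:
        for x in (u, v):
            if x not in color:
                color[x] = rng.randrange(num_workers)
        if color[u] == color[v]:
            buckets[color[u]].append((u, v))
    # Each worker processes its own subgraph independently.
    local_counts = [count_triangles_streaming(b) for b in buckets]
    # A triangle is monochromatic with probability 1 / num_workers**2.
    return num_workers ** 2 * sum(local_counts)
```

With a single worker the estimate reduces to the exact count; with more workers each one sees fewer edges at the cost of higher variance.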
ISBN: 9781479956661 (print)
Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation is scalable in relation to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30,000 instances per second and achieve a speedup of up to 4.7x over the sequential version.
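To make the rule structure concrete, here is a small illustrative sketch of a single regression rule as the abstract describes it: a conjunction of attribute conditions as the antecedent and a linear combination of attributes as the consequent. The class and field names (`Condition`, `Rule`, `weights`, `bias`) are hypothetical and are not taken from SAMOA or the AMRules implementation; how rules are learned and distributed is not shown.

```python
import operator
from dataclasses import dataclass

# Comparison operators allowed in a condition (an assumption for this sketch).
_OPS = {"<=": operator.le, ">": operator.gt, "==": operator.eq}


@dataclass
class Condition:
    attribute: str
    op: str
    threshold: float

    def holds(self, x):
        """Check this single condition against an instance (a dict)."""
        return _OPS[self.op](x[self.attribute], self.threshold)


@dataclass
class Rule:
    conditions: list  # antecedent: all conditions must hold
    weights: dict     # consequent: attribute name -> linear coefficient
    bias: float = 0.0

    def covers(self, x):
        """The antecedent is a conjunction of its conditions."""
        return all(c.holds(x) for c in self.conditions)

    def predict(self, x):
        """The consequent is a linear combination of attribute values."""
        return self.bias + sum(w * x[a] for a, w in self.weights.items())


rule = Rule(
    conditions=[Condition("temp", ">", 20.0), Condition("humidity", "<=", 0.5)],
    weights={"temp": 2.0},
    bias=1.0,
)
instance = {"temp": 25.0, "humidity": 0.3}
if rule.covers(instance):
    print(rule.predict(instance))  # prints 51.0
```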
ISBN: 9781728125848 (print)
Recently, algorithms for counting local topological structures, such as triangles, have been widely used in social network analysis, recommendation systems, user profiling, and other fields. One-pass streaming algorithms for counting global and local triangles have been widely studied, but most research focuses on single-machine streaming algorithms in an 'offline + batch processing' mode. Research on distributed online algorithms running on multiple machines is still in its infancy and has not been thoroughly explored. In this paper, we investigate the triangle counting problem in large-scale simple undirected graphs whose edges arrive as a stream. We propose two distributed online streaming algorithms to estimate the global number of triangles, both based on the best-performing sampling-based streaming algorithm currently available. Our main contribution is a reasonable partition of the graph stream, so that each worker independently estimates the number of triangles in a subgraph of the graph stream. Experimental results show that our algorithms reduce the estimation error and are several times more accurate than state-of-the-art streaming algorithms.
Recently, algorithms for counting local topological structures, such as triangles, have been widely used in social network analysis, recommendation systems, user profiling, and other fields. The problem of counting global and local triangles in a graph stream has been widely studied, and numerous streaming algorithms for triangle counting have emerged. To improve the throughput and scalability of streaming algorithms, many studies have investigated distributed streaming algorithms running on multiple machines. In this article, we first propose a framework for distributed streaming algorithms based on the Master-Worker-Aggregator architecture. The two core parts of this framework are an edge distribution strategy, which plays a key role in performance (including communication overhead and workload balance), and an aggregation method, which is critical for obtaining unbiased estimates of the global and local triangle counts in a graph stream. We then extend the state-of-the-art centralized algorithm TRIEST into four distributed algorithms under our framework. Experimental results show that DVHT-i is excellent in both accuracy and speed, outperforming the best existing distributed streaming algorithm. DEHT-b is the fastest algorithm and has the least communication overhead; moreover, it achieves almost perfect workload balance.
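For orientation, the centralized building block named above can be sketched roughly as follows. This is a minimal sketch of a TRIEST-BASE-style global estimator (reservoir sampling of edges plus a triangle counter over the sample), written from the general description of that algorithm; it is not any of the four distributed variants, whose edge distribution and aggregation steps the abstract does not detail. It assumes an insertion-only stream of distinct edges without self-loops, and the class name `TriestBase` is hypothetical.

```python
import random
from collections import defaultdict


class TriestBase:
    """Reservoir-sampling estimator of the global triangle count."""

    def __init__(self, capacity, seed=0):
        if capacity < 3:
            raise ValueError("capacity must be at least 3")
        self.M = capacity            # maximum number of sampled edges
        self.t = 0                   # number of edges seen so far
        self.tau = 0                 # triangles inside the current sample
        self.sample = []             # sampled edges
        self.adj = defaultdict(set)  # adjacency of the sampled subgraph
        self.rng = random.Random(seed)

    def _common(self, u, v):
        return len(self.adj[u] & self.adj[v])

    def add_edge(self, u, v):
        self.t += 1
        if self.t > self.M:
            # Keep the new edge with probability M / t.
            if self.rng.random() >= self.M / self.t:
                return
            # Evict a uniformly random sampled edge and update the counter.
            i = self.rng.randrange(len(self.sample))
            self.sample[i], self.sample[-1] = self.sample[-1], self.sample[i]
            a, b = self.sample.pop()
            self.adj[a].discard(b)
            self.adj[b].discard(a)
            self.tau -= self._common(a, b)
        # Insert the new edge and count the triangles it closes in the sample.
        self.sample.append((u, v))
        self.tau += self._common(u, v)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def estimate(self):
        M, t = self.M, self.t
        # Scale by the inverse probability that all three edges are sampled.
        xi = max(1.0, t * (t - 1) * (t - 2) / (M * (M - 1) * (M - 2)))
        return xi * self.tau
```

When the reservoir is large enough to hold the whole stream, no edge is ever evicted and the estimate equals the exact triangle count.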