ISBN (print): 9781479966219
Effectively processing massive graph data remains a difficult and challenging problem. In this paper, two parallel computation approaches are compared: MapReduce and MyBSP. MyBSP is our open-source implementation of the Bulk Synchronous Parallel (BSP) programming model, designed to support iterative processing. PageRank was implemented on both MapReduce and MyBSP, and experiments were conducted to evaluate and compare the performance and scalability of our MyBSP prototype against the MapReduce model. The results show that the MyBSP approach outperforms the MapReduce approach for iterative graph data processing across datasets of varying size.
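To make the BSP model concrete, the following is a minimal single-process sketch of vertex-centric PageRank organized into supersteps. It is not MyBSP's code or API: the function name `bsp_pagerank`, the dict-of-lists graph representation, and the assumption that every vertex has at least one out-edge are illustrative choices. In each superstep every vertex sends its rank share along its out-edges, a barrier ensures all messages are delivered, and only then are ranks updated. In a MapReduce implementation each iteration would typically be a separate job that re-reads the graph from storage, whereas here the graph stays resident across supersteps.

```python
from collections import defaultdict


def bsp_pagerank(graph, damping=0.85, supersteps=30):
    """Vertex-centric PageRank organised into BSP supersteps (single-process sketch).

    graph: dict mapping every vertex (including every edge target) to a list of
    its out-neighbours. Each vertex is assumed to have at least one out-edge;
    dangling vertices are not handled in this sketch.
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # Compute/communicate phase: each vertex sends rank/out-degree to its
        # out-neighbours; messages to the same target are combined by summation.
        inbox = defaultdict(float)
        for v, out in graph.items():
            share = rank[v] / len(out)
            for u in out:
                inbox[u] += share
        # Barrier: ranks are updated only after every message has been delivered.
        rank = {v: (1.0 - damping) / n + damping * inbox[v] for v in graph}
    return rank
```

A fixed superstep count keeps the sketch short; a convergence test on the change in ranks between supersteps could replace it.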
ISBN (print): 9781538634523
Data volumes too large for traditional applications to process have led the world into the era of big data. Alongside expanding opportunities and technological scope, big data brings many challenges, such as data capture, storage, transfer, update, analysis, sharing, search, visualization, and privacy. Meeting these challenges requires a proper framework that not only processes the data but also provides meaningful analysis to support decisions in critical situations in domains such as industry, healthcare, social networks, science, telecom, the environment, and business. The contribution of this paper is a review of the literature on big data and the Hadoop framework, together with an architecture for processing graph data. Additionally, it provides online source code to help beginners understand big data.
ISBN (print): 9781665431828
With the development of the Semantic Web, more and more data is being managed in the form of knowledge graphs. Existing knowledge storage and query modes each have their own advantages and shortcomings, and there is no unified standard. To address current deficiencies in knowledge storage and query technology, this paper proposes a knowledge storage and query scheme built on TinkerPop, a general-purpose graph computing framework. The scheme combines the massive graph data storage capabilities of Neo4j with the semantic query capabilities of SPARQL.
ISBN (print): 9781479912926; 9781479912933
We introduce Kylin, an efficient and scalable graph data processing system based on the bulk synchronous parallel (BSP) model. Although several BSP-based graph processing systems already exist, Kylin differs from them in two ways. First, Kylin cooperates with HBase to achieve scalable data manipulation. Second, we propose three techniques to optimize Kylin's performance: pull messaging, lazy vertex loading, and vertex-weighted partitioning. Our experiments demonstrate that Kylin outperforms other BSP-based systems, namely Hama and Giraph.
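The abstract does not say how Kylin weights or places vertices, so the following is only a sketch of the general idea behind vertex-weighted partitioning, assuming each vertex carries a weight such as its degree (a rough proxy for the messages it generates). It uses the greedy longest-processing-time heuristic: vertices are placed heaviest first onto the currently lightest partition. The function name `vertex_weighted_partition` and the choice of heuristic are assumptions, not Kylin's algorithm.

```python
import heapq


def vertex_weighted_partition(weights, k):
    """Assign vertices to k partitions so that total vertex weight is balanced.

    weights: dict vertex -> non-negative weight (e.g. degree; an assumption).
    Returns a dict vertex -> partition id in range(k).
    """
    heap = [(0, p) for p in range(k)]  # (current load, partition id)
    heapq.heapify(heap)
    assignment = {}
    # Heaviest vertices first; ties broken by vertex id for determinism.
    for v in sorted(weights, key=lambda v: (-weights[v], v)):
        load, p = heapq.heappop(heap)  # lightest partition so far
        assignment[v] = p
        heapq.heappush(heap, (load + weights[v], p))
    return assignment
```

Unlike plain hash partitioning, which balances vertex counts, this balances the summed weights, so a partition holding a few high-degree vertices is not also given many others.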
Graph data processing has been widely applied in domains such as industry, science, and social networks, and has therefore stimulated much research. To keep pace with the rapid growth of big graph data, processing on Pregel-like systems is regarded as one of the most promising approaches and has attracted wide attention from researchers. However, this approach is still at an early stage and faces many challenges. In Pregel, superstep synchronization is time-consuming because iterating over graph data requires many synchronizations. Furthermore, the graph partitioning strategy adopted by Pregel does not support load balancing, so network I/O overhead increases as the graph grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel model. The global synchronization control mechanism is improved by determining the start time of the next superstep from a count of the global message files. In addition, an improved graph partitioning mechanism based on a balanced hash method is proposed to reduce the communication overhead between partitions of sub-graph computation tasks. We also redesign the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on several real-world datasets show that the framework outperforms Giraph (an open-source Pregel-like system) by 58%-69% and achieves a 10x-17x performance improvement over Hadoop.
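The file-counting synchronization idea can be sketched as follows, assuming (hypothetically) that each worker publishes exactly one message file per superstep into a shared directory, and that the coordinator starts the next superstep once the number of files for the current superstep equals the number of workers. The file naming scheme and the functions `write_message_file`, `count_message_files`, and `can_start_next_superstep` are illustrative assumptions, not the paper's implementation.

```python
import os
import tempfile


def write_message_file(msg_dir, superstep, worker_id, messages):
    """Publish one worker's outgoing messages for a superstep (hypothetical layout).

    A worker with no messages must still write an (empty) file, otherwise the
    coordinator's count would never reach the number of workers.
    """
    path = os.path.join(msg_dir, f"step{superstep:04d}_worker{worker_id}.msg")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for dst, value in messages:
            f.write(f"{dst}\t{value}\n")
    os.replace(tmp, path)  # atomic rename: a half-written file is never counted


def count_message_files(msg_dir, superstep):
    prefix = f"step{superstep:04d}_"
    return sum(1 for name in os.listdir(msg_dir)
               if name.startswith(prefix) and name.endswith(".msg"))


def can_start_next_superstep(msg_dir, superstep, num_workers):
    """The next superstep may begin once every worker has published its file."""
    return count_message_files(msg_dir, superstep) == num_workers


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_message_file(d, 1, 0, [(1, 0.5)])
        print(can_start_next_superstep(d, 1, num_workers=2))  # False: worker 1 not done
        write_message_file(d, 1, 1, [])
        print(can_start_next_superstep(d, 1, num_workers=2))  # True
```

Writing to a temporary name and renaming matters here: counting files is only a safe barrier if a file becomes visible after it is complete.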
ISBN (print): 9780769550886
Graph data is the default data organization mechanism in large-scale Social Network Service (SNS) applications. Traditional graph computing models are used to extract useful hidden information from the data. However, ever-growing data volumes add increasing pressure: to retrieve and discover this information, the system must perform more data iterations, which slows down data analysis operations. To speed up these operations on large-scale graph data, recent research focuses on efficient parallel iterative processing strategies. However, the synchronization required between successive iterations can severely undermine the effectiveness of parallel operations. In this paper, we propose Arbor, a novel large-scale graph data processing model that addresses these issues. Arbor replaces time-constrained synchronization operations with non-time-constrained control message transmissions to increase the degree of parallelism. Furthermore, it introduces a new graph data organization format that both saves storage space and accelerates graph processing operations. We compare Arbor with other graph processing models on a large-scale experimental graph dataset, and the results show that it outperforms state-of-the-art systems.
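The abstract does not describe Arbor's storage format, so as a point of reference the sketch below shows compressed sparse row (CSR), a widely used compact graph layout: an `offsets` array of length |V|+1 and a `targets` array of length |E|, so that the neighbours of v are `targets[offsets[v]:offsets[v+1]]`. This is a substitute technique for illustration, not Arbor's format; the function names `to_csr` and `neighbors` are hypothetical, and vertex IDs are assumed to be 0..n-1.

```python
from array import array


def to_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) from (src, dst) pairs with IDs 0..n-1."""
    counts = [0] * num_vertices
    for src, _ in edges:
        counts[src] += 1
    # offsets[v] is where v's neighbour list starts in targets.
    offsets = array("q", [0] * (num_vertices + 1))
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + counts[v]
    targets = array("q", [0] * len(edges))
    cursor = list(offsets[:-1])  # next free slot in targets for each source vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, v):
    """Out-neighbours of v as a contiguous slice of the targets array."""
    return targets[offsets[v]:offsets[v + 1]]
```

For n vertices and m edges this stores n + 1 + m packed integers, and each neighbour scan is a sequential read, which is the kind of space and access-pattern benefit a compact graph format aims for.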
Graph-tensor learning operations extend tensor operations by taking the graph structure into account and have been applied in diverse domains such as image processing and machine learning. However, the running time of graph-tensor operations increases rapidly with the number of nodes and the dimension of the data on each node, making them impractical for real-time applications. In this paper, we propose cuGraph-Tensor, a GPU library for high-performance graph-tensor learning that provides eight key operations: graph shift (g-shift), graph Fourier transform (g-FT), inverse graph Fourier transform (inverse g-FT), graph filter (g-filter), graph convolution (g-convolution), graph-tensor product (g-product), graph-tensor SVD (g-SVD), and graph-tensor QR (g-QR). cuGraph-Tensor supports scalar, vector, and matrix data processing on each graph node. We propose optimization techniques for computation, memory access, and CPU-GPU communication that significantly improve the performance of these operations. Using the optimized operations, cuGraph-Tensor builds a graph data completion application for fast and accurate reconstruction of incomplete graph data. In the experiments, the proposed graph learning operations achieve speedups of up to 142.12x over CPU-based GSPBOX and CPU MATLAB implementations running on two Xeon CPUs. The graph data completion application achieves speedups of up to 174.38x over the CPU MATLAB implementation, and up to 3.82x, with better accuracy, over the GPU-based tensor completion in the cuTensor-tubal library. (C) 2020 Elsevier Inc. All rights reserved.
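As a rough illustration of two of these operations for scalar data on each node, the sketch below implements g-shift as multiplication by the adjacency matrix A (used here as the graph shift operator) and g-filter as a polynomial in that operator, y = sum_k h_k A^k x. This follows the standard graph signal processing definitions; it is plain, dense Python, not cuGraph-Tensor's API or GPU code, and the function names are hypothetical.

```python
def graph_shift(adj, x):
    """g-shift: y = A x, so each node combines the values of the nodes it links to.

    adj: dense n x n adjacency matrix (list of rows); x: one scalar per node.
    """
    n = len(x)
    return [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]


def graph_filter(adj, x, taps):
    """g-filter: y = sum_k taps[k] * A^k x, built from repeated graph shifts."""
    y = [0.0] * len(x)
    shifted = list(x)  # A^0 x
    for k, h in enumerate(taps):
        if k > 0:
            shifted = graph_shift(adj, shifted)  # A^k x from A^(k-1) x
        y = [yi + h * si for yi, si in zip(y, shifted)]
    return y
```

For a directed cycle graph, g-shift reduces to a cyclic shift of the signal, so polynomial graph filters generalize classical FIR filters. Each shift costs O(n^2) here; sparse storage and GPU kernels are what a library like the one described would use to make these operations fast.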
The rapid development of big data computing has made large-scale graph processing a basic computing model in both academia and industry, and it has been applied in many real-world big data workloads, such as social network analysis, Web search, and product promotion. These workloads involve large-scale graphs with billions of vertices and trillions of edges, a scale that brings many challenges to large-scale graph processing. This paper introduces the essential features and challenges of large-scale graph processing and describes how billions of edges can be handled on a single multi-core machine, for which we present out-of-core processing systems and semi-external memory processing systems. This paper also summarizes the key technologies in graph processing systems and forecasts the future development of large-scale graph processing systems.
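The semi-external idea can be sketched as follows: per-vertex state (O(|V|)) is kept in memory, while the edge list (O(|E|)) lives on disk and is streamed sequentially on every pass. The example computes weakly connected components by min-label propagation. The binary edge format, chunk size, and function names are assumptions for illustration and do not correspond to any specific system discussed in the paper.

```python
import os
import struct
import tempfile

EDGE = struct.Struct("<II")  # hypothetical on-disk record: two little-endian uint32 IDs


def write_edge_file(path, edges):
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def stream_edges(path, chunk_edges=4096):
    """Yield edges sequentially from disk; only one chunk is held in memory."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(EDGE.size * chunk_edges)
            if not buf:
                break
            yield from EDGE.iter_unpack(buf)


def semi_external_wcc(path, num_vertices):
    """Vertex labels stay in RAM; the edge file is re-scanned until labels stabilise."""
    label = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        for u, v in stream_edges(path):
            m = min(label[u], label[v])
            if label[u] != m or label[v] != m:
                label[u] = label[v] = m
                changed = True
    return label  # each vertex ends with the smallest ID in its component


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "edges.bin")
        write_edge_file(path, [(0, 1), (1, 2), (3, 4)])
        print(semi_external_wcc(path, 6))
```

The design trade-off is that memory scales with the number of vertices only, at the cost of one sequential pass over the edge file per iteration.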
Internet of Things (IoT) devices are increasingly used in a variety of everyday applications. IoT network structures are heterogeneous and can form complex architectures depending on the application and geographical layout. Efficiently processing information within these diverse and complex relationships requires a robust data structure for network operations. Graph neural network (GNN) technology is emerging as a capable tool for prediction on complex data structures such as graphs. Graphs can be used to model the structure of an IoT network, and GNN techniques can then process information from the IoT nodes. In this paper, our goal is to explore the effectiveness of GNNs in performing node classification on a given network. We generated three different IoT networks with varying network sizes, numbers of nodes, and feature sizes, and then tested 12 different GNN algorithms to evaluate their performance in IoT node classification. Each method is examined in detail to observe its training behavior, testing behavior, and resilience against noise. In addition, the time complexity and generalization ability of each model are studied. The experimental results show that some methods maintain high IoT node classification accuracy in the presence of noisy data.
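To show what a GNN layer computes for node classification, here is a minimal, untrained sketch of one message-passing layer: each node averages its own feature vector with its neighbours' (a self-loop), applies a linear map W, and then ReLU; the predicted class is the arg-max of the output. This is a simplified mean-aggregation variant in the spirit of GCN/GraphSAGE, not any of the 12 algorithms evaluated in the paper, and the weights would normally be learned by gradient descent. Function names and the dict-based data layout are assumptions.

```python
def gnn_layer(neighbors, features, weight):
    """One mean-aggregation message-passing layer (simplified, no degree normalisation).

    neighbors: dict node -> list of neighbour nodes
    features:  dict node -> list[float] of length d_in
    weight:    d_out x d_in matrix as a list of rows
    """
    out = {}
    for v, nbrs in neighbors.items():
        group = [features[v]] + [features[u] for u in nbrs]  # self-loop plus neighbours
        agg = [sum(col) / len(group) for col in zip(*group)]  # element-wise mean
        z = [sum(w * a for w, a in zip(row, agg)) for row in weight]  # linear map W
        out[v] = [max(0.0, zi) for zi in z]  # ReLU
    return out


def predict_labels(scores):
    """Class prediction per node: index of the largest output score."""
    return {v: max(range(len(s)), key=s.__getitem__) for v, s in scores.items()}
```

Stacking k such layers lets each node's prediction depend on its k-hop neighbourhood, which is why graph structure, and not only per-node features, influences the classification.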
The introduction of Google's Pregel generated much interest in large-scale graph data processing, inspiring the development of Pregel-like systems such as Apache Giraph, GPS, Mizan, and GraphLab, all of which have appeared in the past two years. To understand how Pregel-like systems perform, we conduct a study that experimentally compares Giraph, GPS, Mizan, and GraphLab on equal ground, considering graph- and algorithm-agnostic optimizations and using several metrics. The systems are compared using four algorithms (PageRank, single-source shortest path, weakly connected components, and distributed minimum spanning tree) on up to 128 Amazon EC2 machines. We find that the system optimizations in Giraph and GraphLab allow them to perform well. Our evaluation also shows Giraph 1.0.0's considerable improvement over Giraph 0.1 and identifies areas for improvement in all systems.
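Of the four benchmark algorithms, single-source shortest path shows the Pregel pattern most clearly. The single-process sketch below simulates it: a vertex that receives a shorter distance updates itself and sends new candidate distances along its out-edges, messages are delivered at the barrier between supersteps, and computation halts once no messages remain (every vertex has voted to halt). It is not the API of Giraph, GPS, Mizan, or GraphLab; the function name and graph representation are assumptions, and edge weights are assumed non-negative.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Vertex-centric SSSP simulated superstep by superstep.

    edges: dict vertex -> list of (neighbour, weight) pairs, weights >= 0.
    Returns (distances, number of supersteps executed).
    """
    vertices = set(edges) | {u for out in edges.values() for u, _ in out}
    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # superstep 0: only the source is active
    supersteps = 0
    while inbox:  # halt when no messages are in flight
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best = min(msgs)  # combiner: only the smallest candidate matters
            if best < dist[v]:
                dist[v] = best
                for u, w in edges.get(v, []):
                    outbox[u].append(best + w)
            # a vertex whose distance did not improve sends nothing (votes to halt)
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1
    return dist, supersteps
```

Vertices unreachable from the source keep an infinite distance because they never receive a message.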