ISBN: 9781665431828 (print)
With the development of the Semantic Web, more and more data is managed in the form of knowledge graphs. Different knowledge storage and query modes each have advantages and shortcomings, and there is no unified standard. To address the current deficiencies in knowledge storage and knowledge query technology, this paper proposes a knowledge storage and query scheme based on the TinkerPop graph computing framework, a general graph data processing framework, that combines Neo4j's massive graph data storage capabilities with SPARQL's semantic query capabilities.
Graph data processing has been widely applied in domains such as industry, science, and social networks, and has therefore stimulated many research efforts. To keep pace with the rapid growth of big graph data, graph data processing based on Pregel-like systems is regarded as one of the most promising approaches and has attracted wide attention from researchers. However, it remains at an early stage and many challenges persist. In Pregel, superstep synchronization is time-consuming because iterative operations on graph data require multiple synchronizations. Furthermore, the graph data partition strategy adopted by Pregel fails to support load balancing, which increases network I/O overhead as the scale of the graph data grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel model. The global synchronization control mechanism is improved by determining the start time of the next superstep from the count of global message files. In addition, an improved graph data partition mechanism based on a balanced hash method is proposed to reduce the communication overhead between partitions of sub-graph computational tasks. We also redesign the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on different real-world datasets verify the efficiency of the proposed framework: it outperforms Giraph (an open-source Pregel-like system) by 58%-69% and achieves a 10x-17x performance improvement over Hadoop.
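The balanced hash partitioning idea can be illustrated with a minimal Python sketch. The abstract does not specify the method, so the capacity rule, the `slack` parameter, and the function names below are assumptions for illustration only, not the authors' algorithm. Each vertex goes to its hash-selected partition unless that would push the partition past a load cap, in which case it goes to the currently least-loaded partition.

```python
import hashlib


def stable_hash(vertex_id: str) -> int:
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    return int(hashlib.md5(vertex_id.encode("utf-8")).hexdigest(), 16)


def balanced_hash_partition(degrees: dict, num_parts: int, slack: float = 1.1) -> dict:
    """Map each vertex to a partition while capping per-partition load.

    degrees: vertex id -> out-degree, used here as the load estimate (assumption).
    slack:   allowed overshoot above the ideal per-partition load (hypothetical knob).
    """
    cap = slack * sum(degrees.values()) / num_parts
    load = [0] * num_parts
    assignment = {}
    # Place heavy vertices first so they are not forced into nearly full partitions.
    for v in sorted(degrees, key=lambda u: (-degrees[u], u)):
        p = stable_hash(v) % num_parts
        if load[p] + degrees[v] > cap:
            # Hash target is full: fall back to the least-loaded partition
            # (which may itself exceed the cap if every partition is full).
            p = min(range(num_parts), key=lambda i: load[i])
        assignment[v] = p
        load[p] += degrees[v]
    return assignment
```

Plain hashing would use only the `stable_hash(v) % num_parts` line; the cap check is what trades a little placement predictability for more even load.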
ISBN: 9781538634523 (print)
Data volumes too large for traditional applications to process have led the world into the era of Big Data. Along with greater opportunity and technological scope, Big Data brings many challenges, such as data capture, storage, transfer, update, analysis, sharing, search, visualization, and privacy. Dealing with these challenges requires a framework that not only processes the data but also provides meaningful analysis to support sound decisions in critical situations in industry, healthcare, social networks, science, telecom, environment, business, and other fields. This paper analyzes the literature on Big Data and the Hadoop framework and provides an architecture for processing graph data. Additionally, it provides online source code to help beginners understand big data.
ISBN: 9781479966219 (print)
Processing massive graph data effectively is a challenging problem. In this paper, two parallel computation approaches are compared: MapReduce and MyBSP. MyBSP is our open-source implementation, which adopts the Bulk Synchronous Parallel (BSP) programming model to support iterative processing. PageRank was implemented on both MapReduce and MyBSP. Experimental studies were conducted to evaluate and compare the performance and scalability of our MyBSP prototype system with the MapReduce model. The results show that the MyBSP approach outperforms the MapReduce approach for iterative graph data processing across datasets of varying size.
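To make the BSP style of iterative processing concrete, here is a minimal single-process Python sketch of PageRank organized as supersteps. It is not MyBSP code. The function name, the dangling-vertex handling, and the in-memory `inbox` standing in for message passing are all assumptions made for illustration.

```python
def bsp_pagerank(adjacency, supersteps=20, damping=0.85):
    """PageRank as a sequence of supersteps.

    adjacency: vertex -> list of out-neighbours; every neighbour must also
    appear as a key. Vertex state stays in memory between supersteps, which
    is the property that suits BSP to iterative algorithms.
    """
    n = len(adjacency)
    rank = {v: 1.0 / n for v in adjacency}
    for _ in range(supersteps):
        inbox = {v: 0.0 for v in adjacency}
        dangling = 0.0
        # Compute and communicate: each vertex sends an equal share to its neighbours.
        for v, targets in adjacency.items():
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]  # rank of sink vertices is spread uniformly
        # Barrier: all messages are delivered before any vertex updates its rank.
        rank = {
            v: (1.0 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in adjacency
        }
    return rank
```

In a MapReduce formulation, each iteration would typically be a separate job that re-reads the graph and writes ranks back to storage; the loop above keeps both in memory across iterations.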
Internet of Things (IoT) devices are increasingly used in various applications in our daily lives. The network structure for IoT is heterogeneous and can form a complex architecture depending on the application and geographical structure. To efficiently process the information within these diverse and complex relationships, a robust data structure is needed for network operations. Graph neural network (GNN) technology is emerging as a capable tool for prediction on complex data structures such as graphs. Graphs can be employed to mimic the structure of an IoT network and to process information from IoT nodes using GNN techniques. In this paper, our goal is to explore the effectiveness of GNNs in performing the node classification task for a given network. We generated three different IoT networks with varying network sizes, numbers of nodes, and feature sizes. We then tested 12 different GNN algorithms to evaluate their performance in IoT node classification. Each method is examined in detail to observe its training behavior, testing behavior, and resilience against noise. In addition, the time complexity and generalization ability of each model are studied. The experimental results show that some methods maintain IoT node classification accuracy with high resilience against noisy data.
ISBN: 9789819756711; 9789819756728 (print)
Recent advancements in Natural Language Processing (NLP) through pre-trained language models (PLMs) have significantly enhanced various computational tasks. However, their application to graph-structured data, particularly in capturing detailed structural information, remains challenging. Traditional approaches that integrate node sub-graph information with transformer architectures have shown promise but suffer from computational inefficiencies and potential compromises in model adaptability due to extensive fine-tuning requirements. These requirements can limit knowledge transfer and the ability to handle natural language and graph data simultaneously. This paper introduces the Graph Transformer Adapter (GTA), a novel method that combines the strengths of PLMs with graph-structured data to refine graph node representations. GTA uses an adapter mechanism that keeps the original PLM parameters unchanged, enhancing training efficiency and reducing computational demands while preserving the integrity of the original model. Extensive testing across various datasets demonstrates GTA's superior ability to manage graph-structured data effectively, showcasing its potential to leverage NLP advancements for improving graph node representations.
ISBN: 9781479912926; 9781479912933 (print)
We introduce Kylin, an efficient and scalable graph data processing system. Kylin uses the bulk synchronous parallel (BSP) model to process graph data. Although some BSP-based graph processing systems already exist, Kylin differs from them in two ways. First, Kylin cooperates with HBase to achieve scalable data manipulation. Second, we propose three techniques to optimize the performance of Kylin: pull messaging, lazy vertex loading, and vertex-weighted partitioning. Our experiments demonstrate that Kylin outperforms other BSP-based systems, namely Hama and Giraph.
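Of the three techniques, lazy vertex loading is the easiest to illustrate in isolation. The Python sketch below is a generic illustration and not Kylin's implementation. The `LazyVertexStore` class, its `loads` counter, and the plain dictionary standing in for an external table such as HBase are all hypothetical.

```python
from collections import deque


class LazyVertexStore:
    """Fetch a vertex's adjacency list from the backing table on first access only."""

    def __init__(self, backing_table):
        self._backing = backing_table  # stand-in for an external store (assumption)
        self._cache = {}
        self.loads = 0  # number of fetches from the backing table

    def get(self, vertex_id):
        if vertex_id not in self._cache:
            self._cache[vertex_id] = self._backing[vertex_id]
            self.loads += 1
        return self._cache[vertex_id]


def reachable(store, source):
    """Breadth-first search that touches only vertices reachable from source."""
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for t in store.get(v):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen
```

With eager loading, every vertex in the table would be read before computation starts; here a traversal confined to one component never fetches the vertices outside it.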
Graph-tensor learning operations extend tensor operations by taking the graph structure into account, and have been applied in diverse domains such as image processing and machine learning. However, the running time of graph-tensor operations increases rapidly with the number of nodes and the dimension of the data on nodes, making them impractical for real-time applications. In this paper, we propose a GPU library called cuGraph-Tensor for high-performance graph-tensor learning operations, which consists of eight key operations: graph shift (g-shift), graph Fourier transform (g-FT), inverse graph Fourier transform (inverse g-FT), graph filter (g-filter), graph convolution (g-convolution), graph-tensor product (g-product), graph-tensor SVD (g-SVD), and graph-tensor QR (g-QR). cuGraph-Tensor supports scalar, vector, and matrix data processing on each graph node. We propose optimization techniques for computation, memory access, and CPU-GPU communication that significantly improve the performance of the graph-tensor learning operations. Using the optimized operations, cuGraph-Tensor builds a graph data completion application for fast and accurate reconstruction of incomplete graph data. In the experiments, the proposed graph learning operations achieve up to 142.12x speedups versus CPU-based GSPBOX and CPU MATLAB implementations running on two Xeon CPUs. The graph data completion application achieves up to 174.38x speedups over the CPU MATLAB implementation, and up to 3.82x speedups with better accuracy over the GPU-based tensor completion in the cuTensor-tubal library. (C) 2020 Elsevier Inc. All rights reserved.
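As background for the first operations in that list, the following Python sketch shows the graph shift and a polynomial graph filter for a scalar signal, using the common graph signal processing definitions. It is a CPU illustration under those assumed definitions. It does not reproduce the library's API, its graph-tensor generalizations, or its GPU optimizations, and the function names are hypothetical.

```python
def graph_shift(adj, signal):
    """Graph shift: y[i] = sum_j adj[i][j] * signal[j], for adjacency matrix adj."""
    return [sum(a * x for a, x in zip(row, signal)) for row in adj]


def graph_filter(adj, signal, taps):
    """Polynomial graph filter: sum_k taps[k] * (adj^k applied to signal)."""
    out = [0.0] * len(signal)
    shifted = list(signal)  # adj^0 applied to the signal
    for h in taps:
        out = [o + h * s for o, s in zip(out, shifted)]
        shifted = graph_shift(adj, shifted)  # advance to the next power of adj
    return out
```

On a directed 3-cycle with `adj = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]`, shifting `[1, 2, 3]` moves each value one node along the cycle, and the filter with taps `[1, 0.5]` adds half of the shifted signal to the original.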
With the rapid development of big data computing, large-scale graph processing has become a basic computing model in both academic and industrial communities, and it has been applied in many practical big data computing tasks, such as social network analysis, Web search, and product promotion. These tasks involve large-scale graphs with billions of vertices and trillions of edges. Such scale brings many challenges to large-scale graph processing. This paper introduces the essential features and challenges of large-scale graph processing and explains how billions of edges can be handled on a multi-core machine, for which we present out-of-core processing systems and semi-external memory processing systems. This paper also summarizes the key technologies in graph processing systems and forecasts the future development of large-scale graph processing systems.
ISBN: 9781450337236 (print)
In recent years, systems researchers have devoted considerable effort to the study of large-scale graph processing. Existing distributed graph processing systems such as Pregel, which rely solely on distributed memory for their computations, fail to provide seamless scalability when the graph data and intermediate computational results no longer fit into memory; and most distributed approaches for iterative graph computations do not consider secondary storage a viable solution. This paper presents GraphMap, a distributed iterative graph computation framework that maximizes access locality and speeds up distributed iterative graph computations by effectively utilizing secondary storage. GraphMap has three salient features: (1) It distinguishes data states that are mutable during iterative computations from those that are read-only in all iterations, to maximize sequential access and minimize random access. (2) It uses a two-level graph partitioning algorithm that enables balanced workloads and locality-optimized data placement. (3) It includes a suite of locality-based optimizations that improve computational efficiency. Extensive experiments on several real-world graphs show that GraphMap outperforms existing distributed memory-based systems for various iterative graph algorithms.
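Feature (1) can be illustrated with a small single-machine Python sketch. It is an assumption-laden illustration and not GraphMap's design: the edge-list file format, the function names, and the choice of minimum-label propagation (connected components on an undirected reading of the edges) are all hypothetical. The read-only edge list lives on disk and is scanned sequentially in every iteration, while only the mutable per-vertex labels are held in memory.

```python
import os
import tempfile


def write_edges(path, edges):
    """Write the read-only edge list once, one 'src dst' pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for src, dst in edges:
            f.write(f"{src} {dst}\n")


def min_label_propagation(path, num_vertices):
    """Label each vertex with the smallest vertex id in its connected component."""
    label = list(range(num_vertices))  # mutable state, kept in memory
    changed = True
    while changed:
        changed = False
        with open(path, encoding="utf-8") as f:  # read-only data, sequential scan
            for line in f:
                src, dst = map(int, line.split())
                low = min(label[src], label[dst])
                if label[src] != low or label[dst] != low:
                    label[src] = label[dst] = low
                    changed = True
    return label
```

Because the edge file is never rewritten, each iteration costs one sequential pass over it, and random access is confined to the small in-memory label array.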