ISBN (print): 9781728187808
Graph Convolutional Networks (GCNs) have become a popular means of performing link prediction due to their high accuracy. However, scaling link prediction to large graphs with billions of vertices and edges and rich attribute types is a significant challenge, given the storage and computation limits of individual machines. In this paper we present a scalable link prediction approach that conducts GCN training and link prediction on top of a distributed graph database server called JasmineGraph. We partition graph data and persist the partitions across multiple workers, and we generate graph node embeddings in parallel across those workers using the GraphSAGE algorithm. Our approach avoids performance bottlenecks in GCN training through an intelligent scheduling algorithm. We show that the approach scales well with an increasing number of partitions (2, 4, 8, and 16) on four real-world datasets: Twitter, Amazon, Reddit, and DBLP-V11. JasmineGraph trained a GCN on the largest dataset, DBLP-V11 (> 9.3 GB), in 11 hours and 40 minutes using 16 workers on a single server, while the original GraphSAGE implementation could not process it at all. The original GraphSAGE implementation processed the second-largest dataset, Reddit, in 238 minutes, whereas JasmineGraph took only 100 minutes on the same hardware with 16 workers, a 2.4x speedup.
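To make the per-partition parallelism concrete, here is a minimal sketch (not JasmineGraph's actual code) of GraphSAGE-style mean aggregation run independently on each graph partition by a pool of worker processes. The partition representation, the shared weight matrices, and the single-layer simplification are our own assumptions for illustration.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def sage_layer(adj, feats, W_self, W_neigh):
        """One GraphSAGE mean-aggregator layer over a single partition.
        adj: dict mapping vertex id -> list of neighbor ids (hypothetical format)."""
        out = np.zeros((feats.shape[0], W_self.shape[1]))
        for v, nbrs in adj.items():
            h_n = feats[nbrs].mean(axis=0) if nbrs else np.zeros(feats.shape[1])
            out[v] = feats[v] @ W_self + h_n @ W_neigh
        return np.maximum(out, 0.0)  # ReLU

    def embed_partition(args):
        adj, feats, W_self, W_neigh = args
        return sage_layer(adj, feats, W_self, W_neigh)

    def parallel_embeddings(partitions, W_self, W_neigh, workers=4):
        """Run every partition in its own worker process, echoing the
        paper's multi-worker setup at a much smaller scale."""
        jobs = [(adj, feats, W_self, W_neigh) for adj, feats in partitions]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(embed_partition, jobs))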
The Knowledge Graph (KG) is currently the most popular graph structure. Real-world knowledge graphs, however, are frequently incomplete because much information is missing, so Knowledge Graph Completion (KGC) has become a topic of intense research interest. At the same time, relationships in social networks have expanded from binary relations to more complex high-order relations. Current multi-relation methods, however, still lack interaction between entities, and the entity information relevant under each relation differs. Based on this, we propose a high-order GCN-based multi-relation prediction model (denoted MHGCN). First, an entity adjacency matrix is constructed for each relation, and a high-order graph convolutional network (GCN) propagates neighbor information among the entities within that relation. Second, a probabilistic calculation method that integrates entity information is designed to judge whether a fact holds. Experimental analysis on three representative datasets illustrates the effectiveness of the proposed algorithm. In particular, on the FB-AUTO dataset, whose facts have a variable number of entities, the MRR of MHGCN reaches 0.883, and on JF17K-4, whose facts have a fixed number of entities, the MRR reaches 0.828. This shows that MHGCN is applicable to datasets with both variable and fixed numbers of entities per fact.
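The following is an illustrative sketch, in the spirit of the description above rather than the paper's code, of the two building blocks: a per-relation entity adjacency matrix built from n-ary facts, and multi-hop (high-order) GCN propagation over it. A candidate fact could then be scored probabilistically from the resulting entity embeddings; all names and shapes here are our own assumptions.

    import numpy as np

    def relation_adjacency(facts, num_entities):
        """facts: list of entity-id tuples participating in one relation;
        entities co-occurring in a fact are connected."""
        A = np.zeros((num_entities, num_entities))
        for tup in facts:
            for i in tup:
                for j in tup:
                    if i != j:
                        A[i, j] = 1.0
        return A

    def high_order_gcn(A, X, weights):
        """Stack one propagation step per order, so entities aggregate
        information from increasingly distant neighbors within the relation."""
        A_hat = A + np.eye(A.shape[0])      # add self-loops
        A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)  # D^-1 (A + I)
        H = X
        for W in weights:                   # one weight matrix per order
            H = np.maximum(A_norm @ H @ W, 0.0)  # ReLU(A_norm H W)
        return H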
Unsupervised text summarization is a promising approach that avoids human efforts in generating reference summaries, which is particularly important for large-scale datasets. To improve its performance, we propose a h...
Computer architectures that presume global hardware determinism are ultimately unscalable, but they are relatively easy to program because each operation is strictly sequenced and has an assured effect. Architectures ...
ISBN (print): 9781728190747
Deduplication is a data redundancy elimination technique designed to save storage resources by reducing redundant data in cloud storage systems. With the development of cloud computing technology, deduplication has been increasingly applied in cloud data centers. However, traditional techniques face great challenges with big data in balancing two conflicting goals: deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (named SCDS), which aims to remove more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index using data partitioning and similarity clustering algorithms. In the preprocessing stage, SCDS uses a data partitioning algorithm to group similar data together. In the deletion stage, a similarity clustering algorithm assigns the fingerprint superblocks of similar data to the same cluster, and duplicate fingerprints are then detected within each cluster, speeding up fingerprint retrieval. Experiments show that the deduplication ratio of SCDS is better than that of some existing similarity-based deduplication algorithms, while its overhead is only slightly higher than that of high-throughput but low-deduplication-ratio methods.
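A schematic sketch of the core idea follows (our simplification, not the paper's implementation): chunk fingerprints of a superblock are mapped to a cluster via a similarity feature, so duplicate lookups search only that cluster instead of a global index. The SHA-1 chunking, the one-permutation MinHash feature, and the class layout are all our own illustrative choices.

    import hashlib
    from collections import defaultdict

    def chunk_fingerprints(data, chunk_size=4096):
        """Fixed-size chunking with SHA-1 fingerprints (illustrative only)."""
        return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
                for i in range(0, len(data), chunk_size)]

    def superblock_feature(fps):
        """Representative feature: the minimum fingerprint (a 1-permutation
        MinHash), so similar superblocks tend to map to the same cluster."""
        return min(fps)

    class ClusteredIndex:
        def __init__(self):
            self.clusters = defaultdict(set)  # feature -> known fingerprints

        def deduplicate(self, fps):
            """Return only the chunks not already stored, searching just
            the cluster this superblock maps to."""
            cluster = self.clusters[superblock_feature(fps)]
            new = [fp for fp in fps if fp not in cluster]
            cluster.update(new)
            return new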
It is a key issue to efficiently manage resources in the smart grid (SG) network, a dynamic distributed grid in which the production, storage, and users of electricity work together under specific control...
ISBN (print): 9781665422925
We propose SparsePipe, an efficient and asynchronous parallelism approach for handling 3D point clouds with multi-GPU training. SparsePipe is built to support 3D sparse data such as point clouds. It achieves this by adopting generalized convolutions with sparse tensor representations to build expressive high-dimensional convolutional neural networks. Compared to dense solutions, the new models can efficiently process irregular point clouds without densely sliding over the entire space, significantly reducing memory requirements and allowing higher resolutions of the underlying 3D volumes for better performance. SparsePipe exploits intra-batch parallelism, which partitions input data across multiple processors, and further improves training throughput with inter-batch pipelining that overlaps communication and computation. In addition, it partitions the model appropriately when the GPUs are heterogeneous, so that computation is load-balanced and communication overhead is reduced. With experimental results on an eight-GPU platform, we show that SparsePipe parallelizes effectively and outperforms its dense counterparts on current point cloud benchmarks for both training and inference.
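As a toy illustration of the inter-batch pipelining idea (not SparsePipe's implementation, which targets sparse convolutions), the sketch below splits a model into two stages on two GPUs and a batch into micro-batches, so that stage work on consecutive micro-batches can overlap. The layer sizes and the two-GPU device names are assumptions.

    import torch
    import torch.nn as nn

    # Two pipeline stages on two devices (assumes a machine with 2 GPUs).
    stage0 = nn.Sequential(nn.Linear(128, 256), nn.ReLU()).to("cuda:0")
    stage1 = nn.Sequential(nn.Linear(256, 10)).to("cuda:1")

    def pipelined_forward(batch, micro_batches=4):
        outputs, in_flight = [], []
        for mb in torch.chunk(batch.to("cuda:0"), micro_batches):
            # CUDA kernels launch asynchronously, so stage 0 of the next
            # micro-batch can overlap with stage 1 of the previous one,
            # and the non-blocking copy overlaps communication with compute.
            in_flight.append(stage0(mb).to("cuda:1", non_blocking=True))
        for h in in_flight:
            outputs.append(stage1(h))
        return torch.cat(outputs)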
ISBN (digital): 9781510644250
ISBN (print): 9781510644250
Medical applications are among the important tasks of optical technology, and the processing of two-dimensional optical signals and images remains an urgent problem today. One of the most dangerous eye diseases is diabetic macular retinopathy. The first stage of a laser coagulation operation is fundus image segmentation, and calculating the texture features needed for this task takes a long time. In this paper, we consider a high-performance algorithm for calculating texture features based on distributed computing to speed up the processing and analysis of medical images. Various configurations of the high-performance algorithm on a single node were investigated and compared with sequential and parallel algorithms. The high-performance algorithm achieves a speedup of 40x or more under some parameter settings, and analysis and segmentation of standard images complete in less than one minute. Using the high-performance algorithm for the analysis and segmentation of fundus images avoids the need for a sequential skip-step algorithm, which shortens execution time through interpolation but loses accuracy in doing so.
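A hedged sketch of the general approach, assuming a recent scikit-image and not the paper's distributed implementation: gray-level co-occurrence matrix (GLCM) texture features are computed tile by tile in parallel worker processes. The tile size, the chosen feature set, and the uint8 grayscale input are our own assumptions.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor
    from skimage.feature import graycomatrix, graycoprops

    def tile_features(tile):
        """GLCM texture features for one tile (tile: 2-D uint8 array)."""
        glcm = graycomatrix(tile, distances=[1], angles=[0, np.pi / 2],
                            levels=256, symmetric=True, normed=True)
        return [graycoprops(glcm, p).mean()
                for p in ("contrast", "homogeneity", "energy")]

    def split_tiles(image, size=64):
        h, w = image.shape
        return [image[y:y + size, x:x + size]
                for y in range(0, h - size + 1, size)
                for x in range(0, w - size + 1, size)]

    def parallel_texture_features(image, workers=8):
        """Map tiles across a process pool, mimicking on one node the
        distributed computation described above."""
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return np.array(list(pool.map(tile_features, split_tiles(image))))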
Edge detection is a critical component of many image processing and computer vision applications, such as image segmentation, detection of imperfections in industrial products, medical image processing, and object identification. Edge detection techniques seek to accelerate image analysis by limiting the quantity of information processed. An object's dimensions are a major factor in manufacturing, engineering, and other crucial fields, so the less time measurement takes, the more efficient the whole process becomes. This study details the use of a novel image processing technique to advance the automated measurement of an industrial object's dimensions. First, an image enhancement approach based on fuzzy entropy was devised to clarify the original degraded image. Second, the edges of the object were detected via a fast fuzzy edge detection method. Finally, using Freeman Chain Codes (FCC), the essential corner locations were determined. Thus, the edges and corners of any given object image are accurately detected. This strategy is straightforward to implement, can efficiently identify size, shape, and any defects, and helps determine acceptance or rejection of the final product. The corresponding experiments indicate that the proposed method has a much lower computational cost overall than related works.
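To illustrate the Freeman Chain Code corner step, here is a sketch of our own (using OpenCV contour extraction rather than the paper's fuzzy pipeline): chain codes are computed along the object boundary, and corners are flagged where the code direction changes sharply. The input file name, the turn threshold, and the use of cv2.findContours are assumptions for illustration.

    import cv2
    import numpy as np

    # 8-direction Freeman codes for unit steps (dx, dy).
    FREEMAN = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
               (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

    def chain_codes(contour):
        """Freeman codes between consecutive 8-connected boundary points."""
        pts = contour.reshape(-1, 2)
        codes = []
        for p, q in zip(pts, np.roll(pts, -1, axis=0)):
            step = (int(np.sign(q[0] - p[0])), int(np.sign(q[1] - p[1])))
            if step != (0, 0):
                codes.append(FREEMAN[step])
        return codes

    def corner_indices(codes, min_turn=2):
        """A corner is where consecutive codes differ by >= min_turn
        (modulo 8), i.e. the boundary turns by 90 degrees or more."""
        corners = []
        for i in range(1, len(codes)):
            d = abs(codes[i] - codes[i - 1]) % 8
            if min(d, 8 - d) >= min_turn:
                corners.append(i)
        return corners

    binary = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    corners = corner_indices(chain_codes(max(contours, key=cv2.contourArea)))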
Malware detection has attracted widespread attention due to growing malware sophistication. Machine learning-based methods have been proposed to find traces of malware by analyzing network traffic. However, networ...