The locality of a graph problem is the smallest distance T such that each node can choose its own part of the solution based on its radius-T neighborhood. In many settings, a graph problem can be solved efficiently wi...
ISBN (digital): 9781728133201
ISBN (print): 9781728133218
Detecting strongly connected components (SCCs) on the GPU has become a fundamental operation for accelerating graph computing. Existing SCC detection methods for multiple GPUs introduce massive unnecessary data transformation between the GPUs. In this paper, we propose a novel distributed SCC detection approach using multiple GPUs plus a CPU. Our approach includes three key ideas: (1) segmentation and labeling over large-scale datasets; (2) collecting and merging the segmented SCCs; and (3) task assignment across multiple GPUs and the CPU. We implement our approach on a hybrid distributed architecture with multiple GPUs plus a CPU. Our approach achieves device-level optimization and is compatible with state-of-the-art algorithms. We conduct extensive theoretical and experimental analysis to demonstrate the efficiency and accuracy of our approach. The experimental results show that our approach achieves 11.2×, 1.2×, and 1.2× speedups for SCC detection on an NVIDIA K80 compared with Tarjan's, FB-Trim, and FB-Hybrid algorithms, respectively.
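For orientation, the general forward-backward (FB) decomposition with a trimming step, the family that the FB-Trim baseline name suggests, can be sketched sequentially as follows. This is a minimal single-threaded sketch, not the multi-GPU method of the abstract; the function names `reachable` and `fb_trim_scc` are hypothetical, and the graph is assumed to be given as a node list plus a list of directed edges.

```python
from collections import defaultdict


def reachable(start, adj, alive):
    """Return the alive nodes reachable from start along adj (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def fb_trim_scc(nodes, edges):
    """Forward-backward SCC decomposition with trimming (illustrative sketch)."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    sccs = []
    work = [set(nodes)]  # each entry is a node set that contains whole SCCs
    while work:
        alive = work.pop()
        # Trim: a node with no alive predecessor or no alive successor
        # (self-loops ignored) cannot lie on a cycle, so it is a singleton SCC.
        changed = True
        while changed:
            changed = False
            for u in list(alive):
                has_in = any(p in alive and p != u for p in bwd[u])
                has_out = any(s in alive and s != u for s in fwd[u])
                if not has_in or not has_out:
                    alive.discard(u)
                    sccs.append({u})
                    changed = True
        if not alive:
            continue
        # The SCC of the pivot is the intersection of its forward and backward sets.
        pivot = next(iter(alive))
        f = reachable(pivot, fwd, alive)
        b = reachable(pivot, bwd, alive)
        scc = f & b
        sccs.append(scc)
        # The three remaining parts are independent subproblems.
        for part in (f - scc, b - scc, alive - f - b):
            if part:
                work.append(part)
    return sccs
```

The three leftover parts share no SCC with one another, which is what makes this scheme attractive for parallel and multi-device execution: each part can be processed independently.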
ISBN (digital): 9781728194844
ISBN (print): 9781728194851
Virtual Network Embedding (VNE), which addresses the efficient embedding of heterogeneous virtual networks onto a physical infrastructure of limited capacity, is a major challenge in network virtualization (NV). VNE is computationally intractable when various constraints on nodes and links are considered, and is known to be NP-hard even in the offline case. Although VNE problems have received attention over recent decades, with a vast number of VNE solutions proposed, the majority focus only on node mapping and leave the link mapping stage to the shortest path method or a multicommodity flow (MCF) algorithm. We argue that node and link mappings play equally pivotal roles in reaching an efficient VNE solution. In this paper, we reassess the role of the link mapping stage in the VNE problem, and then propose a novel intelligent VNE orchestration that implements a distributed parallel model to reduce the operation time remarkably. Extensive evaluation results show that our proposed algorithm is not only faster than state-of-the-art VNE algorithms, but also better in all performance metrics.
Community detection is an important graph (network) analysis kernel used for discovering functional units and organization of a graph. Louvain method is an efficient algorithm for discovering communities. However, seq...
ISBN (digital): 9781728189154
ISBN (print): 9781728189161
In this paper, we propose a recursive algorithm and a parallel algorithm for constructing independent spanning trees in alternating group networks. The recursive algorithm is BFS-based, while the parallel algorithm is both BFS-based and rule-based. Both algorithms are accurate, and the parallel algorithm is more efficient than the recursive one.
ISBN (digital): 9781728171685
ISBN (print): 9781728171692
Maximally Stable Extremal Regions (MSER) algorithms are based on the component tree and are used to detect invariant regions. OpenCV MSER, the most popular MSER implementation, uses a linked list to associate pixels with extremal regions (ERs). The data structure of an ER holds a head and a tail linked node, which makes OpenCV MSER hard to parallelize with existing parallel component tree strategies. In addition, pixel extraction (i.e., extracting the pixels in MSERs) in OpenCV MSER is very slow. In this paper, we propose two novel MSER algorithms, called Fast MSER V1 and V2. They first divide an image into several spatial partitions, then construct sub-trees and doubly linked lists (for V1) or a labelled image (for V2) on the partitions in parallel. V1 uses a novel sub-tree merging algorithm to merge the sub-trees into the final tree, merging the doubly linked lists in the same process, whereas V2 merges the sub-trees using an existing merging algorithm. Finally, MSERs are recognized, and the pixels in them are extracted through two novel pixel extraction methods that exploit the fact that many pixels in parent and child MSERs are duplicated. Both V1 and V2 outperform three open-source MSER algorithms (they are 28 and 26 times faster than OpenCV MSER, respectively) and reduce the memory used for the pixels in MSERs by 78%.
Nowadays, subsequence similarity search under the Dynamic Time Warping (DTW) similarity measure is applied in a wide range of time series mining applications. Since the DTW measure has a quadratic computational comple...
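The quadratic cost noted in the teaser above comes from the dynamic program behind DTW, which fills one table cell per pair of positions in the two series. The following is a minimal sketch under stated assumptions: the local cost is the absolute difference (squared difference is also common), no warping-window constraint is applied, and the name `dtw_distance` is hypothetical rather than taken from the listed paper.

```python
def dtw_distance(a, b):
    """DTW distance between two numeric sequences via the classic O(len(a) * len(b)) DP."""
    inf = float("inf")
    n, m = len(a), len(b)
    # prev[j] holds the best cost of aligning the first i-1 items of a with the first j of b.
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # assumed local cost
            # Extend the cheapest of: step in a, step in b, or step in both.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]
```

Only two rows of the table are kept at a time, so memory is linear even though time stays quadratic, which is why subsequence search over long series usually needs pruning or parallel hardware.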
We call an objective function or algorithm symmetric with respect to an input if after swapping two parts of the input in any algorithm, the solution of the algorithm and the output remain the same. More formally, for...
ISBN (digital): 9781728199245
ISBN (print): 9781728199252
The Kronecker product, also called the tensor product, is a fundamental matrix algebra operation used to model complex systems through structured descriptions. This operation needs to be computed efficiently, since it is a critical kernel for iterative algorithms. In this work, we focus on the vector-Kronecker product operation and present an in-depth performance analysis of a previously proposed sequential algorithm and a previously proposed parallel algorithm. Based on this analysis, we propose three optimizations: changing the memory access pattern, reducing load imbalance, and manually vectorizing some portions of the code with Intel SSE4.2 intrinsics. The results show better cache usage and load balance, and thus improved performance, especially for larger matrices.
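To illustrate why this kernel rewards careful implementation, the sketch below computes a vector-Kronecker product without ever forming the Kronecker matrix. It assumes the operation is the row-vector product y = x (A ⊗ B) for two factors, which the abstract does not spell out, and the names `kron` and `vec_kron` are hypothetical. It relies on the identity that, with x reshaped row-major into an m × p matrix X, the result is the row-major flattening of AᵀXB.

```python
def kron(a, b):
    """Explicit Kronecker product of two matrices given as lists of rows (for checking only)."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def vec_kron(x, a, b):
    """Compute x (A kron B) without materializing A kron B.

    a is m x n, b is p x q, x has length m * p; the result has length n * q.
    """
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    # View x as an m x p matrix X with X[i][k] = x[i * p + k].
    # Step 1: T = X B, an m x q matrix.
    t = [
        [sum(x[i * p + k] * b[k][l] for k in range(p)) for l in range(q)]
        for i in range(m)
    ]
    # Step 2: Y = A^T T, an n x q matrix.
    y = [
        [sum(a[i][j] * t[i][l] for i in range(m)) for l in range(q)]
        for j in range(n)
    ]
    # Flatten Y row-major to match the column ordering of A kron B.
    return [v for row in y for v in row]
```

The explicit product needs m·p·n·q multiplications and as much storage, while the two-step form needs m·q·(p + n) multiplications and never stores the large matrix. The order in which the loops touch `x`, `a`, and `b` is the kind of memory access pattern the abstract refers to.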
SAR target recognition based on an optimised GPU parallel algorithm is proposed here. With the rapid growth in the dimensionality and volume of SAR image data, the traditional CPU-based target recognition algorithm cannot meet the requirements of real-time processing. Here, the target recognition algorithm, which comprises feature extraction and classification, is investigated, decomposed for parallel execution, and optimised. First, the component algorithms are investigated and decomposed for parallel execution, including the principal component analysis, linear discriminant analysis, and non-negative matrix factorisation feature extraction techniques, and the support vector machine classifier. Then, the three feature extraction methods and the sequential minimal optimisation algorithm are implemented. Finally, the factors limiting the running speed of the Compute Unified Device Architecture (CUDA) programme in the target recognition algorithm are analysed in depth, and the algorithm is optimised in three respects: communication, memory access, and instruction flow. According to the experiments, the optimised GPU-based parallel implementation of the target recognition algorithm achieves a speedup of about 25 to 30 times.