Search Results - Inner Mongolia University Library

arXiv, 2020

Authors: Chang, Yi-Jun; Studený, Jan; Suomela, Jukka. National University of Singapore, Singapore; Aalto University, Finland

The locality of a graph problem is the smallest distance T such that each node can choose its own part of the solution based on its radius-T neighborhood. In many settings, a graph problem can be solved efficiently with a distributed or parallel algorithm if and only if it has a small locality. In this work we seek to automate the study of solvability and locality: given the description of a graph problem Π, we would like to determine if Π is solvable and what is the asymptotic locality of Π as a function of the size of the graph. Put otherwise, we seek to automatically synthesize efficient distributed and parallel algorithms for solving Π. We focus on locally checkable graph problems; these are problems in which a solution is globally feasible if it looks feasible in all constant-radius neighborhoods. Prior work on such problems has brought primarily bad news: questions related to locality are undecidable in general, and even if we focus on the case of labeled paths and cycles, determining locality is PSPACE-hard (Balliu et al., PODC 2019). We complement prior negative results with efficient algorithms for the cases of unlabeled paths and cycles and, as an extension, for rooted trees. We study locally checkable graph problems from an automata-theoretic perspective by representing a locally checkable problem Π as a nondeterministic finite automaton M over a unary alphabet. We identify polynomial-time-computable properties of the automaton M that near-completely capture the solvability and locality of Π in cycles and paths, with the exception of one specific case that is co-NP-complete. Copyright © 2020, The Authors. All rights reserved.

Keywords: parallel algorithms
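The automata-theoretic view in the abstract above can be illustrated with a deliberately simplified sketch. It assumes a problem on directed, unlabeled cycles whose only constraint is a set of allowed (label, next label) pairs; under that assumption the labels act as states of an automaton over a one-letter alphabet, and a cycle of length n has a valid labeling exactly when the automaton has a closed walk of length n. The names `solvable_on_cycle`, `TWO_COLORING`, and `THREE_COLORING` are hypothetical, and the sketch does not reproduce the paper's construction or its locality classification.

```python
# Hypothetical constraint sets: pairs (a, b) mean "label a may be followed by label b".
TWO_COLORING = {(1, 2), (2, 1)}
THREE_COLORING = {(a, b) for a in (1, 2, 3) for b in (1, 2, 3) if a != b}


def solvable_on_cycle(allowed, n):
    """Return True if a directed cycle with n nodes admits a valid labeling.

    Under the simplifying assumption above, this holds exactly when the
    transition graph given by `allowed` has a closed walk of length n.
    """
    labels = {x for pair in allowed for x in pair}
    for start in labels:
        # Set of labels reachable from `start` in exactly k steps.
        frontier = {start}
        for _ in range(n):
            frontier = {b for (a, b) in allowed if a in frontier}
        if start in frontier:
            return True
    return False
```

For example, `solvable_on_cycle(TWO_COLORING, 5)` is false because an odd cycle cannot be 2-colored, while `solvable_on_cycle(THREE_COLORING, 5)` is true.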

MG-Hybrid: A Strongly Connected Components Detection Algorithm using Multiple GPUs

IEEE International Symposium on Circuits and Systems (ISCAS)

Authors: Junteng Hou; Shupeng Wang; Guangjun Wu; Bingnan Ma; Chengxiang Si; Siyu Jia. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China

ISBN: (digital) 9781728133201

ISBN: (print) 9781728133218

Detection of strongly connected components (SCCs) on the GPU has become a fundamental operation to accelerate graph computing. Existing SCC detection methods on multiple GPUs introduce massive unnecessary data transformation between multiple GPUs. In this paper, we propose a novel distributed SCC detection approach using multiple GPUs plus a CPU. Our approach includes three key ideas: (1) segmentation and labeling over large-scale datasets; (2) collecting and merging the segmented SCCs; and (3) running task assignment over multiple GPUs and the CPU. We implement our approach under a hybrid distributed architecture with multiple GPUs plus a CPU. Our approach can achieve device-level optimization and is compatible with state-of-the-art algorithms. We conduct extensive theoretical and experimental analysis to demonstrate the efficiency and accuracy of our approach. The experimental results show that our approach achieves 11.2×, 1.2×, and 1.2× speedups for SCC detection using an NVIDIA K80 compared with Tarjan's, FB-Trim, and FB-Hybrid algorithms, respectively.

Keywords: Graphics processing units; Detection algorithms; Image edge detection; Task analysis; Partitioning algorithms; Parallel algorithms; Electronic mail

Rethinking Virtual Link Mapping in Network Virtualization

IEEE Conference on Vehicular Technology (VTC)

Authors: Khoa TD Nguyen; Qiao Lu; Changcheng Huang. Carleton University, Ottawa, Canada

ISBN: (digital) 9781728194844

ISBN: (print) 9781728194851

Virtual Network Embedding (VNE), which addresses the problem of efficiently embedding heterogeneous virtual networks onto a physical infrastructure of limited capacity, is a major challenge in network virtualization (NV). VNE is computationally intractable when considering various constraints on nodes and links, and is known to be NP-hard even in offline embedding. Although VNE problems have received attention over recent decades, with a vast number of VNE solutions proposed, the majority of them focus only on VNE node mapping, whilst leaving the link mapping stage to the shortest path method or a multicommodity flow (MCF) algorithm. We argue that node and link mappings play equally pivotal roles in reaching an efficient VNE solution. In this paper, we reassess the role of the link mapping stage in the VNE problem, and then propose a novel intelligent VNE orchestration which implements a distributed parallel model to reduce the operation time remarkably. Extensive evaluation results show that our proposed algorithm is not only faster than state-of-the-art VNE algorithms, but also better in all performance metrics.

Keywords: Vehicular and wireless technologies; Scalability; Network architecture; Resource management; Virtualization; Parallel algorithms; Genetic algorithms

Overcoming MPI communication overhead for distributed community detection

2nd Workshop on Software Challenges to Exascale Computing, SCEC 2018

Authors: Sattar, Naw Safrin; Arifuzzaman, Shaikh. Department of Computer Science, University of New Orleans, New Orleans, LA 70148, United States

ISBN: (print) 9789811377280

Community detection is an important graph (network) analysis kernel used for discovering functional units and organization of a graph. The Louvain method is an efficient algorithm for discovering communities. However, the sequential Louvain method does not scale to the emerging large-scale network data. Parallel algorithms designed for modern high performance computing platforms are necessary to process such network big data. Although there are several shared memory based parallel algorithms for the Louvain method, those do not scale to a large number of cores and to large networks. One existing Message Passing Interface (MPI) based distributed memory parallel implementation of the Louvain algorithm has shown scalability to only 16 processors. In this work, first, we design a shared memory based algorithm using Open MultiProcessing (OpenMP), which shows a 4-fold speedup but is limited to the physical cores available to our system. Our second algorithm is an MPI-based distributed memory parallel algorithm that scales to a moderate number of processors. We then implement a hybrid algorithm combining the merits of both shared and distributed memory-based approaches. Finally, we incorporate a parallel load balancing scheme, which leads to our final algorithm DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms with improved load balancing. We present a comparative analysis of these parallel implementations of Louvain methods using several large real-world networks. DPLAL shows around 12-fold speedup and scales to a larger number of processors. © Springer Nature Singapore Pte Ltd. 2019.

Keywords: parallel algorithms

Two Methods for Constructing Independent Spanning Trees in Alternating Group Networks

IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C)

Authors: Jie-Fu Huang; Sun-Yuan Hsieh. National Cheng Kung University, Tainan, Taiwan

ISBN: (digital) 9781728189154

ISBN: (print) 9781728189161

In this paper, we propose a recursive algorithm and a parallel algorithm for constructing independent spanning trees in alternating group networks. The recursive algorithm is BFS-based, while the parallel algorithm is both BFS-based and rule-based. Both algorithms are accurate, and furthermore, the parallel algorithm is more efficient than the recursive one.

Keywords: Conferences; Software algorithms; Software quality; Search problems; Software reliability; Security; Parallel algorithms

Fast MSER

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hailiang Xu; Siqi Xie; Fan Chen. Alibaba Group; Nanjing University; Beijing Language and Culture University; Columbia University

ISBN: (digital) 9781728171685

ISBN: (print) 9781728171692

Maximally Stable Extremal Regions (MSER) algorithms are based on the component tree and are used to detect invariant regions. OpenCV MSER, the most popular MSER implementation, uses a linked list to associate pixels with extremal regions (ERs). The data structure of an ER contains the attributes of a head and a tail linked node, which makes OpenCV MSER hard to parallelize using existing parallel component tree strategies. Besides, pixel extraction (i.e. extracting the pixels in MSERs) in OpenCV MSER is very slow. In this paper, we propose two novel MSER algorithms, called Fast MSER V1 and V2. They first divide an image into several spatial partitions, then construct sub-trees and doubly linked lists (for V1) or a labelled image (for V2) on the partitions in parallel. A novel sub-tree merging algorithm is used in V1 to merge the sub-trees into the final tree, and the doubly linked lists are also merged in the process. V2, by contrast, merges the sub-trees using an existing merging algorithm. Finally, MSERs are recognized, and the pixels in them are extracted through two novel pixel extraction methods that take advantage of the fact that many pixels in parent and child MSERs are duplicated. Both V1 and V2 outperform three open source MSER algorithms (28 and 26 times faster than OpenCV MSER), and reduce the memory of the pixels in MSERs by 78%.

Keywords: Erbium; Partitioning algorithms; Merging; Feature extraction; Indexes; Memory management; Parallel algorithms

Scalable Algorithm for Subsequence Similarity Search in Very Large Time Series Data on Cluster of Phi KNL

20th International Conference Data Analytics and Management in Data-Intensive Domains, DAMDID/RCDL 2018

Authors: Kraeva, Yana; Zymbler, Mikhail. South Ural State University, Chelyabinsk, Russia

ISBN: (print) 9783030235833

Nowadays, subsequence similarity search under the Dynamic Time Warping (DTW) similarity measure is applied in a wide range of time series mining applications. Since the DTW measure has quadratic computational complexity with respect to the length of the query subsequence, a number of parallel algorithms have been developed for various many-core architectures, namely FPGA, GPU, and Intel MIC. In this paper, we propose a novel parallel algorithm for subsequence similarity search in very large time series data on a computing cluster with nodes based on Intel Xeon Phi Knights Landing (KNL) many-core processors. Computations are parallelized both across all cluster nodes through MPI and within a single cluster node through OpenMP. The algorithm involves additional data structures and redundant computations, which make it possible to use Phi KNL effectively for vector computations. Experimental evaluation of the algorithm on real-world and synthetic datasets shows that it is highly scalable. © 2019, Springer Nature Switzerland AG.

Keywords: parallel algorithms
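The quadratic cost of DTW mentioned in the abstract above comes from filling a dynamic-programming table with one cell per pair of positions. The following is a minimal sequential sketch of textbook DTW with squared differences and a naive sliding-window search, assuming candidate subsequences of the same length as the query. The function names `dtw_distance` and `best_match` are hypothetical, and this is not the parallel algorithm proposed in the paper.

```python
import math


def dtw_distance(q, s):
    """Textbook DTW between sequences q and s, using squared differences.

    Fills an (len(q) x len(s)) table row by row, so the cost is quadratic
    when both sequences have similar length.
    """
    m = len(s)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix matched against empty prefix
    for i in range(1, len(q) + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (q[i - 1] - s[j - 1]) ** 2
            # Extend the cheapest of: insertion, deletion, match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def best_match(query, series):
    """Naive subsequence search: return (distance, start) of the best window."""
    w = len(query)
    best = (math.inf, -1)
    for start in range(len(series) - w + 1):
        d = dtw_distance(query, series[start:start + w])
        if d < best[0]:
            best = (d, start)
    return best
```

Each window is evaluated independently here, which is the kind of work that can be distributed across nodes and threads; pruning and vectorization are left out of this sketch.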

Symmetries: From Proofs To Algorithms And Back

arXiv, 2020

Authors: Aghamolaei, Sepideh. Sharif University of Technology, Tehran, Iran

We call an objective function or algorithm symmetric with respect to an input if after swapping two parts of the input in any algorithm, the solution of the algorithm and the output remain the same. More formally, for a permutation π of an indexed input, and another permutation π′ of the same input, such that swapping two items converts π to π′, f(π) = f(π′), where f is the objective function. After reviewing samples of the algorithms that exploit symmetry, we give several new ones, for finding lower bounds, beating adversaries in online algorithms, designing parallel algorithms, and data summarization. We show how to use the symmetry between the sampled points to get a lower/upper bound on the solution. This mostly depends on the equivalence class of the parts of the input that, when swapped, do not change the solution or its cost. Copyright © 2020, The Authors. All rights reserved.

Keywords: parallel algorithms
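The condition f(π) = f(π′) stated in the abstract above can be checked by brute force on a small input. The sketch below tries every swap of two positions in one given ordering and compares the value of f before and after. The name `is_symmetric_under_swaps` is hypothetical, and a passing check on one input is only evidence, not a proof, that f is symmetric in general.

```python
from itertools import combinations


def is_symmetric_under_swaps(f, items):
    """Return True if swapping any two positions of `items` leaves f unchanged."""
    base = f(list(items))
    for i, j in combinations(range(len(items)), 2):
        swapped = list(items)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if f(swapped) != base:
            return False
    return True
```

For instance, `sum` and `max` pass this check on `[3, 1, 2]`, while a function that returns the first element does not.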

Performance Analysis and Optimization of the Vector-Kronecker Product Multiplication

International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Authors: Alexandre Azevedo; Cristiana Bentes; Maria Clicia Castro; Claude Tadonki. Computational Sciences Program, State University of Rio de Janeiro; Department of Systems Engineering, State University of Rio de Janeiro; Department of Informatics and Computer Science, State University of Rio de Janeiro; CRI (Centre de Recherche en Informatique), MINES ParisTech

ISBN: (digital) 9781728199245

ISBN: (print) 9781728199252

The Kronecker product, also called tensor product, is a fundamental matrix algebra operation, used to model complex systems using structured descriptions. This operation needs to be computed efficiently, since it is a critical kernel for iterative algorithms. In this work, we focus on the vector-Kronecker product operation, where we present an in-depth performance analysis of a sequential and a parallel algorithm previously proposed. Based on this analysis, we propose three optimizations: changing the memory access pattern, reducing load imbalance, and manually vectorizing some portions of the code with Intel SSE4.2 intrinsics. The obtained results show better cache usage and load balance, thus improving the performance, especially for larger matrices.

Keywords: Complexity theory; Optimization; Program processors; Computational modeling; Tensors; Parallel algorithms; Scalability
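As background for the vector-Kronecker product named in the abstract above, the sketch below multiplies a row vector by A ⊗ B for two factors without building the Kronecker matrix, using the standard identity that v(A ⊗ B) equals the row-major flattening of AᵀVB, where V is v reshaped to the row counts of A and B. It assumes plain Python lists and only two factors; the names `kron`, `vec_kron_naive`, and `vec_kron_product` are hypothetical, and this is not the algorithm or the optimizations analysed in the paper.

```python
def kron(A, B):
    """Explicit Kronecker product: K[i*p + k][j*q + l] = A[i][j] * B[k][l]."""
    p, q = len(B), len(B[0])
    return [
        [A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(q)]
        for i in range(len(A))
        for k in range(p)
    ]


def vec_kron_naive(v, A, B):
    """Reference result: form A (x) B explicitly, then multiply by the row vector v."""
    K = kron(A, B)
    return [sum(v[r] * K[r][c] for r in range(len(K))) for c in range(len(K[0]))]


def vec_kron_product(v, A, B):
    """Compute v (A (x) B) without materializing the Kronecker product."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    # T = V B, where V[i][k] = v[i*p + k] is v viewed as an m x p matrix.
    T = [[sum(v[i * p + k] * B[k][l] for k in range(p)) for l in range(q)]
         for i in range(m)]
    # Result entry (j, l) is sum_i A[i][j] * T[i][l], flattened row-major.
    return [sum(A[i][j] * T[i][l] for i in range(m))
            for j in range(n) for l in range(q)]
```

The factored form touches only the small factor matrices, which is why memory access pattern and load balance, rather than storage of the full product, dominate its performance.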

GPU parallel implementation and optimisation of SAR target recognition method

Journal of Engineering (JOE), 2019, Vol. 2019, No. 21, pp. 8129-8133

Authors: Quan, H.; Cui, Z.; Wang, R.; Cao, Zongjie. Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu, Sichuan, Peoples R China

An SAR target recognition method based on an optimised GPU parallel algorithm is proposed here. In general, with the rapid growth in the dimensionality and volume of SAR image data, the traditional CPU-based target recognition algorithm cannot meet the requirements of real-time processing. Here, the target recognition algorithm, which includes feature extraction and classification, is investigated, then decomposed for parallel execution and optimised. First, the algorithms are investigated and decomposed into parallel parts, including the principal component analysis, linear discriminant analysis, and non-negative matrix factorisation feature extraction technologies, and the support vector machines classifier. Then, the three feature extraction methods and the sequential minimal optimisation algorithm are realised. Finally, the factors limiting the running speed of the compute unified device architecture (CUDA) programme in the target recognition algorithm are analysed in depth, and the algorithm is optimised in three respects: communication, access, and instruction flow. According to the experiments, the optimised GPU-based parallel implementation of the target recognition algorithm achieves a performance improvement of about 25-30 times.

Keywords: support vector machines; graphics processing units; principal component analysis; parallel architectures; synthetic aperture radar; parallel algorithms; pattern classification; matrix decomposition; optimisation; radar imaging; feature extraction; SAR target recognition method; optimised GPU parallel algorithm; SAR images; traditional CPU-based target recognition algorithm; nonnegative matrix factorisation feature extraction technologies; feature extraction methods; sequential minimal optimisation algorithm; optimised GPU-based parallel implementation
