Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging ...
ISBN:
(Print) 9781467369985
Most tensor decomposition algorithms were developed for in-memory computation on a single machine. There are a few recent exceptions that were designed for parallel and distributed computation, but these cannot easily incorporate practically important constraints, such as nonnegativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration and yielding algorithms that are naturally amenable to parallel implementation. The methodology is exemplified using nonnegativity as a baseline constraint, but the proposed framework can incorporate many other types of constraints. Numerical experiments are encouraging, indicating that ADMoM-based nonnegative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches.
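The abstract's key idea, that an ADMoM-style splitting replaces a constrained subproblem per iteration with an unconstrained solve plus a cheap projection, can be illustrated on the simplest constrained building block, nonnegative least squares. The sketch below is a hypothetical toy in plain Python, not the paper's actual factor updates: it splits min ||Ax - b||^2 subject to x >= 0 into an unconstrained 2x2 linear solve and a max(0, .) projection.

```python
def solve2(M, v):
    # Closed-form solution of a 2x2 linear system M x = v.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * v[0] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]

def admm_nnls(A, b, rho=1.0, iters=500):
    # min ||Ax - b||^2 s.t. x >= 0, via the splitting x = z, z >= 0.
    # x-update: unconstrained solve; z-update: projection onto the nonnegatives.
    AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(2)]
           for i in range(2)]
    Atb = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(2)]
    M = [[AtA[i][j] + (rho if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    z = [0.0, 0.0]
    u = [0.0, 0.0]  # scaled dual variable
    for _ in range(iters):
        rhs = [Atb[i] + rho * (z[i] - u[i]) for i in range(2)]
        x = solve2(M, rhs)                              # unconstrained solve
        z = [max(0.0, x[i] + u[i]) for i in range(2)]   # cheap projection
        u = [u[i] + x[i] - z[i] for i in range(2)]      # dual update
    return z

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, -0.5, 0.5]
x = admm_nnls(A, b)  # converges toward the NNLS solution (0.75, 0)
```

The z-update is an elementwise max, so all coordinates can be updated in parallel, which is the kind of structure the paper exploits at scale.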
A fundamental question in graph theory is the Single-Source Shortest Path (SSSP) problem. It is well studied in the classical algorithm literature, but has only more recently been studied in the parallel setting. A relatively simple way to solve SSSP in parallel is with a parallel Bellman-Ford (BF) algorithm. BF shows strong performance on dense graphs, where m >> n, but due to its frontier-based approach, the number of rounds BF requires is bounded by the diameter of the graph. This thesis proposes two preprocessing strategies to alleviate this. The first strategy generates shortcuts such that each vertex attempts to have degree at most k. The second is graph contraction, which removes specific vertices and replaces them with a single shortcut. We show that both preprocessing strategies reduce the overall number of rounds required by all tested algorithms. Additionally, we evaluate both preprocessing strategies with our own implementation of BF and with state-of-the-art parallel SSSP algorithms. In general, δ-stepping and ρ-stepping show improved running times after contraction.
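To make the diameter dependence concrete, here is a minimal round-based Bellman-Ford sketch in Python (sequential code mimicking the parallel round structure; the thesis's own implementation and its k-degree shortcut generation are not shown in this abstract). Adding a hypothetical shortcut edge that bypasses a long path reduces the number of rounds without changing any distances.

```python
import math

def frontier_bellman_ford(n, adj, src):
    # adj[u] = list of (v, w). Each while-iteration is one "round";
    # in a parallel implementation all frontier vertices relax at once.
    dist = [math.inf] * n
    dist[src] = 0.0
    frontier = {src}
    rounds = 0
    while frontier:
        nxt = set()
        for u in frontier:
            for v, w in adj.get(u, []):
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    nxt.add(v)
        frontier = nxt
        rounds += 1
    return dist, rounds

# A 4-vertex path: the round count tracks the hop distance (graph diameter).
path = {0: [(1, 1.0)], 1: [(2, 1.0)], 2: [(3, 1.0)]}
d1, r1 = frontier_bellman_ford(4, path, 0)

# Same graph plus one shortcut 0 -> 3: distances unchanged, fewer rounds.
with_shortcut = {0: [(1, 1.0), (3, 3.0)], 1: [(2, 1.0)], 2: [(3, 1.0)]}
d2, r2 = frontier_bellman_ford(4, with_shortcut, 0)
```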
ISBN:
(Print) 9781479919611
In this paper we develop a set of parallel algorithms for improving the computational efficiency of a neurodynamic optimization (NDO) system proposed in our previous work. The NDO method can solve sparse signal recovery problems in compressive sensing, with a globally convergent optimal solution that approximates L_0 norm minimization, but its heavy computational load is an obstacle to practical applications. The parallel algorithms are implemented on graphics processing units (GPUs) programmed in CUDA and applied to recovering compressively sensed sparse signals. Experimental results given in the paper show that the new parallel method improves computational efficiency significantly, achieving a speedup ratio of more than 60 over the original serial NDO algorithm implemented on a CPU, while keeping the solution precision unchanged.
We present work-optimal PRAM algorithms for Burrows-Wheeler compression and decompression of strings over a constant alphabet. For a string of length n, the depth of the compression algorithm is O(log^2 n), and the depth of the corresponding decompression algorithm is O(log n). These appear to be the first polylogarithmic-time work-optimal parallel algorithms for any standard lossless compression scheme. The algorithms for the individual stages of compression and decompression may also be of independent interest: (1) a novel O(log n)-time, O(n)-work PRAM algorithm for Huffman decoding; (2) original insights into the stages of the BW compression and decompression problems, bringing out parallelism that was not readily apparent and allowing them to be mapped to elementary parallel routines that have O(log n)-time, O(n)-work solutions, such as: (i) prefix-sums problems with an appropriately defined associative binary operator for several stages, and (ii) list ranking for the final stage of decompression. Follow-up empirical work suggests potential for considerable practical speedups on a PRAM-driven many-core architecture, against a backdrop of negative contemporary results on common commercial platforms. (C) 2013 Elsevier B.V. All rights reserved.
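The prefix-sums primitive this abstract leans on, O(log n) time and O(n) work with any associative operator, can be sketched as the classic two-phase (up-sweep/down-sweep) exclusive scan. The Python below simulates the PRAM rounds sequentially; the inner loop at each level is the step that runs in parallel. This is a generic illustration of the primitive, not the paper's BW-specific operators.

```python
def exclusive_scan(a, op, identity):
    # Blelloch-style exclusive scan; len(a) must be a power of two.
    # Each inner for-loop is data-independent (one parallel PRAM step),
    # so the total depth is O(log n) and the total work is O(n).
    a = list(a)
    n = len(a)
    d = 1
    while d < n:  # up-sweep: build partial reductions up a binary tree
        for i in range(0, n, 2 * d):
            a[i + 2 * d - 1] = op(a[i + d - 1], a[i + 2 * d - 1])
        d *= 2
    a[n - 1] = identity
    d = n // 2
    while d >= 1:  # down-sweep: push prefixes back down the tree
        for i in range(0, n, 2 * d):
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] = op(t, a[i + 2 * d - 1])
        d //= 2
    return a

# Works for any associative operator with an identity element:
sums = exclusive_scan([1, 2, 3, 4], lambda x, y: x + y, 0)
maxes = exclusive_scan([3, 1, 4, 1, 5, 9, 2, 6], max, float("-inf"))
```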
ISBN:
(Print) 9781611977073
The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection B of "bad" events which are mostly independent and have low probability. In its simplest "symmetric" form, it asserts that whenever a bad-event has probability p and affects at most d bad-events, and epd < 1, then a configuration avoiding all B exists. A seminal algorithm of Moser & Tardos (2010) (which we call the MT algorithm) gives nearly-automatic randomized algorithms for most constructions based on the LLL. However, deterministic algorithms have lagged behind. We address three specific shortcomings of the prior deterministic algorithms. First, our algorithm applies to the LLL criterion of Shearer (1985); this is more powerful than alternate LLL criteria, removes a number of nuisance parameters, and leads to cleaner and more legible bounds. Second, we provide parallel algorithms with much greater flexibility in the functional form of the bad-events. Third, we provide a derandomized version of the MT-distribution, that is, the distribution of the variables at the termination of the MT algorithm. We show applications to non-repetitive vertex coloring, independent transversals, strong coloring, and other problems. These give deterministic algorithms which essentially match the best previous randomized sequential and parallel algorithms.
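For readers unfamiliar with the MT algorithm that the paper derandomizes: it resamples the variables of any currently-occurring bad event until none occurs. A minimal randomized sketch in Python, on a hypothetical instance (bad event = four consecutive bits all zero) chosen purely for illustration:

```python
import random

def moser_tardos(n, bad_events, rng, max_steps=10_000):
    # bad_events: list of (predicate, variable_indices).
    # Repeatedly resample the variables of an occurring bad event.
    x = [rng.randrange(2) for _ in range(n)]
    for _ in range(max_steps):
        occurring = next((vs for pred, vs in bad_events if pred(x)), None)
        if occurring is None:
            return x  # all bad events avoided
        for i in occurring:
            x[i] = rng.randrange(2)  # fresh independent samples
    return None  # did not converge within the step budget

n = 16
windows = [list(range(i, i + 4)) for i in range(n - 3)]
bad_events = [
    (lambda x, vs=vs: all(x[i] == 0 for i in vs), vs) for vs in windows
]
x = moser_tardos(n, bad_events, random.Random(1))
```

When the algorithm returns an assignment, it avoids every bad event by construction; the LLL-style analysis bounds the expected number of resamplings.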
ISBN:
(Print) 9781450391467
Some recent papers have shown that many sequential iterative algorithms can be directly parallelized by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges in achieving work-efficiency and high parallelism. Work-efficiency means that the number of operations is asymptotically the same as that of the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than the optimal sequential work, so we cannot even afford the cost to generate them. To achieve high parallelism, we want to process as many objects as possible in parallel; the goal is to achieve Õ(D) span for a problem with deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and proposes general approaches to design them. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks; all objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms use range queries to extract all objects with the same rank, avoiding evaluating all the dependences; we discuss activity selection and Dijkstra's algorithm using the Type 1 framework. Type 2 algorithms wake up an object when the last object it depends on has finished; we discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using the Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve both. Many of them improve the previous best bounds. Moreover, ...
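The phase-parallel idea, rank each object by its dependence depth and process one rank per round, can be sketched generically. Below is a hypothetical Python illustration (not the paper's Type 1/Type 2 machinery) that reproduces sequential greedy MIS in rounds: a vertex's rank is one more than the maximum rank of its lower-numbered neighbors, and the number of rounds equals the deepest dependence length D.

```python
def greedy_mis_phase_parallel(n, edges):
    # deps[v] = lower-numbered neighbors v must wait for (its dependences).
    deps = [[] for _ in range(n)]
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        deps[hi].append(lo)
    # rank[v] = length of the longest dependence chain ending at v.
    rank = [0] * n
    for v in range(n):  # deps point to lower ids, so id order is topological
        if deps[v]:
            rank[v] = 1 + max(rank[u] for u in deps[v])
    in_mis = [False] * n
    for r in range(max(rank) + 1):
        # Same-rank vertices are never adjacent, so each pass over one rank
        # is a single parallel round.
        for v in range(n):
            if rank[v] == r:
                in_mis[v] = all(not in_mis[u] for u in deps[v])
    return in_mis, max(rank) + 1

# Path graph: the dependence chain is as deep as the path, so n rounds.
mis_path, rounds_path = greedy_mis_phase_parallel(5, [(0, 1), (1, 2), (2, 3), (3, 4)])

# Star graph: everything depends only on vertex 0, so two rounds suffice.
mis_star, rounds_star = greedy_mis_phase_parallel(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
```

The result matches the sequential greedy MIS on the id order, while the round count exposes the parallelism available in the dependence structure.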
This paper describes an efficient solution to parallelize software program instructions, regardless of the programming language in which they are written. We solve the problem of the optimal distribution of a set of instructions on available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stages of our proposed genetic algorithm are: the choice of the initial population and its representation in chromosomes, the crossover, and the mutation operations customized to the problem being dealt with. In this paper, genetic algorithms are applied to the entire search space of the parallelization of the program instructions problem. This problem is NP-complete, so there are no polynomial algorithms that can scan the solution space and solve the problem. The genetic algorithm-based method is general, and it is simple and efficient to implement because it can be scaled to a larger or smaller number of instructions that must be parallelized. The parallelization technique proposed in this paper was developed in the C# programming language, and our results confirm the effectiveness of our parallelization method. Experimental results obtained and presented for different working scenarios confirm the theoretical results, and they provide insight on how to improve the exploration of a search space that is too large to be searched exhaustively.
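A minimal version of such a genetic algorithm can be sketched in Python (the paper's C# implementation and its exact operators are not reproduced here; the costs, population size, and rates below are hypothetical). A chromosome assigns each instruction to a processor, fitness is the makespan (maximum processor load), and single-point crossover plus random-reassignment mutation search the assignment space:

```python
import random

def makespan(assign, costs, procs):
    # Fitness: the load of the most heavily loaded processor.
    load = [0] * procs
    for instr, p in enumerate(assign):
        load[p] += costs[instr]
    return max(load)

def ga_schedule(costs, procs, pop_size=30, gens=100, mut_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(costs)
    pop = [[rng.randrange(procs) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=lambda c: makespan(c, costs, procs))
    for _ in range(gens):
        nxt = [best[:]]  # elitism: keep the best chromosome so far
        while len(nxt) < pop_size:
            a, b = rng.sample(pop, 2)        # parent selection
            cut = rng.randrange(1, n)        # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n):               # mutation: random reassignment
                if rng.random() < mut_rate:
                    child[i] = rng.randrange(procs)
            nxt.append(child)
        pop = nxt
        best = min(pop, key=lambda c: makespan(c, costs, procs))
    return best, makespan(best, costs, procs)

# Five instructions on two processors; total cost is 20, so 10 is optimal.
assign, span = ga_schedule([2, 3, 4, 5, 6], procs=2)
```

Because the representation is just one gene per instruction, the same sketch scales to more instructions or processors by changing the inputs, which mirrors the scalability argument made in the abstract.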
ISBN:
(Print) 9781665497473
We show that the problem of computing the minimum spanning tree can be formulated as a special case of detecting a Lattice Linear Predicate (LLP). In general, formulating problems as LLP presents three main advantages: 1) different problems are formulated under a single, general framework, which defines the problem in terms of simple local predicates that must hold for all the elements of a lattice, making the problem (and the solution) compact and easy to understand; 2) improvements on one set of problems can be transferable to other sets of problems; 3) since the problems are stated as a set of local predicates, which can often be tested with little or no synchronization, new opportunities for parallelism often present themselves. In this paper we introduce two parallel algorithms, LLP-Prim and LLP-Boruvka, that improve on their non-LLP counterparts in several ways. LLP-Prim reduces the number of heap operations required by Prim's algorithm by allowing edges to be selected without entering the heap, thus allowing for parallelism. LLP-Boruvka improves on Boruvka's algorithm by reducing synchronization, once more improving parallelism opportunities. Our experimental evaluation shows that LLP-Prim is faster than Prim's algorithm in both single-threaded and multithreaded scenarios and that it provides a good tradeoff between parallelism and efficiency at low core counts. For higher core counts we show how LLP-Boruvka improves on an efficient implementation of a parallel version of Boruvka.
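As background for LLP-Boruvka, here is plain Borůvka's algorithm in Python: in each round every component selects its minimum-weight outgoing edge, and all selected edges are added at once. The per-component selections are independent, which is where parallelism (and, in the paper, the reduced-synchronization LLP variant) comes in. Distinct edge weights are assumed to avoid tie-breaking.

```python
def boruvka_mst(n, edges):
    # edges: list of (u, v, w) with distinct weights w.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total, components = [], 0.0, n
    while components > 1:
        # Each component picks its cheapest outgoing edge (parallelizable).
        cheapest = {}
        for u, v, w in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        # Merge along all selected edges at once (one Boruvka round).
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may be picked by both endpoints
                parent[ru] = rv
                mst.append((u, v, w))
                total += w
                components -= 1
    return total, mst

total, mst = boruvka_mst(4, [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0),
                             (3, 0, 4.0), (0, 2, 5.0)])
```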
Based on a full domain partition technique, some parallel iterative pressure projection stabilized finite element algorithms for the Navier-Stokes equations with nonlinear slip boundary conditions are designed and analyzed. In these algorithms, the lowest equal-order P1-P1 elements are used for the finite element discretization, and a local pressure projection stabilized method is used to counteract the invalidity of the discrete inf-sup condition. Each subproblem is solved on a global composite mesh with the vast majority of the degrees of freedom associated with the particular subdomain it is responsible for, and hence can be solved in parallel with other subproblems by using an existing sequential solver without extensive recoding. All of the subproblems are nonlinear and are independently solved by three kinds of iterative methods. We estimate the optimal error bounds of the approximate solutions under some (strong) uniqueness conditions. Numerical results are also given to demonstrate the effectiveness of the parallel algorithms.