Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging ...
ISBN:
(Print) 9781467369985
Most tensor decomposition algorithms were developed for in-memory computation on a single machine. There are a few recent exceptions that were designed for parallel and distributed computation, but these cannot easily incorporate practically important constraints, such as nonnegativity. A new constrained tensor factorization framework is proposed in this paper, building upon the Alternating Direction Method of Multipliers (ADMoM). It is shown that this simplifies computations, bypassing the need to solve constrained optimization problems in each iteration and yielding algorithms that are naturally amenable to parallel implementation. The methodology is exemplified using nonnegativity as a baseline constraint, but the proposed framework can incorporate many other types of constraints. Numerical experiments are encouraging, indicating that ADMoM-based nonnegative tensor factorization (NTF) has high potential as an alternative to state-of-the-art approaches.
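The abstract's key idea, that an ADMoM-style splitting replaces a constrained subproblem per iteration with an unconstrained solve plus a cheap projection, can be illustrated on the simplest constrained building block, nonnegative least squares. The sketch below is a hypothetical toy in plain Python, not the paper's actual factor updates: it splits min ||Ax - b||^2 subject to x >= 0 into an unconstrained 2x2 linear solve and a max(0, .) projection.

```python
def solve2(M, v):
    # Closed-form solution of a 2x2 linear system M x = v.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * v[0] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]

def admm_nnls(A, b, rho=1.0, iters=500):
    # min ||Ax - b||^2 s.t. x >= 0, via the splitting x = z, z >= 0.
    # x-update: unconstrained solve; z-update: projection onto the nonnegatives.
    AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(2)]
           for i in range(2)]
    Atb = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(2)]
    M = [[AtA[i][j] + (rho if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    z = [0.0, 0.0]
    u = [0.0, 0.0]  # scaled dual variable
    for _ in range(iters):
        rhs = [Atb[i] + rho * (z[i] - u[i]) for i in range(2)]
        x = solve2(M, rhs)                              # unconstrained solve
        z = [max(0.0, x[i] + u[i]) for i in range(2)]   # cheap projection
        u = [u[i] + x[i] - z[i] for i in range(2)]      # dual update
    return z

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, -0.5, 0.5]
x = admm_nnls(A, b)  # converges toward the NNLS solution (0.75, 0)
```

The z-update is an elementwise max, so all coordinates can be updated in parallel, which is the kind of structure the paper exploits at scale.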
A fundamental question in graph theory is the Single-Source Shortest Path (SSSP) problem. It is well studied in the classical algorithm literature, but has only more recently been studied in the parallel setting. A relatively simple way to solve SSSP in parallel is with a parallel Bellman-Ford (BF) algorithm. BF shows strong performance on dense graphs, where m >> n, but due to its frontier-based approach, the number of rounds BF requires is bounded by the diameter of the graph. This thesis proposes two preprocessing strategies to alleviate this. The first strategy generates shortcuts such that each vertex attempts to have degree at most k. The second is graph contraction, which removes specific vertices and replaces them with a single shortcut. We show that both preprocessing strategies reduce the overall number of rounds required by all tested algorithms. Additionally, we evaluate both preprocessing strategies with our own implementation of BF and with state-of-the-art parallel SSSP algorithms. In general, δ-stepping and ρ-stepping show improved running times after contraction.
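To make the diameter dependence concrete, here is a minimal round-based Bellman-Ford sketch in Python (sequential code mimicking the parallel round structure; the thesis's own implementation and its k-degree shortcut generation are not shown in this abstract). Adding a hypothetical shortcut edge that bypasses a long path reduces the number of rounds without changing any distances.

```python
import math

def frontier_bellman_ford(n, adj, src):
    # adj[u] = list of (v, w). Each while-iteration is one "round";
    # in a parallel implementation all frontier vertices relax at once.
    dist = [math.inf] * n
    dist[src] = 0.0
    frontier = {src}
    rounds = 0
    while frontier:
        nxt = set()
        for u in frontier:
            for v, w in adj.get(u, []):
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    nxt.add(v)
        frontier = nxt
        rounds += 1
    return dist, rounds

# A 4-vertex path: the round count tracks the hop distance (graph diameter).
path = {0: [(1, 1.0)], 1: [(2, 1.0)], 2: [(3, 1.0)]}
d1, r1 = frontier_bellman_ford(4, path, 0)

# Same graph plus one shortcut 0 -> 3: distances unchanged, fewer rounds.
with_shortcut = {0: [(1, 1.0), (3, 3.0)], 1: [(2, 1.0)], 2: [(3, 1.0)]}
d2, r2 = frontier_bellman_ford(4, with_shortcut, 0)
```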
ISBN:
(Print) 9781479919611
In this paper we develop a set of parallel algorithms for improving the computational efficiency of a neurodynamic optimization (NDO) system proposed in our previous work. The NDO method can solve sparse signal recovery problems in compressive sensing, with a globally convergent optimal solution that approximates L_0 norm minimization, but its heavy computational load is an obstacle to practical applications. The parallel algorithms are implemented on graphics processing units (GPUs) programmed in CUDA and applied to recovering compressively sensed sparse signals. Experimental results given in the paper show that the new parallel method improves computational efficiency significantly, achieving a speedup ratio of more than 60 over the original serial NDO algorithm implemented on a CPU, while keeping the solution precision unchanged.
We present work-optimal PRAM algorithms for Burrows-Wheeler compression and decompression of strings over a constant alphabet. For a string of length n, the depth of the compression algorithm is O(log^2 n), and the depth of the corresponding decompression algorithm is O(log n). These appear to be the first polylogarithmic-time work-optimal parallel algorithms for any standard lossless compression scheme. The algorithms for the individual stages of compression and decompression may also be of independent interest: (1) a novel O(log n)-time, O(n)-work PRAM algorithm for Huffman decoding; (2) original insights into the stages of the BW compression and decompression problems, bringing out parallelism that was not readily apparent and allowing them to be mapped to elementary parallel routines that have O(log n)-time, O(n)-work solutions, such as: (i) prefix-sums problems with an appropriately defined associative binary operator for several stages, and (ii) list ranking for the final stage of decompression. Follow-up empirical work suggests potential for considerable practical speedups on a PRAM-driven many-core architecture, against a backdrop of negative contemporary results on common commercial platforms. (C) 2013 Elsevier B.V. All rights reserved.
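The prefix-sums primitive this abstract leans on, O(log n) time and O(n) work with any associative operator, can be sketched as the classic two-phase (up-sweep/down-sweep) exclusive scan. The Python below simulates the PRAM rounds sequentially; the inner loop at each level is the step that runs in parallel. This is a generic illustration of the primitive, not the paper's BW-specific operators.

```python
def exclusive_scan(a, op, identity):
    # Blelloch-style exclusive scan; len(a) must be a power of two.
    # Each inner for-loop is data-independent (one parallel PRAM step),
    # so the total depth is O(log n) and the total work is O(n).
    a = list(a)
    n = len(a)
    d = 1
    while d < n:  # up-sweep: build partial reductions up a binary tree
        for i in range(0, n, 2 * d):
            a[i + 2 * d - 1] = op(a[i + d - 1], a[i + 2 * d - 1])
        d *= 2
    a[n - 1] = identity
    d = n // 2
    while d >= 1:  # down-sweep: push prefixes back down the tree
        for i in range(0, n, 2 * d):
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] = op(t, a[i + 2 * d - 1])
        d //= 2
    return a

# Works for any associative operator with an identity element:
sums = exclusive_scan([1, 2, 3, 4], lambda x, y: x + y, 0)
maxes = exclusive_scan([3, 1, 4, 1, 5, 9, 2, 6], max, float("-inf"))
```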
ISBN:
(Print) 9781611977073
The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection B of "bad" events which are mostly independent and have low probability. In its simplest "symmetric" form, it asserts that whenever a bad-event has probability p and affects at most d bad-events, and epd < 1, then a configuration avoiding all B exists. A seminal algorithm of Moser & Tardos (2010) (which we call the MT algorithm) gives nearly-automatic randomized algorithms for most constructions based on the LLL. However, deterministic algorithms have lagged behind. We address three specific shortcomings of the prior deterministic algorithms. First, our algorithm applies to the LLL criterion of Shearer (1985); this is more powerful than alternate LLL criteria, removes a number of nuisance parameters, and leads to cleaner and more legible bounds. Second, we provide parallel algorithms with much greater flexibility in the functional form of the bad-events. Third, we provide a derandomized version of the MT-distribution, that is, the distribution of the variables at the termination of the MT algorithm. We show applications to non-repetitive vertex coloring, independent transversals, strong coloring, and other problems. These give deterministic algorithms which essentially match the best previous randomized sequential and parallel algorithms.
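For readers unfamiliar with the MT algorithm that the paper derandomizes: it resamples the variables of any currently-occurring bad event until none occurs. A minimal randomized sketch in Python, on a hypothetical instance (bad event = four consecutive bits all zero) chosen purely for illustration:

```python
import random

def moser_tardos(n, bad_events, rng, max_steps=10_000):
    # bad_events: list of (predicate, variable_indices).
    # Repeatedly resample the variables of an occurring bad event.
    x = [rng.randrange(2) for _ in range(n)]
    for _ in range(max_steps):
        occurring = next((vs for pred, vs in bad_events if pred(x)), None)
        if occurring is None:
            return x  # all bad events avoided
        for i in occurring:
            x[i] = rng.randrange(2)  # fresh independent samples
    return None  # did not converge within the step budget

n = 16
windows = [list(range(i, i + 4)) for i in range(n - 3)]
bad_events = [
    (lambda x, vs=vs: all(x[i] == 0 for i in vs), vs) for vs in windows
]
x = moser_tardos(n, bad_events, random.Random(1))
```

When the algorithm returns an assignment, it avoids every bad event by construction; the LLL-style analysis bounds the expected number of resamplings.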
ISBN:
(Print) 9781450391467
Some recent papers have shown that many sequential iterative algorithms can be directly parallelized by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges in achieving work-efficiency and high parallelism. Work-efficiency means that the number of operations is asymptotically the same as that of the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than the optimal sequential work, so we cannot even afford the cost to generate them. To achieve high parallelism, we want to process as many objects as possible in parallel; the goal is to achieve Õ(D) span for a problem with deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and proposes general approaches to design them. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks; all objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms use range queries to extract all objects with the same rank, avoiding evaluating all the dependences; we discuss activity selection and Dijkstra's algorithm using the Type 1 framework. Type 2 algorithms wake up an object when the last object it depends on has finished; we discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using the Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve both. Many of them improve the previous best bounds. Moreover, ...
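The phase-parallel idea, rank each object by its dependence depth and process one rank per round, can be sketched generically. Below is a hypothetical Python illustration (not the paper's Type 1/Type 2 machinery) that reproduces sequential greedy MIS in rounds: a vertex's rank is one more than the maximum rank of its lower-numbered neighbors, and the number of rounds equals the deepest dependence length D.

```python
def greedy_mis_phase_parallel(n, edges):
    # deps[v] = lower-numbered neighbors v must wait for (its dependences).
    deps = [[] for _ in range(n)]
    for u, v in edges:
        lo, hi = min(u, v), max(u, v)
        deps[hi].append(lo)
    # rank[v] = length of the longest dependence chain ending at v.
    rank = [0] * n
    for v in range(n):  # deps point to lower ids, so id order is topological
        if deps[v]:
            rank[v] = 1 + max(rank[u] for u in deps[v])
    in_mis = [False] * n
    for r in range(max(rank) + 1):
        # Same-rank vertices are never adjacent, so each pass over one rank
        # is a single parallel round.
        for v in range(n):
            if rank[v] == r:
                in_mis[v] = all(not in_mis[u] for u in deps[v])
    return in_mis, max(rank) + 1

# Path graph: the dependence chain is as deep as the path, so n rounds.
mis_path, rounds_path = greedy_mis_phase_parallel(5, [(0, 1), (1, 2), (2, 3), (3, 4)])

# Star graph: everything depends only on vertex 0, so two rounds suffice.
mis_star, rounds_star = greedy_mis_phase_parallel(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
```

The result matches the sequential greedy MIS on the id order, while the round count exposes the parallelism available in the dependence structure.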
This paper describes an efficient solution to parallelize software program instructions, regardless of the programming language in which they are written. We solve the problem of the optimal distribution of a set of instructions on available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stages of our proposed genetic algorithm are: the choice of the initial population and its representation in chromosomes, the crossover, and the mutation operations customized to the problem being dealt with. In this paper, genetic algorithms are applied to the entire search space of the parallelization of the program instructions problem. This problem is NP-complete, so there are no polynomial algorithms that can scan the solution space and solve the problem. The genetic algorithm-based method is general, and it is simple and efficient to implement because it can be scaled to a larger or smaller number of instructions that must be parallelized. The parallelization technique proposed in this paper was developed in the C# programming language, and our results confirm the effectiveness of our parallelization method. Experimental results obtained and presented for different working scenarios confirm the theoretical results, and they provide insight on how to improve the exploration of a search space that is too large to be searched exhaustively.
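A minimal version of such a genetic algorithm can be sketched in Python (the paper's C# implementation and its exact operators are not reproduced here; the costs, population size, and rates below are hypothetical). A chromosome assigns each instruction to a processor, fitness is the makespan (maximum processor load), and single-point crossover plus random-reassignment mutation search the assignment space:

```python
import random

def makespan(assign, costs, procs):
    # Fitness: the load of the most heavily loaded processor.
    load = [0] * procs
    for instr, p in enumerate(assign):
        load[p] += costs[instr]
    return max(load)

def ga_schedule(costs, procs, pop_size=30, gens=100, mut_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(costs)
    pop = [[rng.randrange(procs) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=lambda c: makespan(c, costs, procs))
    for _ in range(gens):
        nxt = [best[:]]  # elitism: keep the best chromosome so far
        while len(nxt) < pop_size:
            a, b = rng.sample(pop, 2)        # parent selection
            cut = rng.randrange(1, n)        # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n):               # mutation: random reassignment
                if rng.random() < mut_rate:
                    child[i] = rng.randrange(procs)
            nxt.append(child)
        pop = nxt
        best = min(pop, key=lambda c: makespan(c, costs, procs))
    return best, makespan(best, costs, procs)

# Five instructions on two processors; total cost is 20, so 10 is optimal.
assign, span = ga_schedule([2, 3, 4, 5, 6], procs=2)
```

Because the representation is just one gene per instruction, the same sketch scales to more instructions or processors by changing the inputs, which mirrors the scalability argument made in the abstract.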
ISBN:
(Print) 9781665497473
We show that the problem of computing the minimum spanning tree can be formulated as a special case of detecting a Lattice Linear Predicate (LLP). In general, formulating problems as LLP presents three main advantages: 1) different problems are formulated under a single, general framework, which defines the problem in terms of simple local predicates that must hold for all the elements of a lattice, making the problem (and the solution) compact and easy to understand; 2) improvements on one set of problems can be transferable to other sets of problems; 3) since the problems are stated as a set of local predicates, which can often be tested with little or no synchronization, new opportunities for parallelism often present themselves. In this paper we introduce two parallel algorithms, LLP-Prim and LLP-Boruvka, that improve on their non-LLP counterparts in several ways. LLP-Prim reduces the number of heap operations required by Prim's algorithm by allowing edges to be selected without entering the heap, thus allowing for parallelism. LLP-Boruvka improves on Boruvka's algorithm by reducing synchronization, once more improving parallelism opportunities. Our experimental evaluation shows that LLP-Prim is faster than Prim's algorithm in both single-threaded and multithreaded scenarios and that it provides a good tradeoff between parallelism and efficiency at low core counts. For higher core counts we show how LLP-Boruvka improves on an efficient implementation of a parallel version of Boruvka.
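As background for LLP-Boruvka, here is plain Borůvka's algorithm in Python: in each round every component selects its minimum-weight outgoing edge, and all selected edges are added at once. The per-component selections are independent, which is where parallelism (and, in the paper, the reduced-synchronization LLP variant) comes in. Distinct edge weights are assumed to avoid tie-breaking.

```python
def boruvka_mst(n, edges):
    # edges: list of (u, v, w) with distinct weights w.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total, components = [], 0.0, n
    while components > 1:
        # Each component picks its cheapest outgoing edge (parallelizable).
        cheapest = {}
        for u, v, w in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        # Merge along all selected edges at once (one Boruvka round).
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may be picked by both endpoints
                parent[ru] = rv
                mst.append((u, v, w))
                total += w
                components -= 1
    return total, mst

total, mst = boruvka_mst(4, [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0),
                             (3, 0, 4.0), (0, 2, 5.0)])
```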
Based on a full domain partition technique, some parallel iterative pressure projection stabilized finite element algorithms for the Navier-Stokes equations with nonlinear slip boundary conditions are designed and analyzed. In these algorithms, the lowest equal-order P1-P1 elements are used for the finite element discretization, and a local pressure projection stabilized method is used to counteract the invalidity of the discrete inf-sup condition. Each subproblem is solved on a global composite mesh with the vast majority of the degrees of freedom associated with the particular subdomain it is responsible for, and hence can be solved in parallel with other subproblems by using an existing sequential solver without extensive recoding. All of the subproblems are nonlinear and are independently solved by three kinds of iterative methods. We estimate the optimal error bounds of the approximate solutions under some (strong) uniqueness conditions. Numerical results are also given to demonstrate the effectiveness of the parallel algorithms.