ISBN (print): 9781450311601
The virtues of deterministic parallelism have been argued for decades, and many forms of deterministic parallelism have been described and analyzed. Here we are concerned with one of the strongest forms, requiring that for any input there is a unique dependence graph representing a trace of the computation annotated with every operation and value. This has been referred to as internal determinism, and implies a sequential semantics, i.e., considering any sequential traversal of the dependence graph is sufficient for analyzing the correctness of the code. In addition to returning deterministic results, internal determinism has many advantages, including ease of reasoning about the code, ease of verifying correctness, ease of debugging, ease of defining invariants, ease of defining good coverage for testing, and ease of formally, informally, and experimentally reasoning about performance. On the other hand, one needs to consider the possible downsides of determinism, which might include making algorithms (i) more complicated, unnatural, or special purpose and/or (ii) slower or less scalable. In this paper we study the effectiveness of this strong form of determinism through a broad set of benchmark problems. Our main contribution is to demonstrate that for this wide body of problems, there exist efficient internally deterministic algorithms, and moreover that these algorithms are natural to reason about and not complicated to code. We leverage an approach to determinism suggested by Steele (1990), which is to use nested parallelism with commutative operations. Our algorithms apply several diverse programming paradigms that fit within the model, including (i) a strict functional style (no shared state among concurrent operations), (ii) an approach we refer to as deterministic reservations, and (iii) the use of commutative, linearizable operations on data structures. We describe algorithms for the benchmark problems that use these deterministic approaches and present performance…
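To make the "deterministic reservations" idea concrete, here is a minimal sketch of the pattern: in rounds, every pending iterate reserves the shared locations it needs with a commutative write-min of its priority, and an iterate commits only if it won every reservation. The example problem below (assigning jobs to machines) is a hypothetical stand-in, not one of the paper's benchmarks, and the round-based loop is written sequentially since the commutative writes make any parallel schedule equivalent.

```python
# Sketch of deterministic reservations (nested parallelism with
# commutative operations, after Steele 1990). Hypothetical problem:
# greedily grant jobs their free machines in priority order.

def deterministic_reservations(jobs):
    """jobs: list of (priority, machines); priorities are unique."""
    pending = sorted(jobs)          # priority doubles as sequential index
    committed = []
    taken = set()                   # machines already granted
    while pending:
        # Phase 1: reserve. write-min is commutative, so any parallel
        # interleaving produces the same reservation table.
        reserve = {}
        for prio, machines in pending:
            for m in machines:
                if m not in taken:
                    reserve[m] = min(reserve.get(m, prio), prio)
        # Phase 2: commit iterates that won all their reservations.
        nxt = []
        for prio, machines in pending:
            free = [m for m in machines if m not in taken]
            if free and all(reserve[m] == prio for m in free):
                committed.append(prio)
                taken.update(free)
            elif free:
                nxt.append((prio, machines))  # lost a race: retry next round
            # jobs with no free machines left simply drop out
        pending = nxt
    return committed
```

The returned commit order matches what the sequential greedy loop over priorities would produce, which is exactly the internal-determinism guarantee.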
ISBN (print): 185312821X
Data Mining applications have to deal with increasingly large data sets and complexity. Only algorithms which scale linearly with data size are feasible. We present parallel regression algorithms which, after a few initial scans of the data, compute predictive models for data mining and do not require further access to the data. In addition, we describe various ways of dealing with the complexity (high dimensionality) of the data. Three methods are presented for three different ranges of attribute numbers. They use ideas from the finite element method and are based on penalised least squares fits using sparse grids and additive models for intermediate and very high dimensional data. Computational experiments confirm scalability both with respect to data size and the number of processors.
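The "few initial scans" idea behind a penalised least squares fit can be illustrated with a toy one-feature ridge regression: a single pass accumulates the sufficient statistics of the normal equations, after which the model is solved without touching the data again. This is a generic sketch, not the paper's sparse-grid or additive-model formulation; the data and penalty below are made-up illustration values.

```python
# Toy penalised least squares (ridge) fit: one scan over the data
# builds X^T X and X^T y; the model never needs the data again.

def ridge_fit(xs, ys, lam):
    """Fit y ~ b + w*x with penalty lam on the coefficients."""
    s1 = sx = sxx = sy = sxy = 0.0
    for x, y in zip(xs, ys):            # the single data scan
        s1 += 1.0; sx += x; sxx += x * x
        sy += y; sxy += x * y
    # Solve (X^T X + lam*I) [b, w]^T = X^T y  (2x2, Cramer's rule).
    a11, a12, a21, a22 = s1 + lam, sx, sx, sxx + lam
    det = a11 * a22 - a12 * a21
    b = (sy * a22 - a12 * sxy) / det
    w = (a11 * sxy - sy * a21) / det
    return b, w
```

Because only the accumulated sums are needed, the scan parallelizes trivially: each processor scans its partition and the partial sums are added, which is where the linear scaling in data size comes from.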
ISBN (print): 354060216X
The minimum cut and minimum length linear arrangement problems usually occur in solving wiring problems and have a lot in common with job sequencing questions. Both problems are NP-complete for general graphs and in P for trees. We present here two parallel algorithms for the CREW PRAM. The first solves the minimum length linear arrangement problem for trees and the second solves the minimum cut arrangement for trees. We prove that the first problem belongs to NC for trees, and that the second problem is also in NC for bounded-degree trees.
ISBN (print): 9781509028238
The k-center problem is a classic NP-hard clustering question. For contemporary massive data sets, RAM-based algorithms become impractical. Although there exist good algorithms for k-center, they are all inherently sequential. In this paper, we design and implement parallel approximation algorithms for k-center. We observe that Gonzalez's greedy algorithm can be efficiently parallelized in several MapReduce rounds; in practice, we find that two rounds are sufficient, leading to a 4-approximation. We find this parallel scheme to be about 100 times faster than the sequential Gonzalez algorithm, while barely compromising solution quality. We contrast this with an existing parallel algorithm for k-center that offers a 10-approximation. Our analysis reveals that this scheme is often slow, and that its sampling procedure only runs if k is sufficiently small relative to the input size. In practice, it is slightly more effective than Gonzalez's approach, but slow. To trade off runtime for approximation guarantee, we parameterize this sampling algorithm. We prove a lower bound on the parameter for effectiveness, and find experimentally that with values even lower than the bound, the algorithm is not only faster, but sometimes more effective.
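The sequential baseline being parallelized here is Gonzalez's greedy 2-approximation: repeatedly pick the point farthest from the centers chosen so far. A minimal sketch, on made-up 1-D points for readability (the algorithm works with any metric):

```python
# Gonzalez's greedy farthest-point algorithm for k-center
# (2-approximation). dist[i] tracks each point's distance to its
# nearest chosen center and is refreshed after every new center.

def gonzalez(points, k):
    centers = [points[0]]                           # arbitrary first center
    dist = [abs(p - centers[0]) for p in points]    # 1-D metric for simplicity
    while len(centers) < k:
        i = max(range(len(points)), key=dist.__getitem__)  # farthest point
        centers.append(points[i])
        dist = [min(d, abs(p - points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)    # chosen centers and the clustering radius
```

The MapReduce parallelization described in the abstract runs this greedy step over data partitions and then over the partitions' results, which is how chaining two greedy rounds yields the 4-approximation.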
ISBN (print): 3540377832
Writing irregular parallel algorithms with OpenMP has been rarely practised in the past. Yet it is possible, and in this paper we will use a simple breadth-first search application as an example to show a possible stepping stone and a deficiency of the OpenMP specification: it is very difficult to cancel the threads in a parallel region. A way to work around the issue within the existing OpenMP specification is sketched, while our main contribution is a proposal for an extension of OpenMP to solve the issue in an easier way.
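The standard workaround for the missing cancellation construct is to have every worker poll a shared flag and bail out voluntarily. The pattern is sketched below with Python threads rather than OpenMP (which this abstract predates being fixed; OpenMP 4.0 later added `omp cancel`); the search task is a made-up stand-in for the paper's BFS application.

```python
# Cooperative-cancellation pattern: workers poll a shared flag instead
# of being killed. Sketched with Python threads; in OpenMP the flag
# would be a shared variable checked inside the work loop.

import threading

def parallel_find(items, target, nthreads=4):
    found = threading.Event()            # the shared cancellation flag
    result = []

    def worker(chunk):
        for x in chunk:
            if found.is_set():           # poll the flag, then bail out
                return
            if x == target:
                result.append(x)
                found.set()              # ask the other threads to stop

    threads = [threading.Thread(target=worker, args=(items[i::nthreads],))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0] if result else None
```

The drawback the paper is pointing at is visible here: cancellation latency depends on how often the flag is polled, and the polling clutters the algorithmic code, which is the motivation for a language-level construct.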
ISBN (print): 081867475X
In this paper, we propose four parallel algorithms (NPA, SPA, HPA and HPA-ELD) for mining association rules on shared-nothing parallel machines to improve performance. In NPA, candidate itemsets are just copied amongst all the processors, which can lead to memory overflow for large transaction databases. The remaining three algorithms partition the candidate itemsets over the processors. If they are partitioned simply (SPA), transaction data has to be broadcast to all processors. HPA partitions the candidate itemsets using a hash function to eliminate broadcasting, which also reduces the comparison workload significantly. HPA-ELD fully utilizes the available memory space by detecting the extremely large itemsets and copying them, which is also very effective at flattening the load over the processors. We implemented these algorithms in a shared-nothing environment. Performance evaluations show that the best algorithm, HPA-ELD, attains good linearity on speedup ratio and is effective for handling skew.
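HPA's central trick, hash-partitioning the candidates so each transaction's subsets are routed only to the owning processor, can be sketched in a few lines. The P "processors" below are simulated as in-memory tables and the transactions are toy data; the real system exchanges these subsets as messages.

```python
# Sketch of HPA-style counting: candidates are owned by hash, and each
# transaction's candidate-sized subsets are routed to the owner rather
# than broadcasting the transaction to every processor.

from itertools import combinations

def hpa_count(transactions, candidates, P):
    owner = lambda itemset: hash(itemset) % P
    counts = [dict() for _ in range(P)]       # one count table per processor
    for c in candidates:                      # install each candidate at its owner
        counts[owner(c)][c] = 0
    k = len(candidates[0])                    # all candidates have size k
    for t in transactions:
        for sub in combinations(sorted(t), k):
            p = owner(sub)                    # route, don't broadcast
            if sub in counts[p]:
                counts[p][sub] += 1
    merged = {}                               # gather for the final answer
    for tbl in counts:
        merged.update(tbl)
    return merged
```

Because a subset is only ever compared against the one table that can contain it, the per-processor comparison workload shrinks by roughly a factor of P, which is the reduction the abstract refers to.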
ISBN (print): 9781611975482
The Lovász Local Lemma (LLL) is a probabilistic tool which shows that, if a collection of "bad" events B in a probability space are not too likely and not too interdependent, then there is a positive probability that no bad events in B occur. Moser & Tardos (2010) gave sequential and parallel algorithms which transformed most applications of the variable-assignment LLL into efficient algorithms. A framework of Harvey & Vondrák (2015) based on "resampling oracles" extended this to give very general sequential algorithms for other probability spaces satisfying the Lopsided Lovász Local Lemma (LLLL). We describe a new structural property of resampling oracles which holds for all known resampling oracles, which we call "obliviousness." Essentially, it means that the interaction between two bad events B, B′ depends only on the randomness used to resample B, and not on the precise state within B itself. This property has two major consequences. First, it is the key to achieving a unified parallel LLLL algorithm, which is faster than previous, problem-specific algorithms of Harris (2016) for the variable-assignment LLLL and of Harris & Srinivasan (2014) for permutations. This new algorithm extends a framework of Kolmogorov (2016), and gives the first RNC algorithms for rainbow perfect matchings and rainbow Hamiltonian cycles of K_n. Second, this property allows us to build LLLL probability spaces out of a relatively simple "atomic" set of events. It was intuitively clear that existing LLLL spaces were built in this way, but the obliviousness property formalizes this and gives a way of automatically turning a resampling oracle for atomic events into a resampling oracle for conjunctions of them. Using this framework, we get the first sequential resampling oracle for rainbow perfect matchings on the complete s-uniform hypergraph K_n^(s) and the first commutative resampling oracle for Hamiltonian cycles of K_n.
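For context, the Moser-Tardos algorithm that this line of work generalizes is remarkably simple in the variable-assignment setting: while some bad event holds, resample exactly the variables it depends on. The sketch below runs it on a toy satisfiable CNF-style instance (each "bad event" is a falsified clause); it is the baseline algorithm, not the paper's resampling-oracle framework.

```python
# Moser-Tardos resampling for the variable-assignment LLL.
# A clause is a list of (variable, wanted_value) literals; the clause's
# bad event holds when every literal is violated.

import random

def moser_tardos(nvars, clauses, rng):
    x = [rng.randint(0, 1) for _ in range(nvars)]   # random initial assignment

    def bad(cl):
        return all(x[v] != want for v, want in cl)

    while True:
        violated = [cl for cl in clauses if bad(cl)]
        if not violated:
            return x                                 # no bad event holds
        for v, _ in violated[0]:                     # resample only the event's
            x[v] = rng.randint(0, 1)                 # own variables
```

The LLL guarantees that when each clause shares variables with few others, this loop terminates after an expected linear number of resamplings; the resampling-oracle frameworks discussed in the abstract replace the independent-variable resampling step with oracles for richer spaces such as permutations and matchings.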
ISBN (print): 0818683325
We adapt the classic sequential model used to predict the performance of communications on parallel computers to a Local Area Network using PVM. We have discovered that the linear model using the values of the parameters (latency, transfer rate) given by the "ping-pong" algorithm does not correctly predict the times invested in several communication patterns. This work concentrates on one of these patterns: One-to-All communications. We propose an alternative experiment that takes into account the overlapping between computations and communications that arises for this pattern. We illustrate the problem and its solution using a parallel matrix multiplication algorithm based on the master-slave paradigm. The detection of this phenomenon and the determination of the new values for the parameters simplify the design of parallel algorithms and can be used to predict the appropriate number of processors to achieve maximum efficiency.
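The linear model in question is T(m) = beta + m/tau: a fixed startup latency beta plus message size m divided by the transfer rate tau, with both parameters fitted from ping-pong timings. A minimal sketch, using made-up measurement numbers:

```python
# The classic linear point-to-point model T(m) = beta + m / tau.
# Two ping-pong measurements (size, time) suffice to fit both
# parameters; the numbers below are illustrative, not measured.

def fit_linear_model(m1, t1, m2, t2):
    tau = (m2 - m1) / (t2 - t1)    # transfer rate, bytes per second
    beta = t1 - m1 / tau           # startup latency, seconds
    return beta, tau

def predict(beta, tau, m):
    return beta + m / tau          # predicted time for an m-byte message

beta, tau = fit_linear_model(1024, 0.0012, 8192, 0.0040)
```

The paper's point is that parameters fitted this way mispredict One-to-All patterns, because the master overlaps computation with the sends; their alternative experiment refits beta and tau under that overlap rather than changing the model's form.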
This paper considers the computations associated with stability analysis for linear repetitive systems and, in particular, the so-called differential non-unit memory sub-class, which is of direct industrial relevance. The results presented consist of basic algorithms for hypercube based parallel computation of a frequency response matrix which is, in effect, the basis of one set of stability tests for this case. Further, some possible areas for further development of these highly promising initial results are briefly discussed.
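As a reminder of the underlying computation, a frequency response is the transfer function evaluated along the imaginary axis; for a single-state system G(jw) = c·b/(jw − a) + d this is a few lines of complex arithmetic. This scalar sketch only illustrates the kind of per-frequency evaluation being parallelized (each frequency point is independent, hence the hypercube distribution); the repetitive-systems frequency response matrix in the paper is more involved.

```python
# Frequency response of a toy single-state system
# dx/dt = a*x + b*u, y = c*x + d*u, evaluated at G(jw) = c*b/(jw - a) + d.
# Each frequency point is independent, so the loop parallelizes trivially.

def freq_response(a, b, c, d, omegas):
    return [c * b / (1j * omega - a) + d for omega in omegas]
```

For the illustrative stable system a = -1, b = c = 1, d = 0, the response at w = 0 is the DC gain 1 and the magnitude falls off as w grows, which is the shape a stability test inspects.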
In this paper we analyse the parallelisation of a stable algorithm for computing the controllable or observable part of a dynamic linear system. Our parallel algorithm is based on a numerically stable method that transforms the matrices of the system to block Hessenberg (or staircase) form. This algorithm requires one-sided orthogonal transformations, and standard data layouts such as 1-D and 2-D block column/row wrap are appropriate. The experimental results of the basic computational kernels on multicomputers show the degree of parallelism of this approach.