We show that the several problems, whose complexity with respect P and NC was open, are equivalent under NC-reductions. These include: (1) Finding the optimal solution of a two-variable integer program;(2) Determining...
详细信息
In this paper we consider the problem of generating random permutations on small parallel machines. The machines that we have in mind are shared memory machines with a constant number of processors such as the Sequent...
详细信息
ISBN:
(纸本)0897913701
In this paper we consider the problem of generating random permutations on small parallel machines. The machines that we have in mind are shared memory machines with a constant number of processors such as the Sequent Symmetry. We describe a parallel implementation of the 'shuffling' algorithm for generating a random permutation. If the hardware operates in a fair manner, this algorithm generates a fully random permutation. However, if the machine resolves contention in a malicious manner, then the algorithm does not generate permutations uniformly. We give almost tight bounds on the degree that an adversary can reduce the randomness. We also discuss the cost of locking data in the algorithm and present a method of generating random permutations with substantially reduced locking cost.
This paper deals with the problem of parallel construction of trees with optimal weighted path length. We study both the unordered case, known as the Huffman coding problem and the ordered case known as the optimal al...
详细信息
Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dyn...
详细信息
ISBN:
(纸本)9781595936677
Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. This kind of dynamic programming is typically called nonserial polyadic dynamic programming. Owing to the: non-uniform data dependence;it is harder to optimize this problem for parallelism and locality on parallelarchitectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved with the technique. We formulate and analytically solve the optimization problem determing the the size that minimizes the total execution time. The experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.
In this paper we present a simple parallel sorting algorithm and illustrate two applications. The algorithm (called the (l, m)-merge sort (LMM)) is an extension of the bitonic and odd-even merge sorts. Literature on p...
详细信息
In this paper we present a simple parallel sorting algorithm and illustrate two applications. The algorithm (called the (l, m)-merge sort (LMM)) is an extension of the bitonic and odd-even merge sorts. Literature on parallel sorting is abundant. Many of the algorithms proposed, though being theoretically important, may not perform satisfactorily in practice owing to large constants in their time bounds. The algorithm to be presented in this paper, due partly to its simplicity, results in small constants. We present an implementation for the parallel disk sorting problem. The algorithm is asymptotically optimal (assuming that N is a polynomial in M, where N is the number of records to be sorted and M is the internal memory size). The underlying constant is very small. This algorithm has a better performance than the disk-striped mergesort (DSM) algorithm when the number of disks is large. Our implementation is as simple as that of DSM (requiring no fancy data structures or prefetch techniques). As a second application, we prove that we can get a sparse enumeration sort on the hypercube that is simpler than that of the classical algorithm of Nassimi and Sahni. We also show that Leighton's columnsort algorithm is a special case of LMM.
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under unc...
详细信息
ISBN:
(纸本)159593667X
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under uncertainty;scheduling DAGs on asynchronous processors;scheduling to minimize gaps and power consumption;cache-oblivious streaming B-trees;an experimental comparison of cache-oblivious and cache-conscious programs;scheduling threads for constructive cache sharing on CMPs;proximity-aware directory-based coherence for multi-core processor architectures;a parallel dynamic programming algorithm on a multi-core architecture;tight bounds for distributed selection;local MST computation with short advice;distributed approximation of capacitated dominating sets;packing to angles and sectors;and the notion of a timed register and its application to indulgent synchronization.
The proceedings contain 52 papers. The topics discussed include: randomized approximate nearest neighbor search with limited adaptivity;encoding short ranges in TCAM without expansion: efficient algorithm and applicat...
ISBN:
(纸本)9781450342100
The proceedings contain 52 papers. The topics discussed include: randomized approximate nearest neighbor search with limited adaptivity;encoding short ranges in TCAM without expansion: efficient algorithm and applications;extending the nested parallel model to the nested dataflow model with provably efficient schedulers;latency-hiding work stealing: scheduling interacting parallel computations with work stealing;provably good and practically efficient parallel race detection for fork-join programs;dynamic determinacy race detection for task parallelism with futures;RUBIC: online parallelism tuning for co-located transactional memory applications;investigating the performance of hardware transactions on a multi-socket machine;parallelalgorithms for asymmetric read-write costs;general profit scheduling and the power of migration on heterogeneous machines;the power of migration in online machine minimization;fair online scheduling for selfish jobs on heterogeneous machines;scheduling parallelizable jobs online to minimize the maximum flow time;clairvoyant dynamic bin packing for job scheduling with minimum server usage time;parallel equivalence class sorting: algorithms, lower bounds, and distribution-based analysis;and parallel approaches to the string matching problem on the GPU.
We present parallelalgorithms for union, intersection and difference on ordered sets using random balanced binary trees (treaps [26]). For two sets of size n and m (m ≤ n) the algorithms run in expected O(m lg(n/m))...
详细信息
ISBN:
(纸本)9780897919890
We present parallelalgorithms for union, intersection and difference on ordered sets using random balanced binary trees (treaps [26]). For two sets of size n and m (m ≤ n) the algorithms run in expected O(m lg(n/m)) work and O(lg n) depth (parallel time) on an EREW PRAM with scan operations (implying O(lg2 n) depth on a plain EREW PRAM). As with the sequential algorithms on treaps for insertion and deletion the main advantage of our algorithms are their simplicity. In fact our algorithms for set operations seem simpler than previous sequential algorithms with the same work bounds, and might therefore also be useful in a sequential context. To analyze the effectiveness of the algorithms we implemented both sequential and parallel versions of the algorithms and ran several experiments on them. Our parallel implementation uses the Cilk [5] shared memory runtime system on a 16 processor SGI Power Challenge and a 6 processor Sun Ultra Enterprise 3000. It shows reasonable speedup: 6.3 to 6.8 speedup on 8 processors of the SGI, and 4.1 to 4.4 speedup on 5 processors of the Sun.
The proceedings contain 43 papers. The topics discussed include: speed scaling of processes with arbitrary speedup curves on a multiprocessor;the bell is ringing in speed-scaled multiprocessor scheduling;mapping filte...
ISBN:
(纸本)9781605586069
The proceedings contain 43 papers. The topics discussed include: speed scaling of processes with arbitrary speedup curves on a multiprocessor;the bell is ringing in speed-scaled multiprocessor scheduling;mapping filtering streaming applications with communication costs;scheduling to minimize staleness and stretch in real-time data warehouses;parameterized maximum and average degree approximation in topic-based publish-subscribe overlay network design;selfishness in transactional memory;at-most-once semantics in asynchronous shared memory;memory models: a case for rethinking parallel languages and hardware;the life and times of a ZooKeeper;Cassandra - a structured storage system on a P2P network;Pregel: a system for large-scale graph processing;towards transactional memory semantics for C++;on avoiding spare aborts in transactional memory;inherent limitations on disjoint-access parallel implementations of transactional memory;reducers and other Cilk++ hyperobjects;and beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures.
In dynamically multithreaded platforms that employ work stealing, there appears to be a fundamental tradeoff between providing provably good time and space bounds and supporting SP-reciprocity, the property of allowin...
详细信息
ISBN:
(纸本)9781450300797
In dynamically multithreaded platforms that employ work stealing, there appears to be a fundamental tradeoff between providing provably good time and space bounds and supporting SP-reciprocity, the property of allowing arbitrary calling between parallel and serial code, including legacy serial binaries. Many known dynamically multithreaded platforms either fail to support SP-reciprocity or sacrifice on the provable time and space bounds that an efficient work-stealing scheduler could otherwise guarantee We describe PR-Cilk, a design of a runtime system that supports SP-reciprocity in Cilk and provides provable bounds on time and space In order to maintain the space bound, PR-Cilk uses subtree-restricted work stealing. We show that with subtree-restricted work stealing. PR-Cilk provides the same guarantee on stack space usage as ordinary Cilk. The completion time guaranteed by PR-Cilk is slightly worse than ordinary Cilk Nevertheless, if the number of times a C function calls a Cilk function is small, or if each Cilk function called by a C function is sufficiently parallel. PR-Cilk still guarantees linear speedup.
暂无评论