We show that the several problems, whose complexity with respect P and NC was open, are equivalent under NC-reductions. These include: (1) Finding the optimal solution of a two-variable integer program;(2) Determining...
详细信息
The problem of representing a set U = {u1, ..., um} of read-write variables on an n-node distributed memory parallel computer is considered. It is shown that U can be represented among the n nodes of a variant of the ...
详细信息
ISBN:
(纸本)0897913701
The problem of representing a set U = {u1, ..., um} of read-write variables on an n-node distributed memory parallel computer is considered. It is shown that U can be represented among the n nodes of a variant of the mesh-of-trees using O((m/n)polylog(m/n)) storage per node such that any n-tuple of variables may be accessed in O(log n(log log n)2) time in the worst case for m polynomial in n.
As databases increasingly integrate non-textual information it is becoming necessary to support efficient similarity searching in addition to range searching. Recently, declustering techniques have been proposed for i...
详细信息
ISBN:
(纸本)9780897919890
As databases increasingly integrate non-textual information it is becoming necessary to support efficient similarity searching in addition to range searching. Recently, declustering techniques have been proposed for improving the performance of similarity searches through parallel I/O. In this paper, we propose a new scheme which provides good declustering for similarity searching. In particular, it does global declustering as opposed to local declustering, exploits the availability of extra disks and does not limit the partitioning of the data space. Our technique is based upon the Cyclic declustering schemes which were developed for range and partial match queries. We establish, in general, that Cyclic declustering techniques outperform previously proposed techniques.
In this paper we present a simple parallel sorting algorithm and illustrate two applications. The algorithm (called the (l, m)-merge sort (LMM)) is an extension of the bitonic and odd-even merge sorts. Literature on p...
详细信息
In this paper we present a simple parallel sorting algorithm and illustrate two applications. The algorithm (called the (l, m)-merge sort (LMM)) is an extension of the bitonic and odd-even merge sorts. Literature on parallel sorting is abundant. Many of the algorithms proposed, though being theoretically important, may not perform satisfactorily in practice owing to large constants in their time bounds. The algorithm to be presented in this paper, due partly to its simplicity, results in small constants. We present an implementation for the parallel disk sorting problem. The algorithm is asymptotically optimal (assuming that N is a polynomial in M, where N is the number of records to be sorted and M is the internal memory size). The underlying constant is very small. This algorithm has a better performance than the disk-striped mergesort (DSM) algorithm when the number of disks is large. Our implementation is as simple as that of DSM (requiring no fancy data structures or prefetch techniques). As a second application, we prove that we can get a sparse enumeration sort on the hypercube that is simpler than that of the classical algorithm of Nassimi and Sahni. We also show that Leighton's columnsort algorithm is a special case of LMM.
We present parallelalgorithms for union, intersection and difference on ordered sets using random balanced binary trees (treaps [26]). For two sets of size n and m (m ≤ n) the algorithms run in expected O(m lg(n/m))...
详细信息
ISBN:
(纸本)9780897919890
We present parallelalgorithms for union, intersection and difference on ordered sets using random balanced binary trees (treaps [26]). For two sets of size n and m (m ≤ n) the algorithms run in expected O(m lg(n/m)) work and O(lg n) depth (parallel time) on an EREW PRAM with scan operations (implying O(lg2 n) depth on a plain EREW PRAM). As with the sequential algorithms on treaps for insertion and deletion the main advantage of our algorithms are their simplicity. In fact our algorithms for set operations seem simpler than previous sequential algorithms with the same work bounds, and might therefore also be useful in a sequential context. To analyze the effectiveness of the algorithms we implemented both sequential and parallel versions of the algorithms and ran several experiments on them. Our parallel implementation uses the Cilk [5] shared memory runtime system on a 16 processor SGI Power Challenge and a 6 processor Sun Ultra Enterprise 3000. It shows reasonable speedup: 6.3 to 6.8 speedup on 8 processors of the SGI, and 4.1 to 4.4 speedup on 5 processors of the Sun.
Currently, the best known tradeoff between approximation ratio and complexity for the Sparsest Cut problem is achieved by the algorithm in [Sherman, FOCS 2009]: it computes O(root(log n)/epsilon)-approximation using O...
详细信息
ISBN:
(纸本)9798400704161
Currently, the best known tradeoff between approximation ratio and complexity for the Sparsest Cut problem is achieved by the algorithm in [Sherman, FOCS 2009]: it computes O(root(log n)/epsilon)-approximation using O(n(epsilon) log(O(1))n) maxflows for any epsilon is an element of[Theta(1/log n), Theta(1)]. It works by solving the SDP relaxation of [Arora-Rao-Vazirani, STOC 2004] using the Multiplicative Weights Update algorithm (MW) of [Arora-Kale, Jacm 2016]. To implement one MW step, Sherman approximately solves a multicommodity flow problem using another application of MW. Nested MW steps are solved via a certain "chaining" algorithm that combines results of multiple calls to the maxflow algorithm. We present an alternative approach that avoids solving the multicommodity flow problem and instead computes "violating paths". This simplifies Sherman's algorithm by removing a need for a nested application of MW, and also allows parallelization: we show how to compute O(root(log n)/epsilon)-approximation via O(log(O(1)) n) maxflows using O(n(epsilon)) processors. We also revisit Sherman's chaining algorithm, and present a simpler version together with a new analysis.
The proceedings contain 52 papers. The topics discussed include: randomized approximate nearest neighbor search with limited adaptivity;encoding short ranges in TCAM without expansion: efficient algorithm and applicat...
ISBN:
(纸本)9781450342100
The proceedings contain 52 papers. The topics discussed include: randomized approximate nearest neighbor search with limited adaptivity;encoding short ranges in TCAM without expansion: efficient algorithm and applications;extending the nested parallel model to the nested dataflow model with provably efficient schedulers;latency-hiding work stealing: scheduling interacting parallel computations with work stealing;provably good and practically efficient parallel race detection for fork-join programs;dynamic determinacy race detection for task parallelism with futures;RUBIC: online parallelism tuning for co-located transactional memory applications;investigating the performance of hardware transactions on a multi-socket machine;parallelalgorithms for asymmetric read-write costs;general profit scheduling and the power of migration on heterogeneous machines;the power of migration in online machine minimization;fair online scheduling for selfish jobs on heterogeneous machines;scheduling parallelizable jobs online to minimize the maximum flow time;clairvoyant dynamic bin packing for job scheduling with minimum server usage time;parallel equivalence class sorting: algorithms, lower bounds, and distribution-based analysis;and parallel approaches to the string matching problem on the GPU.
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under unc...
详细信息
ISBN:
(纸本)159593667X
The proceedings contain 37 papers. The topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under uncertainty;scheduling DAGs on asynchronous processors;scheduling to minimize gaps and power consumption;cache-oblivious streaming B-trees;an experimental comparison of cache-oblivious and cache-conscious programs;scheduling threads for constructive cache sharing on CMPs;proximity-aware directory-based coherence for multi-core processor architectures;a parallel dynamic programming algorithm on a multi-core architecture;tight bounds for distributed selection;local MST computation with short advice;distributed approximation of capacitated dominating sets;packing to angles and sectors;and the notion of a timed register and its application to indulgent synchronization.
The proceedings contain 43 papers. The topics discussed include: speed scaling of processes with arbitrary speedup curves on a multiprocessor;the bell is ringing in speed-scaled multiprocessor scheduling;mapping filte...
ISBN:
(纸本)9781605586069
The proceedings contain 43 papers. The topics discussed include: speed scaling of processes with arbitrary speedup curves on a multiprocessor;the bell is ringing in speed-scaled multiprocessor scheduling;mapping filtering streaming applications with communication costs;scheduling to minimize staleness and stretch in real-time data warehouses;parameterized maximum and average degree approximation in topic-based publish-subscribe overlay network design;selfishness in transactional memory;at-most-once semantics in asynchronous shared memory;memory models: a case for rethinking parallel languages and hardware;the life and times of a ZooKeeper;Cassandra - a structured storage system on a P2P network;Pregel: a system for large-scale graph processing;towards transactional memory semantics for C++;on avoiding spare aborts in transactional memory;inherent limitations on disjoint-access parallel implementations of transactional memory;reducers and other Cilk++ hyperobjects;and beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures.
Transactional memory (TM) is a promising approach for designing concurrent data structures, and it is essential to develop better understanding of the formal properties that can be achieved by TM implementations. Two ...
详细信息
ISBN:
(纸本)9781605586069
Transactional memory (TM) is a promising approach for designing concurrent data structures, and it is essential to develop better understanding of the formal properties that can be achieved by TM implementations. Two fundamental properties of TM implementations are disjoint-access parallelism, which is critical for their scalability, and the invisibility of read operations, which reduces memory contention. This paper proves an inherent tradeoff for implementations of transactional memories: they cannot be both disjoint-access parallel and have read-only transactions that are invisible and always terminate successfully. In fact, a lower bound of Omega(t) is proved on the number of writes needed in order to implement a read-only transaction of t items, which successfully terminates in a disjoint-access parallel TM implementation. The results assume strict serializability and thus hold under the assumption of opacity. It is shown how to extend the results to hold also for weaker consistency conditions, serializability and snapshot isolation.
暂无评论