One obvious question is if there exists an online algorithm that is O(1)-competitive with speed less than two or not. To obtain such an algorithm (if exists), one must exploit the structure of cost functions. Our anal...
详细信息
ISBN:
(纸本)9781611972108
One obvious question is if there exists an online algorithm that is O(1)-competitive with speed less than two or not. To obtain such an algorithm (if exists), one must exploit the structure of cost functions. Our analysis can be extended to show that there exists an O(1)-speed O(1)-competitive algorithm on identical parallel machines.
We show how to compute the width of a dynamic set of low-dimensional points in the streaming model. In particular, we assume the stream contains both insertions of points and deletions of points to a set S, and the go...
详细信息
ISBN:
(纸本)9781611972108
We show how to compute the width of a dynamic set of low-dimensional points in the streaming model. In particular, we assume the stream contains both insertions of points and deletions of points to a set S, and the goal is to compute the width of the set S, namely the minimal distance between two parallel lines sandwiching the pointset S. Our algorithm 1 + ε approximates the width of the set S using space polylogarithmic in the size of S and the aspect ratio of S. This is the first such algorithm that supports both insertions and deletions of points to the set S: previous algorithms for approximating the width of a pointset only supported additions [AHPV04, Cha06], or a sliding window [CS06]. This solves an open question from the "2009 Kanpur list" of Open Problems in Data Streams, Property Testing, and Related Topics [IMNO11].
We present a parallel solution to the Maximum-Flow (Max-Flow) problem, suitable for a modern many-core architecture. We show that by starting from a PRAM algorithm, following an established "programmer's work...
详细信息
ISBN:
(纸本)9781450307437
We present a parallel solution to the Maximum-Flow (Max-Flow) problem, suitable for a modern many-core architecture. We show that by starting from a PRAM algorithm, following an established "programmer's workflow" and targeting XMT, a PRAM-inspired many-core architecture, we achieve significantly higher speed-ups than previous approaches. Comparison with the fastest known serial max-flow implementation on a modern CPU demonstrates for the first time potential for orders-of-magnitude performance improvement for Max-Flow. Using XMT, the PRAM Max-Flow algorithm is also much easier to program than for other parallel platforms, contributing a powerful example toward dual validation of both PRAM algorithmics and XMT.
We present parallel greedy approximation algorithms for set cover and related problems. These algorithms build on an algorithm for solving a graph problem we formulate and study called Maximal Nearly Independent Set (...
详细信息
ISBN:
(纸本)9781450307437
We present parallel greedy approximation algorithms for set cover and related problems. These algorithms build on an algorithm for solving a graph problem we formulate and study called Maximal Nearly Independent Set (MANIS)-a graph abstraction of a key component in existing work on parallel set cover. We derive a randomized algorithm for MANIS that has O(m) work and O(log(2)m) depth on input with m edges. Using MANIS, we obtain RNC algorithms that yield a (1 + epsilon)H-n-approximation for set cover, a (1 - 1/e - epsilon)-approximation for max cover and a (4 + epsilon)-approximation for min-sum set cover all in linear work;and an O(log*n)-approximation for asymmetric k-center for k <= log(O(1)) n and a (1.861 + epsilon)-approximation for metric facility location both in essentially the same work bounds as their sequential counterparts.
As the gap between the cost of communication (i.e., data movement) and computation continues to grow, the importance of pursuing algorithms which minimize communication also increases. Toward this end, we seek asympto...
详细信息
ISBN:
(纸本)9781450307437
As the gap between the cost of communication (i.e., data movement) and computation continues to grow, the importance of pursuing algorithms which minimize communication also increases. Toward this end, we seek asymptotic communication lower bounds for general memory models and classes of algorithms. Recent work [2] has established lower bounds for a wide set of linear algebra algorithms on a sequential machine and on a parallel machine with identical processors. This work extends these previous bounds to a heterogeneous model in which processors access data and perform floating point operations at differing speeds. We also present an algorithm for dense matrix multiplication which attains the lower bound.
parallelalgorithms and dynamic algorithms possess an interesting duality property: compared to sequential algorithms, parallelalgorithms improve run-time while preserving work, while dynamic algorithms improve work ...
详细信息
ISBN:
(纸本)9781450307437
parallelalgorithms and dynamic algorithms possess an interesting duality property: compared to sequential algorithms, parallelalgorithms improve run-time while preserving work, while dynamic algorithms improve work but typically offer no parallelism.. Although they are often considered separately, parallel and dynamic algorithms employ similar design techniques. They both identify parts of the computation that are independent of each other. This suggests that dynamic algorithms could be parallelized to improve work efficiency while preserving fast parallel run-time. In this paper, we parallelize a dynamic algorithm for well-spaced point sets, an important problem related to mesh refinement in computational geometry. Our parallel dynamic algorithm computes a well-spaced superset of a dynamically changing set of points, allowing arbitrary dynamic modifications to the input set. On an EREW PRAM, our algorithm processes batches of k modifications such as insertions and deletions in O(k log Delta) total work and in O(log Delta) parallel time using k processors, where Delta is the geometric spread of the data, while ensuring that the output is always within a constant factor of the optimal size. EREW PRAM model is quite different from actual hardware such as modern multiprocessors. We therefore describe techniques for implementing our algorithm on modern multi-core computers and provide a prototype implementation. Our empirical evaluation shows that our algorithm can be practical, yielding a large degree of parallelism and good speedups.
The proceedings contain 50 papers. The topics discussed include: graph expansion and communication costs of fast matrix multiplication;near linear-work parallel SDD solvers, low-diameter decomposition, and low-stretch...
ISBN:
(纸本)9781450307437
The proceedings contain 50 papers. The topics discussed include: graph expansion and communication costs of fast matrix multiplication;near linear-work parallel SDD solvers, low-diameter decomposition, and low-stretch subgraphs;linear-work greedy parallel approximate set cover and variants;optimizing hybrid transactional memory: the importance of nonspeculative operations;parallelism and data movement characterization of contemporary application classes;work-stealing for mixed-mode parallelism by deterministic team-building;full reversal routing as a linear dynamical system;reclaiming the energy of a schedule, models and algorithms;a tight runtime bound for synchronous gathering of autonomous robots with limited visibility;convergence of local communication chain strategies via linear transformations: or how to trade locality for speed;and convergence to equilibrium of logit dynamics for strategic games.
This paper presents the design and analysis of a near linear-work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. On input of a SDD n-by-n matrix A with m non-zero entries and a vect...
详细信息
ISBN:
(纸本)9781450307437
This paper presents the design and analysis of a near linear-work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. On input of a SDD n-by-n matrix A with m non-zero entries and a vector h, our algorithm computes a vector (x) over tilde;such that parallel to(x) over tilde - A(+)b parallel to A <= epsilon . parallel to A(+)b parallel to A in O(m log(O(1)) n log 1/epsilon) work and O(m(1/3+theta) log 1/epsilon) depth for any fixed theta > 0. The algorithm relies on a parallel algorithm for generating low-stretch spanning trees or spanning subgraphs. To this end, we first develop a parallel decomposition algorithm that in polylogarithmic depth and (O) over tilde vertical bar E vertical bar) work(1), partitions a graph into components with polylogarithmic diameter such that only a small fraction of the original edges are between the components. This can be used to generate low-stretch spanning trees with average stretch O(n(alpha)) in O(n(1+alpha)) work and O(n(alpha)) depth. Alternatively, it can be used to generate spanning subgraphs with polylogarithmic average stretch in O(vertical bar E vertical bar) work and polylogarithmic depth. We apply this subgraph construction to derive our solver. By using the linear system solver in known applications, our results imply improved parallel randomized algorithms for several problems, including single-source shortest paths, maximum flow, min-cost flow, and approximate max-flow.
A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" a...
详细信息
ISBN:
(纸本)9781450307437
A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++ which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallel-loop implementations, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making "hyperspace" cuts, which yield asymptotically more parallelism for the same cache efficiency.
暂无评论