The proceedings contain 47 papers. The topics discussed include: Quancurrent: a concurrent quantiles sketch;an efficient scheduler for task-parallel interactive applications;efficient synchronization-light work steali...
ISBN:
(纸本)9781450395458
The proceedings contain 47 papers. The topics discussed include: Quancurrent: a concurrent quantiles sketch;an efficient scheduler for task-parallel interactive applications;efficient synchronization-light work stealing;balanced allocations in batches: the tower of two choices;massively parallel tree embeddings for high dimensional spaces;deterministic massively parallel symmetry breaking for sparse graphs;an associativity threshold phenomenon in set-associative caches;increment - and - freeze: every cache, everywhere, all of the time;multidimensional approximate agreement with asynchronous fallback;a tight characterization of fast failover routing: resiliency to two link failures is possible;releasing memory with optimistic access: a hybrid approach to memory reclamation and allocation in lock-free programs;transactional composition of nonblocking data structures;applying hazard pointers to more concurrent data structures;and nearly optimal parallelalgorithms for longest increasing subsequence.
In the Massively parallel Computation (MPC) model, data is distributed across multiple processors, and we call an algorithm space-efficient if each machine has n(1-epsilon+o(1)) memory with a machine count of Omega(n(...
详细信息
ISBN:
(纸本)9798400704161
In the Massively parallel Computation (MPC) model, data is distributed across multiple processors, and we call an algorithm space-efficient if each machine has n(1-epsilon+o(1)) memory with a machine count of Omega(n(epsilon)). In this paper, we study the string edit distance problem in the MPC model, presenting both a new algorithm and lower-bound results. A space-efficient MPC algorithm computing the exact edit distance using O(n(epsilon)) communication rounds is known by updating the algorithm of Chowdhury and Ramachandran (SPAA 2008). A key contribution of our work is the introduction of the first space-efficient MPC algorithm, which uses subpolynomial number of rounds and provides an n(o(1))-approximation of edit distance, where.. denotes the length of the input strings. Further, we complement this algorithm with new lower-bound results. The Orthogonal Vector (O.V.) conjecture states that no truly subquadratic time algorithm exists for the Orthogonal Vector problem, and it follows directly from the Strong Exponential Time Hypothesis (SETH). Drawing inspiration from this, we propose the Strong O.V. Conjecture that posits that there is no space-efficient MPC algorithm capable of solving O.V. using n(epsilon-Omega(1)) communication rounds. The Strong O.V. conjecture has far-reaching consequences, yielding the first (or strengthened) lower bounds for a myriad of problems in the MPC model including graph diameter estimation, computing Frechet distance, longest common subsequence, and dynamic time warping. Via an MPC reduction from O.V. to edit distance, we give the first conditional lower bound for string edit distance in the MPC model showing that there does not exist any space-efficient, n(epsilon-Omega(1))-round MPC exact edit distance algorithm.
The growing interest in novel dataflow architectures and streaming execution paradigms has created the need for a simulator optimized for modeling dataflow systems. To fill this need, we present three new techniques t...
详细信息
ISBN:
(纸本)9798350326598;9798350326581
The growing interest in novel dataflow architectures and streaming execution paradigms has created the need for a simulator optimized for modeling dataflow systems. To fill this need, we present three new techniques that make it feasible to simulate complex systems consisting of thousands of components. First, we introduce an interface based on Communicating Sequential Processes which allows users to simultaneously describe functional and timing characteristics. Second, we introduce a scalable point-to-point synchronization scheme that avoids global synchronization. Finally, we demonstrate a technique to exploit slack in the simulated system, such as FIFOs, to increase simulation parallelism. We implement these techniques in the Dataflow Abstract Machine (DAM), a parallel simulator framework for dataflow systems. We demonstrate the benefits of using DAM by highlighting three case studies using the framework. First, we use DAM directly as an exploration tool for streaming algorithms on dataflow hardware. We simulate two different implementations of the attention algorithm used in large language models, and use DAM to show that the second implementation only requires a constant amount of local memory. Second, we re-implement a simulator for a sparse tensor algebra accelerator, resulting in 57% less code and a simulation speedup of up to four orders of magnitude. Finally, we demonstrate a general technique for time-multiplexing real hardware to simulate multiple virtual copies of the hardware using DAM.
The proceedings contain 44 papers. The topics discussed include: deterministic distributed sparse and ultra-sparse spanners and connectivity certificates;fully polynomial-time distributed computation in low-treewidth ...
ISBN:
(纸本)9781450391467
The proceedings contain 44 papers. The topics discussed include: deterministic distributed sparse and ultra-sparse spanners and connectivity certificates;fully polynomial-time distributed computation in low-treewidth graphs;adaptive massively parallelalgorithms for cut problems;preparing for disaster: leveraging precomputation to efficiently repair graph structures upon failures;the energy complexity of Las Vegas leader election;a fully-distributed peer-to-peer protocol for byzantine-resilient distributed hash tables;brief announcement: the (limited) power of multiple identities: asynchronous byzantine reliable broadcast with improved resilience through collusion;brief announcement: composable dynamic secure emulation;and robust and optimal contention resolution without collision detection.
The proceedings contain 49 papers. The topics discussed include: fast stencil computations using fast Fourier transforms;low-span parallelalgorithms for the binary-forking model;provable advantages for graph algorith...
ISBN:
(纸本)9781450380706
The proceedings contain 49 papers. The topics discussed include: fast stencil computations using fast Fourier transforms;low-span parallelalgorithms for the binary-forking model;provable advantages for graph algorithms in spiking neural networks;algorithms for right-sizing heterogeneous data centers;efficient parallel self-adjusting computation;speed scaling with explorable uncertainty;efficient online weighted multi-level paging;paging and the address-translation problem;massively parallelalgorithms for distance approximation and spanners;efficient load-balancing through distributed token dropping;finding subgraphs in highly dynamic networks;near-optimal time-energy trade-offs for deterministic leader election;and efficient stepping algorithms and implementations for parallel shortest paths.
The proceedings contain 68 papers. The topics discussed include: faster deterministic all pairs shortest paths in congest model;contention resolution with message deadlines;memory tagging: minimalist synchronization f...
ISBN:
(纸本)9781450369350
The proceedings contain 68 papers. The topics discussed include: faster deterministic all pairs shortest paths in congest model;contention resolution with message deadlines;memory tagging: minimalist synchronization for scalable concurrent data structures;work-efficient batch-incremental minimum spanning trees with applications to the sliding-window model;closing the gap between cache-oblivious and cache-adaptive analysis;optimal resource allocation for elastic and inelastic jobs;optimal parallelalgorithms in the binary-forking model;randomized incremental convex hull is highly parallel;almost universal anonymous rendezvous in the plane;the online multi-commodity facility location problem;unconditional lower bounds for adaptive massively parallel computation;on the hardness of massively parallel computation;and graph sparsification for derandomizing massively parallel computation with low space.
The proceedings contain 45 papers. The topics discussed include: the price of clustering in bin-packing with applications to bin-packing with delays;faster matrix multiplication via sparse decomposition;NC algorithms ...
ISBN:
(纸本)9781450361842
The proceedings contain 45 papers. The topics discussed include: the price of clustering in bin-packing with applications to bin-packing with delays;faster matrix multiplication via sparse decomposition;NC algorithms for computing a perfect matching, the number of perfect matchings, and a maximum flow in one-crossing-minor-free graphs;improved MPC algorithms for edit distance and ulam distance;brief announcement: scalable diversity maximization via small-size composable core-sets;brief announcement: eccentricities via parallel set cover;dynamic algorithms for the massively parallel computation model;massively parallel computation via remote memory access;and brief announcement: ultra-fast asynchronous randomized rumor spreading.
A new block algorithm for triangularization of regular or singular matrices with dimension m × n is proposed. Taking benefit of fast block multiplication algorithms, it achieves the best known sequential complexi...
ISBN:
(纸本)9781581134094
A new block algorithm for triangularization of regular or singular matrices with dimension m × n is proposed. Taking benefit of fast block multiplication algorithms, it achieves the best known sequential complexity Ο(mw-1n) for any sizes and any rank. Moreover, the block strategy enables to improve locality with respect to previous algorithms as exhibited by practical performances.
The proceedings contain 47 papers. The topics discussed include: parallel minimum cuts in near-linear work and low depth;trees for vertex cuts, hypergraph cuts and minimum hypergraph bisection;dynamic representations ...
ISBN:
(纸本)9781450357999
The proceedings contain 47 papers. The topics discussed include: parallel minimum cuts in near-linear work and low depth;trees for vertex cuts, hypergraph cuts and minimum hypergraph bisection;dynamic representations of sparse distributed networks: a locality-sensitive approach;constant-depth and subcubic-size threshold circuits for matrix multiplication;integrated model, batch, and domain parallelism in training neural networks;brief announcement: on approximating pagerank locally with sublinear query complexity;brief announcement: coloring-based task mapping for dragonfly systems;brief announcement: parallel transitive closure within 3D crosspoint memory;and lock-free contention adapting search trees.
We provide a tight analysis which settles the round complexity of the well-studied parallel randomized greedy MIS algorithm, thus answering the main open question of Blelloch, Fineman, and Shun [SPAA'12]. The para...
详细信息
ISBN:
(纸本)9781611975031
We provide a tight analysis which settles the round complexity of the well-studied parallel randomized greedy MIS algorithm, thus answering the main open question of Blelloch, Fineman, and Shun [SPAA'12]. The parallel/distributed randomized greedy Maximal Independent Set (MIS) algorithm works as follows. An order of the vertices is chosen uniformly at random. Then, in each round, all vertices that appear before their neighbors in the order are added to the independent set and removed from the graph along with their neighbors. The main question of interest is the number of rounds it takes until the graph is empty. This algorithm has been studied since 1987, initiated by Coppersmith, Raghavan, and Tompa [FOCS'87], and the previously best known bounds were O(log n) rounds in expectation for Erdos-Renyi random graphs by Calkin and Frieze [Random Struc. & Alg. '90] and O(log(2) n) rounds with high probability for general graphs by Blelloch, Fineman, and Shun [SPAA'12]. We prove a high probability upper bound of O(log n) on the round complexity of this algorithm in general graphs, and that this bound is tight. This also shows that parallel randomized greedy MIS is as fast as the celebrated algorithm of Luby [STOC'85, JALG'86].
暂无评论