In this paper, we study the tradeoffs between the time and the number of communication rounds of the best arm identification problem in the heterogeneous collaborative learning model, where multiple agents interact wi...
详细信息
ISBN:
(纸本)9798400704161
In this paper, we study the tradeoffs between the time and the number of communication rounds of the best arm identification problem in the heterogeneous collaborative learning model, where multiple agents interact with possibly different environments and they want to learn in parallel an objective function in the aggregated environment. By proving almost tight upper and lower bounds, we show that collaborative learning in the heterogeneous setting is inherently more difficult than that in the homogeneous setting in terms of the time-round tradeoff.
We present the first (randomized) parallel dynamic algorithm for maximal matching, which can process an arbitrary number of updates simultaneously. Given a batch of edge deletion or insertion updates to the graph, our...
详细信息
ISBN:
(纸本)9798400704161
We present the first (randomized) parallel dynamic algorithm for maximal matching, which can process an arbitrary number of updates simultaneously. Given a batch of edge deletion or insertion updates to the graph, our parallel algorithm adjusts the maximal matching to these updates in poly(log n) depth and using poly(log n) amortized work per update. That is, the amortized work for processing a batch of k updates is k poly(log n), while all this work is done in poly(log n) depth, with high probability. This can be seen as a parallel counterpart of the sequential dynamic algorithms for constant-approximate and maximal matching [Onak and Rubinfeld STOC'10;Baswana, Gupta, and Sen FOCS'11;and Solomon FOCS'16]. Our algorithm readily generalizes to maximal matching in hypergraphs of rank r-where each hyperedge has at most r endpoints-with a poly(r) increase in work, while retaining the poly(log n) depth.
We introduce PASGAL (parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graphs. One special focus of PASGAL is the efficiency o...
详细信息
ISBN:
(纸本)9798400704161
We introduce PASGAL (parallel And Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graphs. One special focus of PASGAL is the efficiency on large-diameter graphs, which is a common challenge for many existing parallel graph processing systems due to the high overhead in synchronizing threads when traversing the graph in the breadth-first order. The core idea in PASGAL is a technique called vertical granularity control (VGC) to hide synchronization overhead by careful algorithm redesign and new data structures. We compare PASGAL with existing parallel implementations on several fundamental graph problems. PASGAL is always competitive on small-diameter graphs, and is significantly faster on large-diameter graphs.
We address optimal makespan-minimizing identical parallel machine scheduling (P vertical bar vertical bar C-max) by introducing new pruning rules for branch-and-bound (BnB) and integrating them into a prior BnB algori...
详细信息
ISBN:
(纸本)9798400704161
We address optimal makespan-minimizing identical parallel machine scheduling (P vertical bar vertical bar C-max) by introducing new pruning rules for branch-and-bound (BnB) and integrating them into a prior BnB algorithm. Experimental results indicate that the presented rules are inexpensive to evaluate, applicable frequently, and extremely beneficial to the BnB algorithm's overall performance.
作者:
Koo, JaehyunMIT
77 Massachusetts Ave Cambridge MA 02139 USA
We present an O(1)-round fully-scalable deterministic massively parallel algorithm for computing the min-plus matrix multiplication of unit-Monge matrices. We use this to derive a O(log n)-round fully-scalable massive...
详细信息
ISBN:
(纸本)9798400704161
We present an O(1)-round fully-scalable deterministic massively parallel algorithm for computing the min-plus matrix multiplication of unit-Monge matrices. We use this to derive a O(log n)-round fully-scalable massively parallel algorithm for solving the exact longest increasing subsequence (LIS) problem. For a fully-scalable MPC regime, this result substantially improves the previously known algorithm of O(log(4) n)-round complexity, and matches the best algorithm for computing the (1 + epsilon)-approximation of LIS.
The idea of dynamic programming (DP), proposed by Bellman in the 1950s, is one of the most important algorithmic techniques. However, in parallel, many fundamental and sequentially simple problems become more challeng...
详细信息
ISBN:
(纸本)9798400704161
The idea of dynamic programming (DP), proposed by Bellman in the 1950s, is one of the most important algorithmic techniques. However, in parallel, many fundamental and sequentially simple problems become more challenging, and open to a (nearly) work-efficient solution (i.e., the work is off by at most a polylogarithmic factor over the best sequential solution). In fact, sequential DP algorithms employ many advanced optimizations such as decision monotonicity or special data structures, and achieve better work than straightforward solutions. Many such optimizations are inherently sequential, which creates extra challenges for a parallel algorithm to achieve the same work bound. The goal of this paper is to achieve (nearly) work-efficient parallel DP algorithms by parallelizing classic, highly-optimized and practical sequential algorithms. We show a general framework called the Cordon Algorithm for parallel DP algorithms, and use it to solve several classic problems. Our selection of problems includes Longest Increasing Subsequence (LIS), sparse Longest Common Subsequence (LCS), convex/concave generalized Least Weight Subsequence (LWS), Optimal Alphabetic Tree (OAT), and more. We show how the Cordon Algorithm can be used to achieve the same level of optimization as the sequential algorithms, and achieve good parallelism. Many of our algorithms are conceptually simple, and we show some experimental results as proofs-of-concept.
Partitioning a graph into blocks of roughly equal weight while cutting only few edges is a fundamental problem in computer science with numerous practical applications. While shared-memory parallel partitioners have r...
详细信息
ISBN:
(纸本)9798400704161
Partitioning a graph into blocks of roughly equal weight while cutting only few edges is a fundamental problem in computer science with numerous practical applications. While shared-memory parallel partitioners have recently matured to achieve the same quality as widely used sequential partitioners, there is still a pronounced quality gap between distributed partitioners and their sequential counterparts. In this work, we shrink this gap considerably by describing the engineering of an unconstrained local search algorithm suitable for distributed partitioners. We integrate the proposed algorithm in a distributed multilevel partitioner. Our extensive experiments show that the resulting algorithm scales to thousands of PEs while computing cuts that are, on average, only 3.5% larger than those of a state-of-the-art high-quality shared-memory partitioner. Compared to previous distributed partitioners, we obtain on average 6.8% smaller cuts than the best-performing competitor while being more than 9 times faster.
Dynamic trees are a well-studied and fundamental building block of dynamic graph algorithms dating back to the seminal work of Sleator and Tarjan [STOC'81, (1981), pp. 114-122]. The problem is to maintain a tree s...
详细信息
ISBN:
(纸本)9798400704161
Dynamic trees are a well-studied and fundamental building block of dynamic graph algorithms dating back to the seminal work of Sleator and Tarjan [STOC'81, (1981), pp. 114-122]. The problem is to maintain a tree subject to online edge insertions and deletions while answering queries about the tree, such as the heaviest weight on a path, etc. In the parallel batch-dynamic setting, the goal is to process batches of edge updates work efficiently in low (polylog n) span. Two work-efficient algorithms are known: batch-parallel Euler Tour Trees by Tseng et al. [ALENEX'19, (2019), pp. 92-106] and parallel Rake-Compress (RC) Trees by Acar et al. [ESA'20, (2020), pp. 2:1-2:23]. Both however are randomized and work efficient in expectation. Several downstream results that use these data structures (and indeed to the best of our knowledge, all known workefficient parallel batch-dynamic graph algorithms) are therefore also randomized. In this work, we give the first deterministic work-efficient solution to the problem. Our algorithm maintains a parallel RC-Tree on n vertices subject to batches of k edge updates deterministically in worst-case O(k log(1 + n/k)) work and O(log n log log k) span on the Common-CRCW PRAM. We also show how to improve the span of the randomized algorithm from O(log n log* n) to O(log n). Lastly, as a result of our new deterministic algorithm, we also derandomize several downstream results that make use of parallel batch-dynamic dynamic trees, previously for which the only efficient solutions were randomized.
In a breakthrough result, Spielman and Teng (2004) developed a nearly-linear time solver for Laplacian linear equations, i.e. equations where the coefficient matrix is symmetric with non-negative diagonals and zero ro...
详细信息
ISBN:
(纸本)9798400704161
In a breakthrough result, Spielman and Teng (2004) developed a nearly-linear time solver for Laplacian linear equations, i.e. equations where the coefficient matrix is symmetric with non-negative diagonals and zero rowsums. Since the development of the Spielman-Teng solver, there has been substantial progress, simplifying and improving their result, but obtaining a fast practical, parallel Laplacian solver remains an open problem. We present a framework for obtaining extremely simple, parallel Laplacian linear equation solvers with nearly-linear work and sub-linear depth. Our framework allows us to parallelize any Laplacian solver based on repeated single-vertex approximate Gaussian elimination. We demonstrate this by parallelizing both the algorithm of Kyng and Sachdeva (2016) and the practical variant by Gao, Kyng, and Spielman (2023). Our framework is work-efficient in the sense of matching the sequential work of these algorithms. Our parallelization framework is very simple: We sample a subset of the current low-degree vertices (sparse columns), and in parallel we eliminate all vertices that are isolated in the resulting induced subgraph. This approach can be combined with any parallelizable approximate single-vertex elimination subroutine with sparse output. Given the simplicity of the approach, we believe that using it to parallelize the solver of Gao, Kyng, and Spielman (2023) is the most promising direction for obtaining practical parallel Laplacian solvers. If we additionally use a parallel spectral sparsification routine, our approach can be modified to work in polylogarithmic depth and nearly-linear work.
Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree), the SLD of) is a binary dendrogram that summarizes the n =...
详细信息
ISBN:
(纸本)9798400704161
Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree), the SLD of) is a binary dendrogram that summarizes the n = 1 clusterings obtained by contracting the edges of T in order of weight. Existing algorithms for computing the SLD all require Omega(n log n) work where n = vertical bar T vertical bar. Furthermore, to the best of our knowledge no prior work provides a parallel algorithm obtaining non-trivial speedup for this problem. In this paper, we design faster parallelalgorithms for computing SLDs both in theory and in practice based on new structural results about SLDs. In particular, we obtain a deterministic output-sensitive parallel algorithm based on parallel tree contraction that requires O(n log h) work and O(log(2) n log(2) h) depth, where h is the height of the output SLD. We also give a deterministic bottom-up algorithm for the problem inspired by the nearest-neighbor chain algorithm for hierarchical agglomerative clustering, and show that it achieves O(n log h) work and O(h log n) depth. Our results are based on a novel divide-and-conquer framework for building SLDs, inspired by divide-and-conquer algorithms for Cartesian trees. Our new algorithms can quickly compute the SLD on billion-scale trees, and obtain up to 150x speedup over the highly-efficient Union-Find algorithm typically used to compute SLDs in practice.
暂无评论