We propose distributed algorithms for high-dimensional sparse optimization. In many applications, the parameter is sparse but high-dimensional. This is pathological for existing distributed algorithms as the latter re...
详细信息
ISBN:
(纸本)9781479999880
We propose distributed algorithms for high-dimensional sparse optimization. In many applications, the parameter is sparse but high-dimensional. This is pathological for existing distributed algorithms as the latter require an information exchange stage involving transmission of the full parameter, which may not be sparse during the intermediate steps of optimization. The novelty of this work is to develop communication efficient algorithms using the stochastic Frank-Wolfe (sFW) algorithm, where the gradient computation is inexact but controllable. For star network topology, we propose an algorithm with low communication cost and establishes its convergence. The proposed algorithm is then extended to perform decentralized optimization on general network topology. Numerical experiments are conducted to verify our findings.
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps. We give efficient, fast...
详细信息
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps. We give efficient, fast, and practicable parallel and distributed algorithms for building data structures that support sampling single items (alias tables, compressed data structures). This also yields a simplified and more space-efficient sequential algorithm for alias table construction. Our approaches to sampling k out of n items with/without replacement and to subset (Poisson) sampling are output-sensitive, i.e., the sampling algorithms use work linear in the number of different samples. This is also interesting in the sequential case. Weighted random permutation can be done by sorting appropriate random deviates. We show that this is possible with linear work. Finally, we give a communication-efficient, highly scalable approach to (weighted and unweighted) reservoir sampling. This algorithm is based on a fully distributed model of streaming algorithms that might be of independent interest. Experiments for alias tables and sampling with replacement show near linear speedups using up to 158 threads of shared-memory machines. An experimental evaluation of distributed weighted reservoir sampling on up to 5,120 cores also shows good speedups.
Parallel algorithms have been a subject of intensive algorithmic research in the 1980s. This research almost died out in the mid 1990s. In this paper we argue that it is high time to reconsider this subject since a lo...
详细信息
ISBN:
(纸本)9783939897781
Parallel algorithms have been a subject of intensive algorithmic research in the 1980s. This research almost died out in the mid 1990s. In this paper we argue that it is high time to reconsider this subject since a lot of things have changed. First and foremost, parallel processing has moved from a niche application to something mandatory for any performance critical computer applications. We will also point out that even very fundamental results can still be obtained. We give examples and also formulate some open problems.
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory a...
详细信息
ISBN:
(纸本)9783959771245
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries.
暂无评论