Decentralized optimization algorithms have received much attention due to the recent advances in network information processing. However, conventional decentralized algorithms based on projected gradient descent are incapable of handling high-dimensional constrained problems, as the projection step becomes computationally prohibitive. To address this problem, this paper adopts a projection-free optimization approach, a.k.a. the Frank-Wolfe (FW) or conditional gradient algorithm. We first develop a decentralized FW (DeFW) algorithm from the classical FW algorithm. The convergence of the proposed algorithm is studied by viewing the decentralized algorithm as an inexact FW algorithm. Using a diminishing step size rule and letting t be the iteration number, we show that the DeFW algorithm converges at rate O(1/t) for convex objectives; at rate O(1/t²) for strongly convex objectives whose optimal solution lies in the interior of the constraint set; and at rate O(1/√t) toward a stationary point for smooth but nonconvex objectives. We then show that a consensus-based DeFW algorithm meets the above guarantees with two communication rounds per iteration. We demonstrate the advantages of the proposed DeFW algorithm on low-complexity robust matrix completion and communication-efficient sparse learning. Numerical results on synthetic and real data are presented to support our findings.
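To make the projection-free idea concrete, here is a minimal sketch of the classical FW step that DeFW builds on, shown for an ℓ1-ball constraint where the linear minimization oracle reduces to picking a single signed coordinate. The function name frank_wolfe_l1 and all parameters are illustrative assumptions, not from the paper; the decentralized variant would additionally average iterates and gradient estimates across the network, the two communication rounds per iteration mentioned above.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius, T=200):
    """Classical Frank-Wolfe over an l1-ball of the given radius.

    The linear minimization oracle argmin_{||s||_1 <= radius} <g, s>
    is a single signed coordinate vector, so no projection onto the
    constraint set is ever computed.
    """
    x = x0.copy()
    for t in range(T):
        g = grad_f(x)
        i = np.argmax(np.abs(g))              # LMO: heaviest gradient coordinate
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2)                 # diminishing step size rule
        x = (1 - gamma) * x + gamma * s       # stays feasible by convexity
    return x

# Usage: least squares restricted to the l1-ball of radius 5.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 100)), rng.normal(size=50)
x_hat = frank_wolfe_l1(lambda x: A.T @ (A @ x - b), np.zeros(100), radius=5.0)
```

Note how each iterate is a convex combination of at most T coordinate vectors, which is also why FW produces sparse iterates, useful for the sparse learning application above.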
ISBN (print): 9781450336178
Over the last fifteen years, a large family of algorithms has emerged that compute various predicates from distributed data with a focus on communication efficiency. These algorithms are often called "communication-efficient", "geometric-monitoring", or "local" algorithms. We jointly call them distributed convex thresholding algorithms, for reasons explained in this work. Distributed convex thresholding algorithms have found applications in domains in which bandwidth is a scarce resource, such as wireless sensor networks and peer-to-peer systems, or in scenarios in which data rapidly streams to the different processors but the outcome of the predicate rarely changes. Common to all of these algorithms is the use of a data-dependent criterion to determine when further messaging is required. This work presents two very simple yet exceedingly general theorems from which the correctness of all distributed convex thresholding algorithms can be derived, and demonstrates this on key examples. Because the theorems are general, they extend the range of predicates that can be computed in a communication-efficient manner beyond what is currently known. Unlike the previous correctness proofs given for these algorithms, the proofs of the theorems presented here do not depend on the communication infrastructure. Consequently, the correctness of any distributed convex thresholding algorithm immediately extends from broadcast-enabled networks or cycle-free networks to general networks. Inspecting existing algorithms in light of the new theorems reveals that they contain redundant requirements, which cause them to send messages when none are needed.
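As an illustration of such a data-dependent criterion, the sketch below shows the simplest convex thresholding test, for a linear predicate <w, mean of node vectors> >= threshold, where a ball-in-halfspace check can be done exactly. The name node_is_silent and the setup are illustrative assumptions, not taken from the paper, whose theorems cover general convex regions.

```python
import numpy as np

def node_is_silent(local_vec, last_global, w, threshold):
    """Local test for the linear predicate <w, mean of node vectors> >= threshold.

    A node stays silent while the ball whose diameter spans the last
    synchronized global estimate and its current local vector lies entirely
    inside the halfspace {x : <w, x> >= threshold}. The global mean is
    contained in the convex hull of these balls, so if every node passes
    this test the predicate cannot have flipped and no messages are needed.
    """
    center = (local_vec + last_global) / 2.0
    radius = np.linalg.norm(local_vec - last_global) / 2.0
    # Exact ball-in-halfspace check: the worst surface point lies along -w.
    return w @ center - radius * np.linalg.norm(w) >= threshold

# Usage: the node's drift is small enough that no message is sent.
w = np.array([1.0, 0.0])
print(node_is_silent(np.array([3.0, 1.0]), np.array([2.0, 0.0]), w, 1.0))  # True
```

The convexity of the halfspace is what makes the purely local test sound, which is the geometric intuition behind the family's name.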
We consider the problem of sampling n numbers from the range {1, ..., N} without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time O(n/p + log p) on p processors, i.e., one that scales to massively parallel machines even for moderate values of n. The amount of communication between the processors is very small (at most O(log p)) and independent of the sample size. We also discuss modifications needed for load balancing, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs.
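A minimal sequential sketch of the divide-and-conquer scheme follows, relying on the fact that the number of sampled elements landing in each half of the range is hypergeometrically distributed, so the halves can be processed independently. The function name sample, the base-case size, and the use of NumPy's generator are illustrative choices; the parallel O(n/p + log p) version would assign the recursive halves to disjoint groups of processors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(lo, hi, n, base=1024):
    """Sample n distinct integers from [lo, hi) by recursive splitting.

    The count of sampled elements falling into the left half follows a
    hypergeometric distribution, so both halves can be handled
    independently (and, in parallel, by different processors).
    """
    N = hi - lo
    if n == 0:
        return []
    if n == N:
        return list(range(lo, hi))
    if N <= base:
        # Small base case: set-based rejection sampling, cache resident.
        chosen = set()
        while len(chosen) < n:
            chosen.add(lo + int(rng.integers(N)))
        return sorted(chosen)
    mid = lo + N // 2
    n_left = rng.hypergeometric(mid - lo, hi - mid, n)
    return sample(lo, mid, n_left, base) + sample(mid, hi, n - n_left, base)

print(sample(1, 10**6 + 1, 10))  # 10 distinct numbers from {1, ..., 10**6}
```

Keeping the base case small is what yields the cache efficiency claimed above: each leaf of the recursion works on a range that fits in fast memory.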