We study non-convex optimization problems where the data is distributed across nodes of a time-varying directed network; this describes dynamic settings in which the communication between network nodes is affected by d...
We consider a distributed non-convex optimization problem of minimizing the sum of all local cost functions over a network of agents. This problem often appears in large-scale distributed machine learning, known as non-convex empirical risk minimization. In this paper, we propose two accelerated algorithms, named DSGT-HB and DSGT-NAG, which combine the distributed stochastic gradient tracking (DSGT) method with momentum acceleration techniques. Under appropriate assumptions, we prove that both algorithms converge sublinearly to a neighborhood of a first-order stationary point of the distributed non-convex optimization problem. Moreover, we derive the conditions under which DSGT-HB and DSGT-NAG achieve a network-independent linear speedup. Numerical experiments on a distributed non-convex logistic regression problem with real data sets and on a deep neural network for the MNIST database show the superiority of DSGT-HB and DSGT-NAG over DSGT.
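The abstract does not spell out the update rules, but a rough, illustrative sketch of how gradient tracking can be combined with heavy-ball momentum is given below. The ring topology, the toy quadratic local costs, and the step-size and momentum values are assumptions for illustration only and need not match the DSGT-HB formulation in the paper; a DSGT-NAG-style variant would instead evaluate the stochastic gradient at a look-ahead point.

```python
# Illustrative sketch (not the paper's exact algorithm): distributed stochastic
# gradient tracking with a heavy-ball momentum term on toy quadratic costs.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, n_iters = 5, 10, 2000
alpha, beta, noise = 0.02, 0.4, 0.01   # step size, momentum, gradient noise (assumed values)

# Doubly stochastic mixing matrix for a ring network (assumed topology).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

# Local costs f_i(x) = 0.5 * ||A_i x - b_i||^2, a stand-in for local data.
A = rng.standard_normal((n_agents, dim, dim)) / np.sqrt(dim)
b = rng.standard_normal((n_agents, dim))

def stoch_grad(i, x):
    """Gradient of f_i at x plus zero-mean noise, mimicking a stochastic gradient."""
    return A[i].T @ (A[i] @ x - b[i]) + noise * rng.standard_normal(dim)

x = np.zeros((n_agents, dim))          # local iterates x_i
x_prev = x.copy()
g = np.array([stoch_grad(i, x[i]) for i in range(n_agents)])
y = g.copy()                           # gradient trackers y_i

for _ in range(n_iters):
    # Heavy-ball step: mix neighbors' iterates, descend along the tracker,
    # and add the momentum term beta * (x^k - x^{k-1}).
    x_new = W @ x - alpha * y + beta * (x - x_prev)
    g_new = np.array([stoch_grad(i, x_new[i]) for i in range(n_agents)])
    # Gradient tracking: mix trackers and add the gradient increment.
    y = W @ y + g_new - g
    x_prev, x, g = x, x_new, g_new

avg_grad = np.mean([A[i].T @ (A[i] @ x[i] - b[i]) for i in range(n_agents)], axis=0)
print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("average gradient norm:", np.linalg.norm(avg_grad))
```

The tracker y_i estimates the network-wide average gradient, so adding momentum to the x-update leaves the tracking recursion unchanged in this sketch.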
ISBN (Print): 9781728111643
This paper presents a distributed optimization algorithm to solve unconstrained (not necessarily convex) optimization problems in a multi-agent system. The algorithm does not require any explicit expressions for the local cost functions, which means gradient information is not available. The only information required is each agent's local measurements of its local cost function. The idea of the algorithm is to keep searching for a better point until no further improvement can be found. Each step update is based on an estimate of the distance from the current point to a possibly better point. This estimate is generated by a learning model trained on the historical step-update data. The algorithm requires no assumptions on the convexity or differentiability of the local and/or global functions, other than that a finite solution exists. The convergence properties of the proposed algorithm are carefully studied. Based on the idea of exploitation and exploration, it is shown that the algorithm is able to escape from local optima to find the best known optimum. A numerical example is provided to verify the performance of the proposed algorithm.
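As a hedged illustration of the exploitation/exploration idea, the sketch below performs a generic derivative-free search using only function evaluations: the local step length is adapted from the history of accepted steps (a crude stand-in for the learned step-distance model described in the abstract), and occasional random jumps allow escaping local minima. The single-agent setting, the Rastrigin-style test function, and all parameter values are assumptions and do not reproduce the paper's algorithm.

```python
# Illustrative sketch (not the paper's algorithm): derivative-free search using
# only function evaluations, with exploitation steps whose length is adapted
# from the history of accepted steps and occasional exploration jumps.
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Assumed non-convex test cost with many local minima (Rastrigin-style).
    return np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)) + 10.0 * x.size

dim, n_iters, explore_prob = 2, 5000, 0.05
x_best = rng.uniform(-5.0, 5.0, dim)
f_best = f(x_best)
step_history = [1.0]                     # lengths of accepted step updates

for _ in range(n_iters):
    if rng.random() < explore_prob:
        # Exploration: a large random jump to probe a different region.
        candidate = rng.uniform(-5.0, 5.0, dim)
    else:
        # Exploitation: a local step whose length is estimated from recent
        # accepted steps (a crude stand-in for the learned distance model).
        step = np.mean(step_history[-20:])
        candidate = x_best + step * rng.standard_normal(dim)
    f_cand = f(candidate)
    if f_cand < f_best:                  # keep the point only if it improves the cost
        step_history.append(np.linalg.norm(candidate - x_best))
        x_best, f_best = candidate, f_cand

print("best point found:", x_best, "with value:", f_best)
```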