Stepsize plays an important role in the stochastic gradient method. The bandwidth-based stepsize allows us to adjust the stepsize within a banded region determined by some boundary functions. Based on the bandwidth-based stepsize, we propose a new method, namely SCSG-BD, for smooth non-convex finite-sum optimization problems. For the boundary functions $1/t$, $1/(t\log(t+1))$ and $1/t^p$ ($p\in(0,1)$), SCSG-BD converges sublinearly to a stationary point at a faster rate than the stochastically controlled stochastic gradient (SCSG) method under certain conditions. Moreover, SCSG-BD is able to converge linearly to the solution if the objective function satisfies the Polyak-Łojasiewicz condition. We also introduce the $1/t$-Barzilai-Borwein stepsize for practical computation. Numerical experiments demonstrate that SCSG-BD performs better than SCSG and its variants.
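To make the banded mechanism concrete, here is a minimal Python sketch of how a $1/t$-Barzilai-Borwein stepsize could be projected into a band $[c_{low}/t, c_{high}/t]$ induced by the boundary function $1/t$; the function name, the band constants, and the fallback rule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def banded_bb_stepsize(x_prev, x_curr, g_prev, g_curr, t,
                       c_low=0.1, c_high=10.0):
    """Sketch of a 1/t-Barzilai-Borwein stepsize (constants are assumptions).

    The raw BB stepsize s's / s'y is projected into the banded region
    [c_low / t, c_high / t] determined by the boundary function 1/t.
    """
    s = x_curr - x_prev            # iterate difference
    y = g_curr - g_prev            # gradient difference
    denom = float(s @ y)
    # Fall back to the upper boundary if the BB ratio is undefined.
    eta_bb = float(s @ s) / denom if denom > 1e-12 else c_high / t
    lower, upper = c_low / t, c_high / t
    return min(max(eta_bb, lower), upper)
```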
The probabilistic gradient estimator (PAGE) algorithm allows switching between vanilla SGD and variance-reduced methods in a flexible probabilistic manner. This motivates us to develop novel momentum-based algorithms for non-convex finite-sum problems. Specifically, we replace SGD with momentum acceleration in PAGE, and the momentum term is integrated in the inner and outer parts of the gradient estimator, named mPAGE-I and mPAGE-O, respectively. Furthermore, we propose a unified algorithmic framework for momentum variants to cover mPAGE-I and mPAGE-O, denoted as mPAGE. For non-convex objectives, we establish a unified analysis of mPAGE and show that the mPAGE algorithms converge sublinearly to an $\epsilon$-accurate solution with high probability. By choosing an appropriate probability, the well-known stochastic gradient complexity bound is also achieved. Additionally, mPAGE can degenerate into widely used optimization algorithms, such as SGD, SHB, and PAGE, demonstrating the generality and effectiveness of our theoretical analysis. Finally, numerical experiments validate the benefits of the mPAGE methods and support our theoretical findings.
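As a rough illustration of the probabilistic switching that mPAGE inherits from PAGE, the following Python sketch performs one step of a momentum-on-top-of-PAGE update, closest in spirit to the outer variant mPAGE-O; all names, the minibatch size, and the hyperparameter defaults are assumptions rather than the authors' exact algorithm.

```python
import numpy as np

def mpage_step(x, x_prev, g_prev, v_prev, grad_batch, grad_full, n, rng,
               p=0.1, b=8, beta=0.9, eta=0.01):
    """One momentum-PAGE step (all names and defaults are assumptions).

    With probability p the estimator is refreshed by a full-batch gradient;
    otherwise it is updated with the recursive PAGE correction evaluated on
    one shared minibatch. Heavy-ball momentum is then applied on top of the
    estimator, in the spirit of the outer variant.
    """
    if rng.random() < p:
        g = grad_full(x)                    # periodic full refresh
    else:
        idx = rng.integers(0, n, size=b)    # same minibatch at both points
        g = g_prev + grad_batch(x, idx) - grad_batch(x_prev, idx)
    v = beta * v_prev + g                   # momentum buffer
    return x - eta * v, g, v                # new iterate, estimator, momentum
```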
This article proposes a distributed stochastic algorithm with variance reduction for general smooth non-convex finite-sum optimization, which has wide applications in the signal processing and machine learning communities. In the distributed setting, a large number of samples are allocated to multiple agents in the network. Each agent computes a local stochastic gradient and communicates with its neighbors to seek the global optimum. In this article, we develop a modified variance reduction technique to deal with the variance introduced by stochastic gradients. Combining gradient tracking and variance reduction techniques, this article proposes a distributed stochastic algorithm, the gradient tracking algorithm with variance reduction (GT-VR), to solve large-scale non-convex finite-sum optimization over multiagent networks. A complete and rigorous proof shows that the GT-VR algorithm converges to first-order stationary points with an $O(1/k)$ convergence rate. In addition, we provide a complexity analysis of the proposed algorithm. Compared with some existing first-order methods, the proposed algorithm has a lower $O(PM\epsilon^{-1})$ gradient complexity under some mild conditions. By comparing state-of-the-art algorithms with GT-VR in numerical simulations, we verify the efficiency of the proposed algorithm.
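A minimal Python sketch of one synchronous round of gradient tracking combined with a generic variance-reduced local gradient may help fix ideas; the mixing matrix W, the helper local_vr_grad, and all names here are assumptions, not the GT-VR update verbatim.

```python
import numpy as np

def gt_vr_round(X, Y, V_old, W, local_vr_grad, eta=0.05):
    """One round of gradient tracking + variance reduction (a sketch;
    W, local_vr_grad, and eta are assumptions).

    X, Y are (P, d) arrays holding each agent's iterate and gradient
    tracker; W is a doubly stochastic mixing matrix encoding the network;
    local_vr_grad(i, x) returns agent i's variance-reduced stochastic
    gradient (e.g., an SVRG/SAGA-style estimator).
    """
    X_new = W @ X - eta * Y        # mix with neighbors, then descend
    V_new = np.stack([local_vr_grad(i, X_new[i]) for i in range(X.shape[0])])
    Y_new = W @ Y + V_new - V_old  # tracker follows the average gradient
    return X_new, Y_new, V_new
```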