In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly concave in the other. Such problems have recently attracted significant attention because of their applications in modern machine learning. We first consider a deterministic version of the problem. We design and analyze the zeroth-order Gradient Descent Ascent (ZO-GDA) algorithm and establish oracle complexity results that improve on existing work. We also propose the zeroth-order Gradient Descent Multi-Step Ascent (ZO-GDMSA) algorithm, which significantly improves on the oracle complexity of ZO-GDA. We then consider stochastic versions of ZO-GDA and ZO-GDMSA to handle stochastic nonconvex minimax problems. For this case, we provide oracle complexity results under two assumptions on the stochastic gradient: (i) the uniformly bounded variance assumption, which is common in traditional stochastic optimization, and (ii) the Strong Growth Condition (SGC), which is known to be satisfied by modern over-parameterized machine learning models. We establish that under the SGC assumption, the complexities of the stochastic algorithms match those of the deterministic algorithms. Numerical experiments are presented to support our theoretical results.
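To make the ZO-GDA idea concrete, here is a minimal sketch under stated assumptions. Gradients are replaced by two-point Gaussian-smoothing estimates (a common zeroth-order estimator; the paper's exact estimator, step sizes, and sampling scheme may differ), and the objective is a hypothetical toy function $f(x, y) = \sum_i (y_i \sin x_i - y_i^2/2)$, which is nonconvex in $x$ and 1-strongly concave in $y$. The names `zo_grad`, `zo_gda`, and `ascent_steps` are illustrative only. Setting `ascent_steps > 1` runs several ascent steps per descent step, in the spirit of ZO-GDMSA, but it is not claimed to be that algorithm.

```python
import math
import random


def f(x, y):
    """Hypothetical toy objective: nonconvex in x, 1-strongly concave in y."""
    return sum(yi * math.sin(xi) - 0.5 * yi * yi for xi, yi in zip(x, y))


def zo_grad(fun, z, mu, num_dirs, rng):
    """Two-point Gaussian-smoothing gradient estimate of fun at z.

    Averages (fun(z + mu*u) - fun(z)) / mu * u over num_dirs Gaussian directions u,
    using only function values (no gradients).
    """
    d = len(z)
    base = fun(z)
    g = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        shifted = [zi + mu * ui for zi, ui in zip(z, u)]
        scale = (fun(shifted) - base) / mu
        for i in range(d):
            g[i] += scale * u[i] / num_dirs
    return g


def zo_gda(x, y, eta_x=0.05, eta_y=0.2, mu=1e-4, num_dirs=10,
           iters=2000, ascent_steps=1, seed=0):
    """Zeroth-order descent in x, ascent in y (assumed step sizes, not the paper's)."""
    rng = random.Random(seed)
    x, y = list(x), list(y)
    for _ in range(iters):
        # Descent direction for x, estimated at the current (x, y).
        gx = zo_grad(lambda v: f(v, y), x, mu, num_dirs, rng)
        # ascent_steps=1 gives plain simultaneous ZO-GDA; more steps let y track argmax_y f(x, y).
        for _ in range(ascent_steps):
            gy = zo_grad(lambda v: f(x, v), y, mu, num_dirs, rng)
            y = [yi + eta_y * gi for yi, gi in zip(y, gy)]
        x = [xi - eta_x * gi for xi, gi in zip(x, gx)]
    return x, y


if __name__ == "__main__":
    x, y = zo_gda([1.0, -0.5, 0.8], [0.0, 0.0, 0.0])
    # For this toy f, Phi(x) = max_y f(x, y) = 0.5 * sum(sin(x_i)**2).
    grad_phi = [math.sin(xi) * math.cos(xi) for xi in x]
    print("max |grad Phi|:", max(abs(g) for g in grad_phi))
```

For this toy problem the maximizer is $y_i = \sin x_i$, so a run that reaches a stationary point of $\Phi(x) = \max_y f(x, y)$ should show $\sin x_i \cos x_i \approx 0$ and $y_i \approx \sin x_i$.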
We consider optimization problems over the Stiefel manifold whose objective function is the sum of a smooth function and a nonsmooth function. Existing methods for this class of problems converge slowly in practice, involve subproblems that can be as difficult as the original problem, or lack rigorous convergence guarantees. In this paper, we propose a manifold proximal gradient method (ManPG) for solving this class of problems. We prove that the proposed method converges globally to a stationary point and establish its iteration complexity for obtaining an $\epsilon$-stationary point. Furthermore, we present numerical results on the sparse PCA and compressed modes problems to demonstrate the advantages of the proposed method. We also discuss some recent advances related to ManPG for Riemannian optimization with nonsmooth objective functions.
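The sparse PCA problem mentioned above can be posed, in one common formulation that we assume here, as minimizing $F(X) = -\operatorname{tr}(X^\top A^\top A X) + \mu \|X\|_1$ over the Stiefel manifold $\{X \in \mathbb{R}^{n \times r} : X^\top X = I_r\}$. The pure-Python sketch below is not ManPG, whose update requires solving a proximal subproblem restricted to the tangent space. It only illustrates two ingredients that such methods typically combine: evaluating the smooth-plus-nonsmooth objective, and a QR-based retraction that maps $X + \xi$ back onto the manifold. All helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Dense matrix product of list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sparse_pca_objective(X, A, mu):
    """F(X) = -trace(X^T A^T A X) + mu * ||X||_1 (entrywise l1 norm)."""
    AX = matmul(A, X)
    smooth = -sum(v * v for row in AX for v in row)  # trace(X^T A^T A X) = ||AX||_F^2
    nonsmooth = mu * sum(abs(v) for row in X for v in row)
    return smooth + nonsmooth


def qr_retraction(X, xi):
    """Q factor (with positive-diagonal R) of X + xi, via modified Gram-Schmidt.

    Assumes X + xi has full column rank; the result has orthonormal columns.
    """
    Y = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, xi)]
    n, r = len(Y), len(Y[0])
    cols = [[Y[i][j] for i in range(n)] for j in range(r)]
    q = []
    for v in cols:
        w = list(v)
        for qk in q:
            proj = sum(a * b for a, b in zip(qk, w))  # project the updated w (modified GS)
            w = [a - proj * b for a, b in zip(w, qk)]
        norm = math.sqrt(sum(a * a for a in w))
        q.append([a / norm for a in w])
    return transpose(q)  # back to an n x r matrix


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    A = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
    Q = qr_retraction(X, [[0.0, 0.1], [0.2, 0.0], [0.3, -0.1]])
    print("F(Q) =", sparse_pca_objective(Q, A, mu=0.5))
    print("Q^T Q =", matmul(transpose(Q), Q))
```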
ISBN: 9781728176055 (print)
Noise is inherent in many optimization methods, such as stochastic gradient methods, zeroth-order methods, and compressed gradient methods. For such methods to converge toward a global optimum, it is intuitive to use large step sizes in the initial iterations, when the noise is typically small compared to the algorithm steps, and to reduce the step sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient methods, but similar results are lacking for other methods that use approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient methods. Unlike previous works, our analysis framework shows that such step-size schedules enable these methods to enjoy an optimal $O(1/k)$ rate. We exemplify our results on zeroth-order methods and stochastic compression methods. Our experiments validate the fast convergence of these methods with step-decay schedules.
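As a sketch of the diminishing step-size idea, assume a hypothetical strongly convex quadratic $f(x) = \frac{1}{2}\sum_i \lambda_i x_i^2$, a noisy gradient $\nabla f(x) + \sigma \xi$ with Gaussian $\xi$, and unbiased rand-k sparsification as the stochastic compression operator. The schedule $\alpha_t = 2/(\mu(t + t_0))$ is one standard $O(1/t)$-type choice; the offset $t_0$ keeps the early steps small enough to stay stable under compression. These choices are assumptions for illustration, not necessarily the schedule or the compressor analyzed in the paper.

```python
import math
import random


def rand_k(v, num_keep, rng):
    """Unbiased random sparsification: keep num_keep random coordinates, scaled by d/num_keep."""
    d = len(v)
    keep = set(rng.sample(range(d), num_keep))
    return [v[i] * d / num_keep if i in keep else 0.0 for i in range(d)]


def compressed_gd(x0, lam, num_keep=2, sigma=0.1, iters=2000, t0=20, seed=0):
    """Noisy, compressed gradient descent on f(x) = 0.5 * sum(lam_i * x_i**2)."""
    rng = random.Random(seed)
    mu = min(lam)  # strong-convexity constant of the quadratic
    x = list(x0)
    for t in range(iters):
        # Gradient corrupted by additive Gaussian noise, then randomly sparsified.
        noisy_grad = [li * xi + sigma * rng.gauss(0.0, 1.0) for li, xi in zip(lam, x)]
        g = rand_k(noisy_grad, num_keep, rng)
        alpha = 2.0 / (mu * (t + t0))  # diminishing step size, roughly 1/t
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    lam = [1.0 + 0.1 * i for i in range(10)]
    x = compressed_gd([1.0] * 10, lam)
    print("final distance to optimum:", math.sqrt(sum(xi * xi for xi in x)))
```

Under these assumptions the squared error should shrink roughly like $1/t$ once the transient has passed, whereas a constant step would stall at a noise floor proportional to the step size.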