Search Results - Inner Mongolia University Library

Zeroth-order algorithms for nonconvex-strongly-concave minimax problems with improved complexities

JOURNAL OF GLOBAL OPTIMIZATION, 2023, Vol. 87, Issue 2-4, pp. 709-740

Authors: Wang, Zhongruo; Balasubramanian, Krishnakumar; Ma, Shiqian; Razaviyayn, Meisam
Affiliations: Univ Calif Davis, Dept Math, Davis, CA 95616, USA; Univ Calif Davis, Dept Stat, Davis, CA, USA; Univ Southern Calif, Dept Ind & Syst Engn, Los Angeles, CA, USA

In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly concave in the other. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first consider a deterministic version of the problem. We design and analyze the zeroth-order Gradient Descent Ascent (ZO-GDA) algorithm and provide improved results, in terms of oracle complexity, compared to existing works. We also propose the zeroth-order Gradient Descent Multi-Step Ascent (ZO-GDMSA) algorithm, which significantly improves the oracle complexity of ZO-GDA. We then consider stochastic versions of ZO-GDA and ZO-GDMSA to handle stochastic nonconvex minimax problems. For this case, we provide oracle complexity results under two assumptions on the stochastic gradient: (i) the uniformly bounded variance assumption, which is common in traditional stochastic optimization, and (ii) the Strong Growth Condition (SGC), which is known to be satisfied by modern over-parameterized machine learning models. We establish that, under the SGC assumption, the complexities of the stochastic algorithms match those of the deterministic algorithms. Numerical experiments are presented to support our theoretical results.
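
As a rough illustration of the zeroth-order gradient descent ascent idea described in the abstract, the following Python sketch updates both variables using function evaluations only. The toy objective `f`, the step sizes `eta_x` and `eta_y`, and the coordinate-wise central-difference estimator `zo_grad` are assumptions made for this example. They are not taken from the paper, whose gradient estimator and parameter choices may differ.

```python
import math


def f(x, y):
    """Toy objective (assumed for illustration): nonconvex in x, strongly concave in y."""
    return sum(xi * yi - 0.5 * yi * yi + 0.1 * math.cos(xi) for xi, yi in zip(x, y))


def zo_grad(func, point, mu=1e-5):
    """Estimate the gradient of func at point from function values only.

    Uses coordinate-wise central differences, a simple deterministic
    stand-in for a zeroth-order gradient estimator.
    """
    grad = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += mu
        minus[i] -= mu
        grad.append((func(plus) - func(minus)) / (2.0 * mu))
    return grad


def zo_gda(x, y, eta_x=0.05, eta_y=0.2, iters=500):
    """Simultaneous descent in x and ascent in y with estimated gradients."""
    x, y = list(x), list(y)
    for _ in range(iters):
        gx = zo_grad(lambda v: f(v, y), x)  # estimated partial gradient in x
        gy = zo_grad(lambda v: f(x, v), y)  # estimated partial gradient in y
        x = [xi - eta_x * gi for xi, gi in zip(x, gx)]  # descent step
        y = [yi + eta_y * gi for yi, gi in zip(y, gy)]  # ascent step
    return x, y


x_final, y_final = zo_gda([1.0, -2.0], [0.0, 0.0])
```

For this toy objective the only stationary point is the origin, so both `x_final` and `y_final` should end up close to zero. The smaller step for `x` than for `y` is a design choice of the sketch, chosen so the ascent variable tracks its maximizer while the descent variable moves slowly.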

Keywords: Minimax problem; zeroth-order algorithms; Oracle complexity; Gradient descent ascent; Stochastic algorithms

Nonsmooth Optimization over the Stiefel Manifold and Beyond: Proximal Gradient Method and Recent Variants

SIAM REVIEW, 2024, Vol. 66, Issue 2, pp. 319-352

Authors: Chen, Shixiang; Ma, Shiqian; So, Anthony Man-Cho; Zhang, Tong
Affiliations: Univ Sci & Technol China, Sch Math Sci, Hefei, Anhui, Peoples R China; Rice Univ, Dept Computat Appl Math & Operat Res, Houston, TX 77005, USA; Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China; Hong Kong Univ Sci & Technol, Clear Water Bay, Hong Kong, Peoples R China

We consider optimization problems over the Stiefel manifold whose objective function is the sum of a smooth function and a nonsmooth function. Existing methods for solving this class of problems converge slowly in practice, involve subproblems that can be as difficult as the original problem, or lack rigorous convergence guarantees. In this paper, we propose a manifold proximal gradient method (ManPG) for solving this class of problems. We prove that the proposed method converges globally to a stationary point and establish its iteration complexity for obtaining an ε-stationary point. Furthermore, we present numerical results on the sparse PCA and compressed modes problems to demonstrate the advantages of the proposed method. We also discuss some recent advances related to ManPG for Riemannian optimization with nonsmooth objective functions.

Keywords: manifold optimization; Stiefel manifold; nonsmooth; proximal gradient method; iteration complexity; semismooth Newton method; stochastic algorithms; zeroth-order algorithms

IMPROVED STEP-SIZE SCHEDULES FOR NOISY GRADIENT METHODS

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Khirirat, Sarit; Wang, Xiaoyu; Magnusson, Sindri; Johansson, Mikael
Affiliations: KTH Royal Inst Technol, Stockholm, Sweden; Stockholm Univ, Stockholm, Sweden

ISBN: (print) 9781728176055

Noise is inherent in many optimization methods such as stochastic gradient methods, zeroth-order methods and compressed gradient methods. For such methods to converge toward a global optimum, it is intuitive to use large step-sizes in the initial iterations, when the noise is typically small compared to the algorithm-steps, and to reduce the step-sizes as the algorithm progresses. This intuition has been confirmed in theory and practice for stochastic gradient methods, but similar results are lacking for other methods using approximate gradients. This paper shows that diminishing step-size strategies can indeed be applied to a broad class of noisy gradient methods. Unlike previous works, our analysis framework shows that such step-size schedules enable these methods to enjoy an optimal O(1/k) rate. We exemplify our results on zeroth-order methods and stochastic compression methods. Our experiments validate the fast convergence of these methods with step decay schedules.
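
The intuition in the abstract (large steps early, smaller steps later) can be sketched in Python as a step decay schedule applied to gradient descent with noisy gradients. Everything specific here is an assumption for illustration: the schedule parameters `gamma0`, `factor` and `period`, the quadratic test function, and the Gaussian noise model are hypothetical and are not the schedules or experiments of the paper.

```python
import random


def step_decay(k, gamma0=0.5, factor=0.5, period=100):
    """Step size at iteration k: start at gamma0, multiply by factor every `period` iterations."""
    return gamma0 * factor ** (k // period)


def noisy_gradient_descent(x0, iters=1000, sigma=0.1, seed=0):
    """Minimize 0.5 * x**2 using gradients corrupted by additive Gaussian noise."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    x = x0
    for k in range(iters):
        noisy_grad = x + rng.gauss(0.0, sigma)  # exact gradient x plus noise
        x -= step_decay(k) * noisy_grad
    return x


x_final = noisy_gradient_descent(5.0)
```

With a constant step size the iterate would keep fluctuating at a level set by the noise. Shrinking the step in stages lets the early iterations make fast progress while the later ones average out the noise.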

Keywords: Optimization; machine learning; distributed algorithms; zeroth-order algorithms; quantization
