ISBN (print): 9798331516758; 9798331516741
In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit d-dimensional ball and contains a ball of known radius ε > 0. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy ε ≥ e^{-d^{o(1)}}, any deterministic algorithm either uses d^{1+δ} bits of memory or must make at least 1/(d^{0.01δ} ε^{2(1-δ)/(1+1.01δ) - o(1)}) oracle queries, for any δ ∈ [0, 1]. Additionally, we show that randomized algorithms either use d^{1+δ} memory or make at least 1/(d^{2δ} ε^{2(1-4δ) - o(1)}) queries for any δ ∈ [0, 1/4]. Because gradient descent only uses linear memory O(d ln(1/ε)) but makes O(1/ε²) queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in 1/ε if the algorithm has less than quadratic memory in d. This reveals a sharp phase transition, since with quadratic O(d² ln(1/ε)) memory, cutting plane methods only require O(d ln(1/ε)) queries.
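As a concrete picture of the memory-light end of this tradeoff, the sketch below runs a plain gradient-descent-style search for the feasibility problem: it stores only the current iterate (O(d) memory) and, under the stated separation-oracle convention, needs on the order of 1/ε² queries. The oracle interface, step size, and query budget are assumptions made for illustration, not the paper's construction.

```python
import numpy as np

def feasibility_gd(separation_oracle, d, eps, max_queries=None):
    """Memory-light search for a feasible point using only a separation oracle.

    Minimal sketch (not the paper's construction): `separation_oracle(x)` is
    assumed to return None if x lies in the target set S, and otherwise a
    vector g with <g, x - s> >= 0 for every s in S.  Since S sits inside the
    unit ball and contains a ball of radius eps, stepping against the
    separating direction with step size ~eps reaches a feasible point within
    O(1/eps^2) queries while storing only the current iterate (O(d) memory).
    """
    if max_queries is None:
        max_queries = int(np.ceil(4.0 / eps**2)) + 1
    x = np.zeros(d)                              # start at the center of the unit ball
    for _ in range(max_queries):
        g = separation_oracle(x)
        if g is None:                            # oracle certifies x is feasible
            return x
        x = x - eps * g / np.linalg.norm(g)      # step across the separating hyperplane
        nx = np.linalg.norm(x)
        if nx > 1.0:                             # stay inside the unit ball
            x = x / nx
    return None                                  # query budget exhausted
```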
It is well-known that given a smooth, bounded-from-below, and possibly nonconvex function, standard gradient-based methods can find ε-stationary points (with gradient norm less than ε) in O(1/ε²) iterations. However, many important nonconvex optimization problems, such as those associated with training modern neural networks, are inherently not smooth, making these results inapplicable. In this paper, we study nonsmooth nonconvex optimization from an oracle complexity viewpoint, where the algorithm is assumed to be given access only to local information about the function at various points. We provide two main results: First, we consider the problem of getting near ε-stationary points. This is perhaps the most natural relaxation of finding ε-stationary points, which is impossible in the nonsmooth nonconvex case. We prove that this relaxed goal cannot be achieved efficiently, for any distance and ε smaller than some constants. Our second result deals with the possibility of tackling nonsmooth nonconvex optimization by reduction to smooth optimization: Namely, applying smooth optimization methods on a smooth approximation of the objective function. For this approach, we prove under a mild assumption an inherent trade-off between oracle complexity and smoothness: On the one hand, smoothing a nonsmooth nonconvex function can be done very efficiently (e.g., by randomized smoothing), but with dimension-dependent factors in the smoothness parameter, which can strongly affect iteration complexity when plugging into standard smooth optimization methods. On the other hand, these dimension factors can be eliminated with suitable smoothing methods, but only by making the oracle complexity of the smoothing process exponentially large.
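To make the smoothing approach in the second result concrete, here is a minimal sketch of randomized smoothing via a two-point zeroth-order estimator: it approximates the gradient of the smoothed surrogate f_δ(x) = E[f(x + δu)], whose smoothness parameter picks up the dimension-dependent factor (roughly L√d/δ for L-Lipschitz f) discussed in the abstract. The function name, sampling scheme, and defaults are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def randomized_smoothing_grad(f, x, delta=1e-2, num_samples=64, rng=None):
    """Zeroth-order gradient estimate for the smoothed surrogate f_delta(x) = E[f(x + delta*u)].

    Hedged sketch of the generic randomized-smoothing technique mentioned in
    the abstract.  For an L-Lipschitz f, f_delta is smooth with a smoothness
    constant growing like L*sqrt(d)/delta -- the dimension dependence the
    trade-off result is about.  Defaults are illustrative choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                   # uniform direction on the unit sphere
        # two-point (central-difference) estimator of the smoothed gradient
        grad += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * d * u
    return grad / num_samples
```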
We construct a family of functions suitable for establishing lower bounds on the oracle complexity of first-order minimization of smooth strongly-convex functions. Based on this construction, we derive new lower bounds on the complexity of strongly-convex minimization under various inaccuracy criteria. The new bounds match the known upper bounds up to a constant factor, and when the inaccuracy of a solution is measured by its distance to the solution set, the new lower bound exactly matches the upper bound obtained by the recent Information-Theoretic Exact Method by the same authors, thereby establishing the exact oracle complexity for this class of problems.
Second-order methods, which utilize gradients as well as Hessians to optimize a given function, are of major importance in mathematical optimization. In this work, we prove tight bounds on the oracle complexity of such methods for smooth convex functions, or equivalently, the worst-case number of iterations required to optimize such functions to a given accuracy. In particular, these bounds indicate when such methods can or cannot improve on gradient-based methods, whose oracle complexity is much better understood. We also provide generalizations of our results to higher-order methods.
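For a concrete reference point, the following is a bare-bones sketch of the kind of second-order method covered by the oracle model above: each iteration makes one gradient query and one Hessian query and takes a (damped) Newton step. It is an illustrative baseline under assumed inputs, not the algorithm achieving the tight bounds in the paper.

```python
import numpy as np

def damped_newton(grad, hess, x0, num_iters=50, alpha=1.0):
    """A bare-bones second-order method in the gradient-plus-Hessian oracle model.

    Minimal sketch: each iteration queries the gradient and Hessian oracles
    once and moves along the Newton direction.  The `grad`/`hess` callables
    and the damping factor `alpha` are illustrative assumptions; the paper
    studies worst-case oracle bounds, not a specific implementation.
    """
    x = x0.astype(float)
    for _ in range(num_iters):
        g = grad(x)
        H = hess(x)
        # Newton direction; a tiny ridge term keeps the linear solve well-posed
        step = np.linalg.solve(H + 1e-12 * np.eye(x.size), g)
        x = x - alpha * step
    return x
```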
We present an information-theoretic approach to lower bound the oracle complexity of nonsmooth black-box convex optimization, unifying previous lower bounding techniques by identifying a combinatorial problem, namely string guessing, as a single source of hardness. As a measure of complexity, we use distributional oracle complexity, which subsumes randomized oracle complexity as well as worst-case oracle complexity. We obtain strong lower bounds on distributional oracle complexity for the box [-1, 1]^n, as well as for the ℓ_p-ball for p ≥ 1 (for both low-scale and large-scale regimes), matching worst-case upper bounds, and hence we close the gap between distributional complexity (and, in particular, randomized complexity) and worst-case complexity. Furthermore, the bounds remain essentially the same for high-probability and bounded-error oracle complexity, and even for a combination of the two, i.e., bounded-error high-probability oracle complexity. This considerably extends the applicability of known bounds.
Zeroth-order optimization, which does not use derivative information, is one of the significant research areas in the field of mathematical optimization and machine learning. Although various studies have explored zeroth-order algorithms, one of the theoretical limitations is that oracle complexity depends on the dimension, i.e., on the number of variables, of the optimization problem. In this paper, to reduce the dependence of the oracle complexity on the dimension, we propose a zeroth-order random subspace algorithm by combining a gradient-free algorithm (specifically, Gaussian randomized smoothing with central differences) with random projection. We derive the worst-case oracle complexity of our proposed method in non-smooth and convex settings; it is equivalent to standard results for full-dimensional non-smooth convex algorithms. Furthermore, we prove that ours also has a local convergence rate independent of the original dimension under additional assumptions. In addition to the theoretical results, numerical experiments show that when an objective function has a specific structure, the proposed method can become experimentally more efficient due to random projection.
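A hedged sketch of one step of the random-subspace idea described above: draw a random projection, estimate the gradient of a Gaussian-smoothed objective inside the low-dimensional subspace with central differences, and update along the lifted direction. Parameter names and defaults are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def zo_random_subspace_step(f, x, step_size=0.1, subspace_dim=10, mu=1e-3,
                            num_samples=16, rng=None):
    """One step of a zeroth-order random-subspace scheme (illustrative sketch).

    Draw a random projection P (d x r), estimate the gradient of the
    Gaussian-smoothed objective within the r-dimensional subspace using
    central differences, then update x along the lifted direction.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    P = rng.standard_normal((d, subspace_dim)) / np.sqrt(subspace_dim)  # random projection
    g_sub = np.zeros(subspace_dim)
    for _ in range(num_samples):
        u = rng.standard_normal(subspace_dim)        # Gaussian direction in the subspace
        du = P @ u                                   # lift to the original space
        g_sub += (f(x + mu * du) - f(x - mu * du)) / (2.0 * mu) * u  # central difference
    g_sub /= num_samples
    return x - step_size * (P @ g_sub)               # move along the lifted estimate
```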
In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly-concave in the other variable. Such minimax optimization problems have attracted significant attention lately due to their applications in modern machine learning tasks. We first consider a deterministic version of the problem. We design and analyze the Zeroth-Order Gradient Descent Ascent (ZO-GDA) algorithm, and provide improved results compared to existing works, in terms of oracle complexity. We also propose the Zeroth-Order Gradient Descent Multi-Step Ascent (ZO-GDMSA) algorithm that significantly improves the oracle complexity of ZO-GDA. We then consider stochastic versions of ZO-GDA and ZO-GDMSA, to handle stochastic nonconvex minimax problems. For this case, we provide oracle complexity results under two assumptions on the stochastic gradient: (i) the uniformly bounded variance assumption, which is common in traditional stochastic optimization, and (ii) the Strong Growth Condition (SGC), which has been known to be satisfied by modern over-parameterized machine learning models. We establish that under the SGC assumption, the complexities of the stochastic algorithms match that of deterministic algorithms. Numerical experiments are presented to support our theoretical results.
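As an illustration of the deterministic ZO-GDA template described above, the sketch below replaces both partial gradients of f(x, y) with two-point random-direction estimates and alternates a descent step in x with an ascent step in y. Step sizes, the smoothing radius, and the single-sample estimator are illustrative assumptions, not the paper's tuned choices.

```python
import numpy as np

def zo_gda(f, x0, y0, eta_x=1e-3, eta_y=1e-2, mu=1e-4, num_iters=1000, rng=None):
    """Zeroth-order gradient descent ascent for min_x max_y f(x, y) (hedged sketch).

    Both gradients are replaced by two-point random-direction estimates of the
    smoothed objective; x takes a descent step and y an ascent step each
    iteration.  Defaults are illustrative, not the paper's analyzed settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = x0.astype(float), y0.astype(float)
    for _ in range(num_iters):
        u = rng.standard_normal(x.size)
        v = rng.standard_normal(y.size)
        # two-point estimates of the partial gradients
        gx = (f(x + mu * u, y) - f(x - mu * u, y)) / (2.0 * mu) * u
        gy = (f(x, y + mu * v) - f(x, y - mu * v)) / (2.0 * mu) * v
        x = x - eta_x * gx      # descent on the nonconvex variable
        y = y + eta_y * gy      # ascent on the strongly concave variable
    return x, y
```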
Several recent empirical studies demonstrate that important machine learning tasks, such as training deep neural networks, exhibit a low-rank structure, where most of the variation in the loss function occurs only in a few directions of the input space. In this article, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an ε-approximate stationary point of a p-dimensional function by first identifying r ≤ p significant directions, and then estimating the true p-dimensional gradient at every iteration by computing directional derivatives only along those r directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are O(r log(1/ε) + rp) and O(r/ε² + rp), respectively. Therefore, when r ≪ p, LRGD provides significant improvement over the known complexities of O(p log(1/ε)) and O(p/ε²) of GD in the strongly convex and non-convex settings, respectively. Furthermore, we formally characterize the classes of exactly and approximately low-rank functions. Empirically, using real and synthetic data, LRGD provides significant gains over GD when the data has low-rank structure, and in the absence of such structure, LRGD does not degrade performance compared to GD. This suggests that LRGD could be used in practice in any setting in place of GD.
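The core iteration of the LRGD idea can be sketched as follows, assuming the identification phase has already produced an orthonormal set of r significant directions: each step spends r directional-derivative queries (approximated here by central differences) instead of p, and moves along the projection of the gradient onto that span. All names and defaults are hypothetical, chosen only for illustration.

```python
import numpy as np

def lrgd(f, x0, directions, step_size=1e-2, num_iters=500, h=1e-5):
    """Gradient descent restricted to a few significant directions (illustrative sketch).

    `directions` is an (r x p) matrix with orthonormal rows spanning the
    significant subspace; the identification phase from the abstract is
    assumed to have produced it.  Each iteration uses r directional
    derivatives instead of a full p-dimensional gradient.
    """
    x = x0.astype(float)
    U = np.asarray(directions)                   # shape (r, p), orthonormal rows
    for _ in range(num_iters):
        # directional derivatives along the r significant directions
        coeffs = np.array([(f(x + h * u) - f(x - h * u)) / (2.0 * h) for u in U])
        grad_est = U.T @ coeffs                  # low-rank estimate of the gradient
        x = x - step_size * grad_est
    return x
```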
We introduce a new approach to develop stochastic optimization algorithms for a class of stochastic composite and possibly nonconvex optimization problems. The main idea is to combine a variance-reduced estimator and an unbiased stochastic one to create a new hybrid estimator which trades off variance and bias, and possesses useful properties for developing new algorithms. We first introduce our hybrid estimator and investigate its fundamental properties to form a foundational theory for algorithmic development. Next, we apply our new estimator to develop several variants of stochastic gradient methods to solve both expectation and finite-sum composite optimization problems. Our first algorithm can be viewed as a variant of proximal stochastic gradient methods with a single loop and a single sample, yet it achieves the same best-known oracle complexity bound as state-of-the-art double-loop algorithms in the literature. Then, we consider two different variants of our method: adaptive step-size and restarting schemes that have similar theoretical guarantees as our first algorithm. We also study two mini-batch variants of the proposed methods. In all cases, we achieve the best-known complexity bounds under standard assumptions. We test our algorithms on several numerical examples with real datasets and compare them with many existing methods. Our numerical experiments show that the new algorithms are comparable to, and in many cases outperform, their competitors.
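A minimal single-loop sketch of the hybrid-estimator idea, under the assumption of a `stoch_grad(x, seed)` oracle that returns the stochastic gradient for the sample determined by `seed`: the search direction is a convex combination of a SARAH-style recursive (variance-reduced) term and a plain unbiased stochastic gradient. The constant weight beta and step size are illustrative; the paper's proximal, adaptive, restarting, and mini-batch variants are not reproduced here.

```python
import numpy as np

def hybrid_sgd(stoch_grad, x0, step_size=1e-2, beta=0.9, num_iters=1000, rng=None):
    """Single-loop sketch of a hybrid variance-reduced / unbiased estimator.

    Direction update (assumed form for this sketch):
        v_t = beta*(v_{t-1} + g(x_t; s) - g(x_{t-1}; s)) + (1 - beta)*g(x_t; s'),
    where s is shared between the two points (recursive term) and s' is an
    independent sample (unbiased term).
    """
    rng = np.random.default_rng() if rng is None else rng
    x_prev = x0.astype(float)
    v = stoch_grad(x_prev, int(rng.integers(2**31)))      # initial stochastic gradient
    x = x_prev - step_size * v
    for _ in range(num_iters):
        s_shared = int(rng.integers(2**31))               # same sample at both points
        s_fresh = int(rng.integers(2**31))                # independent sample
        recursive = v + stoch_grad(x, s_shared) - stoch_grad(x_prev, s_shared)
        unbiased = stoch_grad(x, s_fresh)
        v = beta * recursive + (1.0 - beta) * unbiased    # hybrid estimator
        x_prev, x = x, x - step_size * v
    return x
```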