Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. A bilevel optimization (BO) modeling approach, along with a host of efficient BO algorithms, has proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, based on a bilevel-structured AC problem model, an implicit zeroth-order stochastic algorithm is developed. A locally randomized spherical smoothing technique is introduced that can be applied to nonsmooth nonconvex implicit AC formulations and avoids the need for a closed-form lower-level mapping. In the proposed zeroth-order scheme, the gradient of the implicit function is approximated through inexact lower-level value estimations that are practically available. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method is characterized by convergence guarantees under a fixed stepsize and smoothing parameter. Moreover, the proposed algorithm attains an overall iteration complexity of O(n^2 L_0^2 ε^{-1}). The convergence performance of the proposed algorithm is verified through numerical simulations.
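The locally randomized spherical smoothing technique described above can be sketched as a two-point gradient estimator that queries only function values along a random unit direction. The quadratic objective below is a hypothetical stand-in for the implicit function F(x, y*(x)); in the bilevel AC setting, each evaluation of f would come from an inexact lower-level (critic) solve rather than a closed form.

```python
import numpy as np

def spherical_grad_estimate(f, x, delta, rng):
    """Two-point zeroth-order gradient estimate with a direction drawn
    uniformly from the unit sphere (randomized spherical smoothing)."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

# Hypothetical test objective: f(x) = 0.5*||x||^2, so grad f(x) = x.
f = lambda x: 0.5 * np.dot(x, x)

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
# Averaging many single-sample estimates recovers the (smoothed) gradient.
est = np.mean(
    [spherical_grad_estimate(f, x, 1e-3, rng) for _ in range(20000)], axis=0
)
```

Each estimate costs only two function evaluations, which is why the scheme can run on inexact lower-level value estimations alone.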
This article focuses on distributed nonconvex optimization, in which agents exchange information to minimize the average of local nonconvex cost functions. The communication channel between agents is normally constrained by limited bandwidth, and the gradient information is typically unavailable. To overcome these limitations, we propose a quantized distributed zeroth-order algorithm, which integrates a deterministic gradient estimator, a standard uniform quantizer, and the distributed gradient tracking algorithm. We establish linear convergence to a global optimal point for the proposed algorithm by assuming the Polyak-Lojasiewicz condition for the global cost function and a smoothness condition for the local cost functions. Moreover, the proposed algorithm maintains linear convergence at low data rates with a proper selection of algorithm parameters. Numerical simulations validate the theoretical results.
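A minimal sketch of the three ingredients named above, on an assumed two-agent toy instance (the quadratic costs, mixing matrix, and parameter values are illustrative, not from the paper): exchanged states are passed through a uniform quantizer, local gradients come from a deterministic two-point estimator, and consensus is driven by gradient tracking.

```python
import numpy as np

def quantize(v, step):
    """Standard uniform quantizer: round each entry to the nearest
    multiple of `step` (the knob controlling the data rate)."""
    return step * np.round(np.asarray(v) / step)

def fd_grad(f, x, h=1e-5):
    """Deterministic two-point (central-difference) gradient estimator
    for a scalar decision variable."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Toy instance: f_i(x) = 0.5*(x - b_i)^2, global minimizer = mean(b) = 2.
b = np.array([1.0, 3.0])
W = np.array([[0.5, 0.5], [0.5, 0.5]])     # doubly stochastic mixing matrix
fs = [lambda x, bi=bi: 0.5 * (x - bi) ** 2 for bi in b]

alpha, step = 0.1, 1e-6                    # stepsize, quantization step
x = np.zeros(2)
g = np.array([fd_grad(fs[i], x[i]) for i in range(2)])
y = g.copy()                               # gradient trackers, y^0 = g^0

for _ in range(300):
    x = W @ quantize(x, step) - alpha * y          # consensus + descent
    g_new = np.array([fd_grad(fs[i], x[i]) for i in range(2)])
    y = W @ quantize(y, step) + g_new - g          # track average gradient
    g = g_new
```

With a coarser quantization step the iterates settle into a correspondingly larger neighborhood of the optimum, which is the trade-off the low-data-rate analysis quantifies.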
This article considers the distributed nonconvex optimization problem of minimizing a global cost function formed by a sum of local cost functions by using local information exchange. We first consider a distributed first-order primal-dual algorithm. We show that it converges sublinearly to a stationary point if each local cost function is smooth, and linearly to a global optimum under the additional condition that the global cost function satisfies the Polyak-Lojasiewicz condition. This condition is weaker than strong convexity, which is a standard condition for proving linear convergence of distributed optimization algorithms, and the global minimizer is not necessarily unique. Motivated by the situations where the gradients are unavailable, we then propose a distributed zeroth-order algorithm, derived from the considered first-order algorithm by using a deterministic gradient estimator, and show that it has the same convergence properties as the considered first-order algorithm under the same conditions. The theoretical results are illustrated by numerical simulations.
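The derivation "first-order primal-dual algorithm with the gradient replaced by a deterministic estimator" can be sketched as follows, on an assumed three-agent path graph with quadratic local costs (strong convexity of the global cost implies the Polyak-Lojasiewicz condition here); the update form and all parameter values are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def fd_grad(f, x, h=1e-5):
    """Deterministic central-difference gradient estimator (scalar case)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Assumed instance: local costs f_i(x) = 0.5*(x - b_i)^2 on the path 1-2-3.
b = np.array([0.0, 2.0, 7.0])              # global minimizer = mean(b) = 3
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])          # graph Laplacian of the path
fs = [lambda x, bi=bi: 0.5 * (x - bi) ** 2 for bi in b]

alpha, beta = 0.05, 1.0
x = np.zeros(3)                             # primal variables (one per agent)
v = np.zeros(3)                             # dual variables; their sum stays 0

for _ in range(2000):
    g = np.array([fd_grad(fs[i], x[i]) for i in range(3)])
    x_new = x - alpha * (g + beta * L @ x + v)   # primal descent step
    v = v + alpha * beta * L @ x                 # dual ascent on consensus
    x = x_new
```

At a fixed point, L @ x = 0 forces consensus, and since the dual variables sum to zero the consensus value must make the estimated gradients sum to zero, i.e. it is a stationary point of the global cost.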
In this paper, we study zeroth-order algorithms for nonconvex-concave minimax problems, which have attracted much attention in machine learning, signal processing, and many other fields in recent years. We propose a zeroth-order alternating randomized gradient projection (ZO-AGP) algorithm for smooth nonconvex-concave minimax problems; its iteration complexity to obtain an ε-stationary point is bounded by O(ε^{-4}), and the number of function value estimates is bounded by O(d_x + d_y) per iteration. Moreover, we propose a zeroth-order block alternating randomized proximal gradient algorithm (ZO-BAPG) for solving blockwise nonsmooth nonconvex-concave minimax optimization problems; its iteration complexity to obtain an ε-stationary point is bounded by O(ε^{-4}), and the number of function value estimates per iteration is bounded by O(Kd_x + d_y). To the best of our knowledge, this is the first time zeroth-order algorithms with iteration complexity guarantees have been developed for solving both general smooth and blockwise nonsmooth nonconvex-concave minimax problems. Numerical results on the data poisoning attack problem and the distributed nonconvex sparse principal component analysis problem validate the efficiency of the proposed algorithms.
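The alternating structure can be sketched as follows: a projected zeroth-order descent step in x, then a projected zeroth-order ascent step in y, each costing two function evaluations. The objective, box constraints, and stepsize below are illustrative assumptions (a simple quadratic saddle problem, not one of the paper's applications).

```python
import numpy as np

def proj_box(z, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d."""
    return np.clip(z, lo, hi)

def zo_grad(f, z, delta, rng):
    """Randomized two-point spherical estimate of grad f at z."""
    d = z.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / (2.0 * delta)) * (f(z + delta * u) - f(z - delta * u)) * u

# Toy objective with stationary point (0, 0): strongly concave in y.
f = lambda x, y: 0.5 * x[0] ** 2 + x[0] * y[0] - 0.5 * y[0] ** 2

rng = np.random.default_rng(0)
x, y = np.array([1.0]), np.array([1.0])
eta = 0.1
for _ in range(300):
    gx = zo_grad(lambda z: f(z, y), x, 1e-4, rng)   # 2 function values
    x = proj_box(x - eta * gx, -2.0, 2.0)           # projected descent in x
    gy = zo_grad(lambda z: f(x, z), y, 1e-4, rng)   # 2 function values
    y = proj_box(y + eta * gy, -2.0, 2.0)           # projected ascent in y
```

Per iteration the x-estimate needs O(d_x) and the y-estimate O(d_y) function values when the estimator is averaged over that many random directions, matching the stated per-iteration count.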
Nonconvex minimax problems have attracted significant interest in machine learning and many other fields in recent years. In this paper, we propose a new zeroth-order alternating randomized gradient projection algorithm to solve smooth nonconvex-linear problems; its iteration complexity to find an ε-first-order Nash equilibrium is O(ε^{-3}), and the number of function value estimates per iteration is bounded by O(d_x ε^{-2}). Furthermore, we propose a zeroth-order alternating randomized proximal gradient algorithm for blockwise nonsmooth nonconvex-linear minimax problems; its corresponding iteration complexity is O(K^{3/2} ε^{-3}), and the number of function value estimates is bounded by O(d_x ε^{-2}) per iteration. The numerical results indicate the efficiency of the proposed algorithms.
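For the nonsmooth blockwise variant, the proximal gradient step reduces to a closed-form proximal operator when the nonsmooth term is an l1 norm; the sketch below shows that building block (the l1 regularizer, stepsize, and variable names are illustrative assumptions, not the paper's specific formulation).

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau*||.||_1 (soft-thresholding), the
    closed-form prox used in each nonsmooth-block update."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# One proximal gradient step on a block x with zeroth-order gradient
# estimate g, stepsize eta, and l1 weight lam:
#   x <- prox_{eta*lam*||.||_1}(x - eta * g)
x = np.array([0.9, -0.2, 1.5])
g = np.array([0.5, 0.1, -0.3])   # stand-in for a zeroth-order estimate
eta, lam = 0.2, 1.0
x_new = soft_threshold(x - eta * g, eta * lam)
# x_new == [0.6, -0.02, 1.36]: small entries are shrunk toward zero
```

Because the prox is elementwise and closed-form, the per-iteration cost remains dominated by the O(d_x ε^{-2}) function value estimates.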