Search Results - Inner Mongolia University Library

Approximate dynamic programming strategies and their applicability for process control: A review and future directions

INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2004, Vol. 2, No. 3, pp. 263-278

Authors: Lee, JM; Lee, JH (Georgia Inst Technol, Sch Chem & Biomol Engn, Atlanta, GA 30332, USA)

This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement Learning (RL) and neuro-dynamic programming (NDP), which can be viewed as approximate DP techniques, are already established techniques for solving difficult multi-stage decision problems in the fields of operations research, computer science, and robotics. Owing to the significant disparity of problem formulations and objective, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.

Keywords: approximate dynamic programming; reinforcement learning; neuro-dynamic programming; optimal control; function approximation

Generalized Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2015, Vol. 45, No. 12, pp. 1577-1591

Authors: Liu, Derong; Wei, Qinglai; Yan, Pengfei (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

This paper is concerned with a novel generalized policy iteration algorithm for solving optimal control problems for discrete-time nonlinear systems. The idea is to use an iterative adaptive dynamic programming algorithm to obtain iterative control laws which make the iterative value functions converge to the optimum. Initialized by an admissible control law, it is shown that the iterative value functions are monotonically nonincreasing and converge to the optimal solution of Hamilton-Jacobi-Bellman equation, under the assumption that a perfect function approximation is employed. The admissibility property is analyzed, which shows that any of the iterative control laws can stabilize the nonlinear system. Neural networks are utilized to implement the generalized policy iteration algorithm, by approximating the iterative value function and computing the iterative control law, respectively, to achieve approximate optimal control. Finally, numerical examples are presented to verify the effectiveness of the present generalized policy iteration algorithm.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; generalized policy iteration; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; reinforcement learning

Discrete-Time Optimal Control via Local Policy Iteration Adaptive Dynamic Programming

IEEE TRANSACTIONS ON CYBERNETICS, 2017, Vol. 47, No. 10, pp. 3367-3379

Authors: Wei, Qinglai; Liu, Derong; Lin, Qiao; Song, Ruizhuo (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China)

In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated in a subset of the state space, where the computational burden is relaxed compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated in a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; local policy iteration; neuro-dynamic programming; nonlinear systems; optimal control
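The idea of updating the value function and control law only on a subset of the state space can be illustrated on a small tabular problem. The following is a simplified sketch under stated assumptions, not the authors' algorithm: the integer grid, the control set, the quadratic stage cost, the choice of subsets, and all function names are hypothetical.

```python
# Hypothetical toy system: integer state on [-3, 3], five possible controls.
STATES = list(range(-3, 4))
CONTROLS = [-2, -1, 0, 1, 2]

def step(x, u):
    """Deterministic dynamics x_{k+1} = F(x_k, u_k), clipped to the grid."""
    return max(-3, min(3, x + u))

def utility(x, u):
    """Stage cost, positive definite in the state."""
    return float(x * x + u * u)

def initial_cost(policy):
    """Cost of an admissible policy, obtained by repeated backups from zero."""
    V = {x: 0.0 for x in STATES}
    for _ in range(len(STATES)):
        V = {x: utility(x, policy[x]) + V[step(x, policy[x])] for x in STATES}
    return V

def local_update(V, policy, subset):
    """Improve the control law and back up the value only for states in `subset`."""
    V_new, policy_new = dict(V), dict(policy)
    for x in subset:
        u_best = min(CONTROLS, key=lambda u: utility(x, u) + V[step(x, u)])
        policy_new[x] = u_best
        V_new[x] = utility(x, u_best) + V[step(x, u_best)]
    return V_new, policy_new

def local_policy_iteration(subsets):
    # Admissible initial law (assumption): move one step toward the origin.
    policy = {x: -((x > 0) - (x < 0)) for x in STATES}
    V = initial_cost(policy)
    history = [V]
    for subset in subsets:
        V, policy = local_update(V, policy, subset)
        history.append(V)
    return policy, V, history

# Alternate between the non-negative and the non-positive half of the grid.
subsets = [[0, 1, 2, 3], [-3, -2, -1, 0]] * 3
policy, V, history = local_policy_iteration(subsets)
```

In this toy, states outside the current subset keep their previous value and control, and the recorded value functions never increase from one iteration to the next.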

What You Should Know About Approximate Dynamic Programming

NAVAL RESEARCH LOGISTICS, 2009, Vol. 56, No. 3, pp. 239-249

Author: Powell, Warren B. (Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544, USA)

Approximate dynamic programming (ADP) is a broad umbrella for a modeling and algorithmic strategy for solving problems that are sometimes large and complex, and are usually (but not always) stochastic. It is most often presented as a method for overcoming the classic curse of dimensionality that is well-known to plague the use of Bellman's equation. For many problems, there are actually up to three curses of dimensionality. But the richer message of approximate dynamic programming is learning what to learn, and how to learn it, to make better decisions over time. This article provides a brief review of approximate dynamic programming, without intending to be a complete tutorial. Instead, our goal is to provide a broader perspective of ADP and how it should be approached from the perspective of different problem classes. (C) 2009 Wiley Periodicals, Inc. Naval Research Logistics 56: 239-249, 2009

Keywords: approximate dynamic programming; reinforcement learning; neuro-dynamic programming; stochastic optimization; Monte Carlo simulation

A Novel Iterative θ-Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2014, Vol. 11, No. 4, pp. 1176-1190

Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

This paper is concerned with a new iterative theta-adaptive dynamic programming (ADP) technique to solve optimal control problems of infinite horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law which optimizes the iterative performance index function. In the present iterative theta-ADP algorithm, the condition of initial admissible control in policy iteration algorithm is avoided. It is proved that all the iterative controls obtained in the iterative theta-ADP algorithm can stabilize the nonlinear system which means that the iterative theta-ADP algorithm is feasible for implementations both online and offline. Convergence analysis of the performance index function is presented to guarantee that the iterative performance index function will converge to the optimum monotonically. Neural networks are used to approximate the performance index function and compute the optimal control policy, respectively, for facilitating the implementation of the iterative theta-ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the established method.

Keywords: Adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; policy iteration; reinforcement learning; value iteration

Discrete-Time Two-Player Zero-Sum Games for Nonlinear Systems Using Iterative Adaptive Dynamic Programming

13th International Symposium on Neural Networks (ISNN)

Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China)

ISBN: (print) 9783319406633; 9783319406626

This paper is concerned with a discrete-time two-player zero-sum game of nonlinear systems, which is solved by a new iterative adaptive dynamic programming (ADP) method. In the present iterative ADP algorithm, two iteration procedures, which are upper and lower iterations, are implemented to obtain the upper and lower performance index functions, respectively. Initialized by an arbitrary positive semi-definite function, it is shown that the iterative value functions converge to the optimal performance index function if the optimal performance index function of the two-player zero-sum game exists. Finally, simulation results are given to illustrate the performance of the developed method.

Keywords: Adaptive critic designs; Adaptive dynamic programming; Approximate dynamic programming; neuro-dynamic programming; Zero-sum game; Optimal control
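The upper and lower iterations mentioned in this abstract can be pictured as two value-iteration loops that differ only in the order of the min and max operators. The sketch below is a tabular toy under stated assumptions, not the paper's method: the grid, the player action sets, the stage cost, and in particular the discount factor (which the abstract does not mention and which is added here only so the toy is guaranteed to converge) are hypothetical.

```python
# Hypothetical toy game on an integer grid.
STATES = list(range(-3, 4))
CONTROLS = (-1, 0, 1)        # minimizing player u
DISTURBANCES = (-1, 0, 1)    # maximizing player w
DISCOUNT = 0.8               # assumption: added so the toy iteration converges

def step(x, u, w):
    """Deterministic dynamics driven by both players, clipped to the grid."""
    return max(-3, min(3, x + u + w))

def utility(x, u, w):
    """Stage cost: penalizes state and control, rewards disturbance restraint."""
    return float(x * x + u * u - 2 * w * w)

def q(V, x, u, w):
    return utility(x, u, w) + DISCOUNT * V[step(x, u, w)]

def upper_backup(V):
    """Upper iteration: min over u of max over w."""
    return {x: min(max(q(V, x, u, w) for w in DISTURBANCES) for u in CONTROLS)
            for x in STATES}

def lower_backup(V):
    """Lower iteration: max over w of min over u."""
    return {x: max(min(q(V, x, u, w) for u in CONTROLS) for w in DISTURBANCES)
            for x in STATES}

def iterate(backup, n_iters=100):
    V = {x: 0.0 for x in STATES}   # positive semi-definite initial function
    for _ in range(n_iters):
        V = backup(V)
    return V

upper = iterate(upper_backup)
lower = iterate(lower_backup)
```

The upper function is never below the lower one; when the two coincide at every state, that common function is the value of this toy game.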

Learning Nursery Rhymes Using Adaptive Parameter Neurodynamic Programming

1st Australasian Conference on Artificial Life and Computational Intelligence (ACALCI)

Authors: Walker, Josiah; Chalup, Stephan K. (Univ Newcastle, Sch Elect Engn & Comp Sci, Callaghan, NSW 2308, Australia)

ISBN: (print) 9783319148038; 9783319148021

In this study on music learning, we develop an average reward based adaptive parameterisation for reinforcement learning meta-parameters. These are tested using an approximation of user feedback based on the goal of learning the nursery rhymes Twinkle Twinkle Little Star and Mary Had a Little Lamb. We show that a large reduction in learning times can be achieved through a combination of adaptive parameters and random restarts to ensure policy convergence.

Keywords: neuro-dynamic programming; reinforcement learning; meta-parameters; online music learning

Optimal Learning Control for Discrete-Time Nonlinear Systems Using Generalized Policy Iteration Based Adaptive Dynamic Programming

11th World Congress on Intelligent Control and Automation

Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

ISBN: (print) 9781479958252

In this paper, a novel generalized policy iteration algorithm is investigated to solve infinite horizon optimal control problems for discrete-time nonlinear systems. Two iteration indices are introduced in the generalized policy iteration algorithm, which iterate for policy improvement and policy evaluation, respectively. For the first time the properties of monotonicity, convergence and admissibility for the generalized policy iteration algorithm are analyzed to guarantee that the iterative performance index function converges to the optimum and the iterative control law stabilizes the control system. Finally, numerical results are presented to illustrate the performance of the developed method.

Keywords: Adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neuro-dynamic programming; generalized policy iteration; nonlinear systems; optimal control; neural networks; reinforcement learning
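The two iteration indices described in this abstract, one for policy improvement and one for policy evaluation, can be illustrated with a tabular toy. This is a minimal sketch under stated assumptions, not the authors' implementation (which the related journal abstract says uses neural networks): the integer grid, control set, stage cost, number of evaluation sweeps, and function names are all hypothetical.

```python
# Hypothetical toy system: integer state on [-3, 3], five possible controls.
STATES = list(range(-3, 4))
CONTROLS = [-2, -1, 0, 1, 2]

def step(x, u):
    """Deterministic dynamics x_{k+1} = F(x_k, u_k), clipped to the grid."""
    return max(-3, min(3, x + u))

def utility(x, u):
    """Stage cost, positive definite in the state."""
    return float(x * x + u * u)

def evaluate(policy, V, sweeps):
    """Inner index: `sweeps` backups of V under a fixed control law."""
    for _ in range(sweeps):
        V = {x: utility(x, policy[x]) + V[step(x, policy[x])] for x in STATES}
    return V

def improve(V):
    """Outer index: control law that is greedy with respect to V."""
    return {x: min(CONTROLS, key=lambda u: utility(x, u) + V[step(x, u)])
            for x in STATES}

def generalized_policy_iteration(outer_iters=5, eval_sweeps=2):
    # Admissible initial law (assumption): move one step toward the origin.
    policy = {x: -((x > 0) - (x < 0)) for x in STATES}
    # Its cost is finite and is reached after a few sweeps from zero.
    V = evaluate(policy, {x: 0.0 for x in STATES}, sweeps=len(STATES))
    history = [V]
    for _ in range(outer_iters):
        policy = improve(V)
        V = evaluate(policy, V, eval_sweeps)
        history.append(V)
    return policy, V, history

policy, V, history = generalized_policy_iteration()
```

Setting `eval_sweeps=1` makes the loop behave like value iteration, while a large `eval_sweeps` approaches classical policy iteration; intermediate values sit between the two.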

Multiagent Reinforcement Learning: Rollout and Policy Iteration

IEEE/CAA Journal of Automatica Sinica, 2021, Vol. 8, No. 2, pp. 249-272

Author: Dimitri Bertsekas (Arizona State University (ASU), Tempe, AZ 85281, USA; also with Massachusetts Institute of Technology (MIT), Cambridge, MA 02139)

We discuss the solution of complex multistage decision problems using methods that are based on the idea of policy iteration (PI), i.e., start from some base policy and generate an improved *** is the simplest method of this type, where just one improved policy is *** can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next *** contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line ***, rollout can use as base policy one of the policies produced by PI, thereby improving on that *** is the type of scheme underlying the prominently successful Alpha Zero chess *** this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components each selected (conceptually) by a separate *** is the class of multiagent problems where the agents have a shared objective function, and a shared and perfect state *** on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach, whereby at every stage, the agents sequentially (one-at-a-time) execute a local rollout algorithm that uses a base policy, together with some coordinating information from the other *** amount of total computation required at every stage grows linearly with the number of *** contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of *** the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees an improved performance relative to the base *** also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the c

Keywords: dynamic programming; multiagent problems; neuro-dynamic programming; policy iteration; reinforcement learning; rollout
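The contrast between exponential and linear per-stage computation in this abstract can be made concrete by counting Q-factor evaluations. The sketch below is a toy illustration under stated assumptions, not the paper's algorithm or example: the line-world with targets, the nearest-target base policy, the truncated simulation horizon, and all function names are hypothetical.

```python
from itertools import product

MOVES = (-1, 0, 1)  # hypothetical per-agent control set

def step(state, joint_u):
    """Move every agent, then remove targets that an agent now occupies."""
    agents, targets = state
    new_agents = tuple(a + u for a, u in zip(agents, joint_u))
    new_targets = frozenset(t for t in targets if t not in new_agents)
    return (new_agents, new_targets)

def stage_cost(state):
    """One unit of cost per target still uncaptured."""
    return len(state[1])

def base_policy(state):
    """Each agent independently moves toward its nearest remaining target."""
    agents, targets = state
    if not targets:
        return tuple(0 for _ in agents)
    moves = []
    for a in agents:
        nearest = min(sorted(targets), key=lambda t: abs(t - a))
        moves.append((nearest > a) - (nearest < a))
    return tuple(moves)

def base_cost(state, horizon=30):
    """Cost of following the base policy from `state` (truncated simulation)."""
    total = 0
    for _ in range(horizon):
        if not state[1]:
            break
        total += stage_cost(state)
        state = step(state, base_policy(state))
    return total

def q_value(state, joint_u):
    """Stage cost plus base-policy cost-to-go from the next state."""
    return stage_cost(state) + base_cost(step(state, joint_u))

def standard_rollout(state):
    """Minimize over all joint controls: len(MOVES) ** m evaluations."""
    candidates = list(product(MOVES, repeat=len(state[0])))
    return min(candidates, key=lambda u: q_value(state, u)), len(candidates)

def multiagent_rollout(state):
    """Agents choose one at a time: len(MOVES) * m evaluations."""
    joint = list(base_policy(state))  # undecided agents default to the base policy
    evaluations = 0
    for i in range(len(joint)):
        def q_i(u_i):
            trial = joint.copy()
            trial[i] = u_i
            return q_value(state, tuple(trial))
        joint[i] = min(MOVES, key=q_i)
        evaluations += len(MOVES)
    return tuple(joint), evaluations

state = ((0, 0, 0), frozenset({-3, 2, 5}))
u_std, n_std = standard_rollout(state)
u_ma, n_ma = multiagent_rollout(state)
```

With three agents and three moves each, the joint search evaluates 27 candidates while the one-agent-at-a-time search evaluates 9. Because each agent's search includes its own base-policy move, the Q-factor of the chosen joint control in this sketch is never worse than that of the base policy at the same state.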

Simulation based strategy for nonlinear optimal control: application to a microbial cell reactor

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2003, Vol. 13, No. 3-4, pp. 347-363

Authors: Kaisare, NS; Lee, JM; Lee, JH (Georgia Inst Technol, Sch Chem Engn, Atlanta, GA 30332, USA)

Optimal control of systems with complex nonlinear behaviour such as steady state multiplicity results in a nonlinear optimization problem that needs to be solved online at each sample time. We present an approach based on simulation, function approximation and evolutionary improvement aimed towards simplifying online optimization. Closed loop data from a suboptimal control law, such as MPC based on successive linearization, are used to obtain an approximation of the 'cost-to-go' function, which is subsequently improved through iterations of the Bellman equation. Using this offline-computed cost approximation, an infinite horizon problem is converted to an equivalent single stage problem, substantially reducing the computational burden. This approach is tested on continuous culture of microbes growing on a nutrient medium containing two substrates that exhibits steady state multiplicity. Extrapolation of the cost-to-go function approximator can lead to deterioration of online performance. Some remedies to prevent such problems caused by extrapolation are proposed. Copyright (C) 2003 John Wiley & Sons, Ltd.

Keywords: optimal control; continuous bioreactor; multiple steady states; neuro-dynamic programming
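The workflow in this abstract (fit a cost-to-go function from closed-loop data, improve it with Bellman iterations offline, then solve a single-stage problem online) can be sketched on a deliberately simple stand-in. Everything below is an assumption for illustration: the scalar linear plant replaces the bioreactor, a fixed linear feedback replaces the MPC law, a one-parameter quadratic replaces the paper's function approximator, and the control grid and names are hypothetical.

```python
import numpy as np

A, B = 0.9, 1.0                        # hypothetical scalar plant x+ = A x + B u
U_GRID = np.linspace(-2.0, 2.0, 401)   # candidate controls for the one-stage search

def stage_cost(x, u):
    return x**2 + u**2

def fit_quadratic(xs, targets):
    """Least-squares fit of the cost-to-go model J(x) = p * x**2."""
    return float(np.sum(xs**2 * targets) / np.sum(xs**4))

def closed_loop_costs(x0s, gain=0.2, steps=200):
    """Simulate the suboptimal law u = -gain * x and record the cost from each start."""
    costs = []
    for x in x0s:
        total = 0.0
        for _ in range(steps):
            u = -gain * x
            total += stage_cost(x, u)
            x = A * x + B * u
        costs.append(total)
    return np.array(costs)

def bellman_improve(p, xs, iters=50):
    """Offline: refit p to the backup min_u [stage cost + p * x_next**2]."""
    for _ in range(iters):
        x_next = A * xs[:, None] + B * U_GRID[None, :]
        q = stage_cost(xs[:, None], U_GRID[None, :]) + p * x_next**2
        p = fit_quadratic(xs, q.min(axis=1))
    return p

def single_stage_control(x, p):
    """Online: one-stage minimization using the offline cost-to-go model."""
    q = stage_cost(x, U_GRID) + p * (A * x + B * U_GRID)**2
    return float(U_GRID[np.argmin(q)])

xs = np.linspace(-2.0, 2.0, 41)                  # sampled states
p0 = fit_quadratic(xs, closed_loop_costs(xs))    # cost-to-go of the suboptimal law
p_star = bellman_improve(p0, xs)                 # improved cost-to-go
u_now = single_stage_control(1.0, p_star)
```

In this toy the improved coefficient `p_star` is smaller than `p0`, reflecting a lower cost-to-go than the starting law. The sampled states cover the region where the model is later queried, which sidesteps the extrapolation problem the abstract warns about rather than addressing it.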
