Search Results - Inner Mongolia University Library

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Peeters, Maarten; Verbeeck, Katja; Nowe, Ann. Vrije Univ Brussel, Computat Modeling Lab, Pleinlaan 2, B-1050 Brussels, Belgium

ISBN: (print) 9781424407064

Learning automata are shown to be an excellent tool for creating learning multi-agent systems. Most algorithms used in current automata research expect the environment to end in an explicit end-stage. In this end-stage the rewards are given to the learning automata (i.e., Monte Carlo updating). This is, however, infeasible in sequential decision problems with an infinite horizon, where no such end-stage exists. In this paper we propose a new algorithm based on one-step returns that uses bootstrapping to find good equilibrium paths in multi-stage games.

Keywords: Multi-agent systems

Policy Gradient Adaptive Critic Design With Dynamic Prioritized Experience Replay for Wastewater Treatment Process Control

IEEE Transactions on Industrial Informatics, 2022, vol. 18, no. 5, pp. 3150-3158

Authors: Yang, Ruyue; Wang, Ding; Qiao, Junfei. Beijing Univ Beijing Key Lab Computat Intelligence & Intellige Beijing Lab Smart Environm Protect Fac Informat Technol Beijing 100124 Peoples R China; Beijing Univ Technol Beijing Inst Artificial Intelligence Beijing 100124 Peoples R China

With the industrialization of modern society, the pollution of water resources is becoming increasingly serious. Although purifying urban sewage in wastewater treatment plants eases the burden on fragile ecosystems, the nonlinearities and uncertainties of the biochemical reactions are difficult to address. In this article, a dynamic prioritized policy gradient adaptive dynamic programming (ADP) method is developed to solve the optimal control problem of nonaffine nonlinear discrete-time systems, along with a convergence analysis of the algorithm. To the best of our knowledge, previous ADP research on wastewater treatment process control has required system modeling. By introducing a dynamic prioritized replay buffer and neural networks, the proposed ADP controller can track the setpoints of the wastewater treatment plant and alleviate the effects of disturbances without system modeling. The test results verify that the devised control method outperforms the proportional-integral-derivative strategy, with less oscillation when unknown interference occurs.

Keywords: Wastewater treatment; Process control; Optimal control; Heuristic algorithms; Mathematical model; Wastewater; Dynamic programming; adaptive dynamic programming (ADP); data-driven control; experience replay; reinforcement learning; wastewater treatment

Balancing Value Iteration and Policy Iteration for Discrete-Time Control

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, vol. 50, no. 11, pp. 3948-3958

Authors: Luo, Biao; Yang, Yin; Wu, Huai-Ning; Huang, Tingwen. Cent South Univ Sch Automat Changsha 410083 Peoples R China; Hamad Bin Khalifa Univ Coll Sci & Engn Doha Qatar; Beihang Univ Sci & Technol Aircraft Control Lab Beijing 100191 Peoples R China; Texas A&M Univ Qatar Dept Sci Doha Qatar

The optimal control problem of discrete-time nonlinear systems depends on the solution of the Bellman equation. In this paper, an adaptive reinforcement learning (RL) method that balances value iteration (VI) and policy iteration (PI) is developed to solve the complex Bellman equation. By adding a balance parameter, the adaptive RL integrates VI and PI, which accelerates VI and avoids the need for an initial admissible control. The convergence of the adaptive RL is proved by showing that it converges to the solution of the Bellman equation. Subsequently, the adaptive RL is realized by using a neural network (NN) approximation of the value function, and a least-squares scheme is developed for updating the NN weights. Then, the convergence of the NN-based adaptive RL is proved while accounting for the NN approximation error. To further improve its performance, an adaptive rule is developed for tuning the balance parameter in the adaptive RL iteration by iteration. Finally, the effectiveness of the adaptive RL is validated with simulation studies.

Keywords: Convergence; Optimal control; Mathematical model; Artificial neural networks; Adaptive systems; Nonlinear systems; Reinforcement learning; adaptive dynamic programming; Bellman equation; discrete-time; neural network (NN); optimal control

A dynamic programming approach to viability problems

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Coquelin, Pierre-Arnaud; Martin, Sophie; Munos, Rémi. Ecole Polytech Ctr Math Appl Palaiseau France; Approximate Dynamic Programm Paris France

ISBN: (print) 9781424407064

Viability theory considers the problem of maintaining a system under a set of viability constraints. The main tool for solving viability problems lies in the construction of the viability kernel, defined as the set of initial states from which there exists a trajectory that remains in the set of constraints indefinitely. The theory is very elegant and appears naturally in many applications. Unfortunately, the current numerical approaches suffer from low computational efficiency, which limits the potential range of applications of this domain. In this paper we show that the viability kernel is the zero-level set of a related dynamic programming problem, which opens promising research directions for numerical approximation of the viability kernel using tools from approximate dynamic programming. We illustrate the approach using k-nearest neighbors on a toy problem in two dimensions and on a complex dynamical model for an anaerobic digestion process in four dimensions.

Keywords: dynamic programming

Approximate Dynamic Programming for Stochastic Systems with Additive and Multiplicative Noise

IEEE International Symposium on Intelligent Control (ISIC) / IEEE Multi-Conference on Systems and Control (MSC)

Authors: Jiang, Yu; Jiang, Zhong-Ping. NYU Polytech Inst Dept Elect & Comp Engn Brooklyn NY 11201 USA

ISBN: (print) 9781457711039

This paper studies the stochastic optimal control problem with additive and multiplicative noise via reinforcement learning (RL) and approximate/adaptive dynamic programming (ADP). Using Ito calculus, a policy iteration algorithm is derived in the presence of both additive and multiplicative noise. It is shown that the expectation of the approximated cost matrix is guaranteed to converge to the solution of a certain algebraic Riccati equation that gives rise to the optimal cost value. Furthermore, the covariance of the approximated cost matrix can be reduced by increasing the length of the time interval between two consecutive iterations. Finally, the efficiency of the proposed ADP methodology is illustrated in a numerical example.

Keywords: Additives; Approximation algorithms; Convergence; Covariance matrix; Noise; Steady-state; Symmetric matrices

Learning-Based Predictive Control for Discrete-Time Nonlinear Systems With Stochastic Disturbances

IEEE Transactions on Neural Networks and Learning Systems, 2018, vol. 29, no. 12, pp. 6202-6213

Authors: Xu, Xin; Chen, Hong; Lian, Chuanqiang; Li, Dazi. Natl Univ Def Technol Coll Intelligence Sci Changsha 410073 Hunan Peoples R China; Jilin Univ NanLing State Key Lab Automot Simulat & Control Changchun 130025 Jilin Peoples R China; Jilin Univ NanLing Dept Control Sci & Engn Changchun 130025 Jilin Peoples R China; Naval Univ Engn Natl Key Lab Sci & Technol Vessel Integrated Powe Wuhan 430032 Hubei Peoples R China; Beijing Univ Chem Technol Dept Automat Beijing 100029 Peoples R China

In this paper, a learning-based predictive control (LPC) scheme is proposed for adaptive optimal control of discrete-time nonlinear systems under stochastic disturbances. The proposed LPC scheme is different from conventional model predictive control (MPC), which uses open-loop optimization or simplified closed-loop optimal control techniques in each horizon. In LPC, the control task in each horizon is formulated as a closed-loop nonlinear optimal control problem and a finite-horizon iterative reinforcement learning (RL) algorithm is developed to obtain the closed-loop optimal/suboptimal solutions. Therefore, in LPC, RL and adaptive dynamic programming (ADP) are used as a new class of closed-loop learning-based optimization techniques for nonlinear predictive control with stochastic disturbances. Moreover, LPC also decomposes the infinite-horizon optimal control problem in previous RL and ADP methods into a series of finite-horizon problems, so that the computational costs are reduced and the learning efficiency can be improved. Convergence of the finite-horizon iterative RL algorithm in each prediction horizon and the Lyapunov stability of the closed-loop control system are proved. Moreover, by using successive policy updates between adjoint time horizons, LPC also has lower computational costs than conventional MPC, which has independent optimization procedures between two different prediction horizons. Simulation results illustrate that, compared with conventional nonlinear MPC as well as ADP, the proposed LPC scheme can obtain better performance in terms of both policy optimality and computational efficiency.

Keywords: adaptive dynamic programming (ADP); function approximation; model predictive control (MPC); optimal control; receding horizon; reinforcement learning (RL)

Fuzzy Control Based on Reinforcement Learning and Subsystem Error Derivatives for Strict-Feedback Systems With an Observer

IEEE Transactions on Fuzzy Systems, 2023, vol. 31, no. 8, pp. 2509-2521

Authors: Li, Dongdong; Dong, Jiuxiang. Northeastern Univ Coll Informat Sci & Engn Shenyang 110819 Peoples R China; Northeastern Univ Key Lab Vibrat & Control Aeroprop Syst Minist Educ China Shenyang 110819 Peoples R China; Northeastern Univ Key Lab Synthet Automat Proc Ind Shenyang 110819 Peoples R China

In this article, a novel optimized fuzzy adaptive control method based on tracking error derivatives of subsystems is proposed for strict-feedback systems with unmeasurable states. A cost function based on the tracking error derivative is used. It not only solves the problem that the traditional input quadratic cost function at the infinite time is unbounded, but also solves the problem that the optimal control input derived from the cost function with exponential discount factor cannot make the error asymptotically stable. Considering the case where the states are unmeasurable, a fuzzy state observer is designed that removes the restriction of the Hurwitz equation for the gain parameters. Based on reinforcement learning, the observer, and error derivative cost function, an improved optimized backstepping control method is given. Using observed information and actor-critic structure to train fuzzy logic systems online, the control inputs are obtained to achieve approximate optimal control. Finally, all closed-loop signals are proved to be bounded by the Lyapunov method, and the effectiveness and advantages of the proposed algorithm are verified through two examples.

Keywords: adaptive dynamic programming (ADP); fuzzy adaptive control; fuzzy logic systems (FLSs); fuzzy state observer; optimized backstepping control (OBC); reinforcement learning (RL)

Common framework of certain reinforcement schedules

2nd IEEE World Congress on Computational Intelligence (WCCI 98)

Authors: Pacut, A. Warsaw Univ Technol, Fac Elect & Informat Technol, PL-00665 Warsaw, Poland

ISBN: (print) 0780348605

In this paper we investigate reinforcement algorithms in the context of feedforward networks with gradient learning that use smoothed output gradient estimators. The reduced network is introduced to avoid output redundancy. The adaptive critic element can be viewed as a network with smoothed output gradients, and the associative search element as the reduced network with smoothed output gradients. In this context, the adaptive critic element becomes a regular member of the family of adaptive critic designs.

Keywords: reinforcement learning; adaptive critic; neural networks; dynamic programming; adaptive control

A Retrospective on Adaptive Dynamic Programming for Control

International Joint Conference on Neural Networks

Authors: Lendaris, George G. Portland State Univ, Syst Sci Grad Program, Portland OR 97201 USA

ISBN: (print) 9781424435494

Some three decades ago, certain computational intelligence methods of reinforcement learning were recognized as implementing an approximation of Bellman's dynamic programming method, which is known in the controls community as an important tool for designing optimal control policies for nonlinear plants and sequential decision making. Significant theoretical and practical developments have occurred within this arena, mostly in the past decade, with the methodology now usually referred to as adaptive dynamic programming (ADP). The objective of this paper is to provide a retrospective of selected threads of such developments. In addition, a commentary is offered concerning present status of ADP, and threads for future research and development within the controls field are suggested.

Keywords: reinforcement learning

A scalable model-free recurrent neural network framework for solving POMDPs

IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning

Authors: Liu, Zhenzhen; Elhanany, Itamar. Univ Tennessee, Dept Elect & Comp Engn, Knoxville TN 37996 USA

ISBN: (print) 9781424407064

This paper presents a framework for obtaining an optimal policy in model-free Partially Observable Markov Decision Problems (POMDPs) using a recurrent neural network (RNN). A Q-function approximation approach is taken, utilizing a novel RNN architecture with computation and storage requirements that are dramatically reduced when compared to existing schemes. A scalable online training algorithm, derived from the real-time recurrent learning (RTRL) algorithm, is employed. Moreover, stochastic meta-descent (SMD), an adaptive step size scheme for stochastic gradient-descent problems, is utilized as a means of incorporating curvature information to accelerate the learning process. We consider case studies of POMDPs where state information is not directly available to the agent. In particular, we investigate scenarios in which the agent receives identical observations for multiple states, thereby relying on temporal dependencies captured by the RNN to obtain the optimal policy. Simulation results illustrate the effectiveness of the approach along with substantial improvement in convergence rate when compared to existing schemes.

Keywords: Constraint optimization; Real-time recurrent learning (RTRL); Recurrent neural networks
