Search Results - Inner Mongolia University Library

Finite-Approximation-Error-Based Discrete-Time Iterative Adaptive Dynamic Programming

IEEE Transactions on Cybernetics, 2014, Vol. 44, No. 12, pp. 2820-2833

Authors: Wei, Qinglai; Wang, Fei-Yue; Liu, Derong; Yang, Xiong. Affiliation: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm can be initialized with an arbitrary positive semi-definite function, which overcomes the disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be obtained accurately in each iteration, a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; approximation error; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; reinforcement learning; value iteration
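
For intuition, the generalized value iteration described in this abstract can be sketched in the scalar linear-quadratic special case, an assumption made only for illustration (the paper treats general nonlinear systems and implements the iteration with neural networks). With dynamics x_{k+1} = a x_k + b u_k, utility q x² + r u², and V_i(x) = p_i x², each iteration reduces to a scalar update of p_i. The sketch starts from an arbitrary p_0 ≥ 0 (a positive semi-definite V_0) and can add a bounded per-iteration error to mimic finite approximation error; the function names and numbers are hypothetical.

```python
import math


def vi_step(p, a, b, q, r):
    """One value-iteration step for V_i(x) = p * x**2.

    Minimizing q x^2 + r u^2 + p (a x + b u)^2 over u gives
    u = -(a b p / (r + b^2 p)) x, and the minimum equals p_next * x^2 with
    p_next = q + a^2 r p / (r + b^2 p).
    """
    return q + a * a * r * p / (r + b * b * p)


def generalized_vi(p0, a, b, q, r, iters=200, err=None):
    """Iterate from an arbitrary p0 >= 0 (positive semi-definite V_0).

    err(i), if given, adds a bounded error at iteration i to mimic an
    inexact approximation of the iterative performance index function.
    """
    p = p0
    for i in range(iters):
        p = vi_step(p, a, b, q, r)
        if err is not None:
            p = max(0.0, p + err(i))  # keep V_i positive semi-definite
    return p


def riccati_root(a, b, q, r):
    """Positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0,
    the scalar discrete algebraic Riccati equation (optimal p*)."""
    c1 = r - q * b * b - a * a * r
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0   # open-loop unstable since |a| > 1
    p_star = riccati_root(a, b, q, r)
    for p0 in (0.0, 10.0):            # zero and nonzero initial functions
        print(p0, generalized_vi(p0, a, b, q, r), p_star)
    # With alternating +/-0.01 errors the iterates settle near p*.
    noisy = generalized_vi(0.0, a, b, q, r, err=lambda i: 0.01 if i % 2 == 0 else -0.01)
    print(abs(noisy - p_star))
```

Both starting points reach the same limit, mirroring the arbitrary positive semi-definite initialization; with the injected error the final value stays in a small neighborhood of p* instead of converging exactly. The error sequence is invented and is not the paper's adaptively designed approximation error.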

Exploring the Relationship of Reward and Punishment in Reinforcement Learning: Evolving Action Meta-learning Functions in Goal Navigation

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lowe, Robert; Ziemke, Tom. Affiliation: Univ Skovde, Interact Lab, Skovde, Sweden

ISBN (print): 9781467359252

We present a reinforcement learning algorithm based on Dyna-Sarsa that utilizes separate representations of reward and punishment when guiding state-action value learning and action selection. The adoption of policy meta-learning optimized by a genetic algorithm is explored and results in the context of a two-armed bandit goal-navigation task in a simple grid world are presented. The findings argue for an important role for a genetic algorithm approach for constructing the foundations of autonomous reinforcement learning agents.

Keywords: Value; Reward; Punishment; Reinforcement contingencies; SARSA; TD learning; Genetic Algorithm
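
The abstract does not spell out how reward and punishment are represented separately. The sketch below makes one hypothetical choice: each scalar reward is split into its positive part (reward) and negative part (punishment), each trained by its own Sarsa table, and actions are chosen epsilon-greedily on their sum. Dyna-style planning replays transitions stored in a learned deterministic model. The class name, the split rule, and all hyperparameters are assumptions, and the genetic-algorithm meta-learning is not modeled.

```python
import random
from collections import defaultdict


class SplitDynaSarsa:
    """Hypothetical Dyna-Sarsa agent with separate reward and punishment tables.

    q_pos learns from the positive part of each reward, q_neg from the
    negative part; actions are ranked by q_pos + q_neg. Terminal-state
    handling is omitted to keep the sketch short.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                 planning_steps=5, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps
        self.q_pos = defaultdict(float)   # reward channel
        self.q_neg = defaultdict(float)   # punishment channel
        self.model = {}                   # (s, a) -> (r, s_next), last observed
        self.rng = random.Random(seed)

    def value(self, s, a):
        return self.q_pos[(s, a)] + self.q_neg[(s, a)]

    def act(self, s):
        """Epsilon-greedy selection on the combined value."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(s, a))

    def _sarsa(self, s, a, r, s_next, a_next):
        # Each channel gets its own Sarsa target built from its share of r.
        for table, part in ((self.q_pos, max(r, 0.0)), (self.q_neg, min(r, 0.0))):
            target = part + self.gamma * table[(s_next, a_next)]
            table[(s, a)] += self.alpha * (target - table[(s, a)])

    def update(self, s, a, r, s_next, a_next):
        """Learn from a real transition, record it, then plan from the model."""
        self._sarsa(s, a, r, s_next, a_next)
        self.model[(s, a)] = (r, s_next)
        keys = list(self.model)
        for _ in range(self.planning_steps):
            ps, pa = self.rng.choice(keys)
            pr, ps_next = self.model[(ps, pa)]
            # Sarsa-style planning: the next action comes from the current policy.
            self._sarsa(ps, pa, pr, ps_next, self.act(ps_next))
```

Keeping the two channels apart means a punished action's negative value is never averaged away by an unrelated positive reward in the same table, which is one plausible reading of "separate representations".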

Exponential Moving Average Q-Learning Algorithm

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Awheda, Mostafa D.; Schwartz, Howard M. Affiliation: Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada

ISBN (print): 9781467359252

A multi-agent policy iteration learning algorithm is proposed in this work. The Exponential Moving Average (EMA) mechanism is used to update the policy for a Q-learning agent so that it converges to an optimal policy against the policies of the other agents. The proposed EMA Q-learning algorithm is examined on a variety of matrix and stochastic games. Simulation results show that the proposed algorithm converges in a wider variety of situations than state-of-the-art multi-agent reinforcement learning (MARL) algorithms.

Keywords: Policies; Moving average; Optimal strategy; mucin 1; European Monetary Agreement; European Medicines Agency; Exponential algorithms; Internet Learning Agent (ILA); Stochastic games; Agents
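
The EMA mechanism in this abstract can be illustrated with a simplified, single-rate update for a stateless repeated matrix game: the agent's mixed policy is moved toward the one-hot vector of its current greedy action. This is an assumption-laden sketch; the published algorithm's exact rule (including how its learning rates are chosen and how it handles stochastic games) is not reproduced, and the function names are hypothetical.

```python
def ema_policy_update(policy, q_values, eta):
    """Exponential moving average of the mixed policy toward the greedy action.

    policy: action probabilities; q_values: current Q estimates;
    eta: EMA rate in (0, 1]. The result is still a probability vector.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    return [(1.0 - eta) * p + eta * (1.0 if a == greedy else 0.0)
            for a, p in enumerate(policy)]


def q_update(q_values, action, reward, alpha):
    """Stateless Q-learning update for a repeated matrix game."""
    updated = list(q_values)
    updated[action] += alpha * (reward - updated[action])
    return updated


if __name__ == "__main__":
    q = q_update([0.0, 0.0], action=0, reward=1.0, alpha=0.5)  # action 0 paid off
    pi = [0.5, 0.5]
    for _ in range(100):
        pi = ema_policy_update(pi, q, eta=0.1)
    print(q, pi)  # the policy concentrates on the greedy action 0
```

Because the update is a convex combination of two probability vectors, the policy stays a valid distribution at every step, and eta controls how quickly it tracks changes in the greedy action.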

Optimal control for a class of nonlinear systems with state delay based on adaptive dynamic programming with ε-error bound

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lin, Xiaofeng; Cao, Nuyun; Lin, Yuzhang. Affiliations: Guangxi Univ, Sch Elect Engn, Nanning 530004, Peoples R China; Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China

ISBN (print): 9781467359252

In this paper, a finite-horizon ε-optimal control scheme for a class of nonlinear systems with state delay is proposed using an adaptive dynamic programming (ADP) algorithm. First, the performance index function is defined, the Hamilton-Jacobi-Bellman (HJB) equation for the problem is obtained, and the convergence of the iterative algorithm is presented. Then, the ADP algorithm for finite-horizon optimal control is introduced with an ε-error bound to obtain the ε-optimal control, and a backpropagation (BP) neural network is used to implement the ADP algorithm. Finally, an example is given to demonstrate the effectiveness of the proposed algorithm.

Keywords: adaptive dynamic programming; state delay; epsilon-optimal control; finite time; nonlinear systems
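
The role of the ε-error bound can be illustrated on a scalar linear-quadratic example that, for simplicity, omits the state delay and uses exact value iteration instead of a BP network (both are assumptions for illustration). Starting from V_0 = 0, V_i(x) = p_i x² is the optimal i-stage cost, and the iteration stops at the first N for which |V_N(x_0) − V_{N−1}(x_0)| ≤ ε; that N then serves as the length of the finite horizon. The function name and numbers are hypothetical.

```python
def epsilon_horizon(a, b, q, r, x0, eps, max_iter=10_000):
    """Run value iteration from V_0 = 0 on x_{k+1} = a x_k + b u_k with
    utility q x^2 + r u^2, where V_i(x) = p_i x^2 is the optimal i-stage cost.

    Returns (N, p_N) for the first N with |V_N(x0) - V_{N-1}(x0)| <= eps.
    """
    p = 0.0  # V_0 = 0
    for i in range(max_iter):
        p_next = q + a * a * r * p / (r + b * b * p)  # V_{i+1} from V_i
        if abs(p_next - p) * x0 * x0 <= eps:
            return i + 1, p_next
        p = p_next
    raise RuntimeError("epsilon criterion not met within max_iter iterations")


if __name__ == "__main__":
    for eps in (1e-2, 1e-6):
        n, p = epsilon_horizon(1.2, 1.0, 1.0, 1.0, x0=1.0, eps=eps)
        print(f"eps={eps:g}: N={n}, V_N(x0)={p:.6f}")
```

A smaller ε yields a longer horizon. Note that the successive-difference test is a practical stopping rule; by itself it does not bound |V_N − V*|.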

Reinforcement Learning Output Feedback NN Control Using Deterministic Learning Technique

IEEE Transactions on Neural Networks and Learning Systems, 2014, Vol. 25, No. 3, pp. 635-641

Authors: Xu, Bin; Yang, Chenguang; Shi, Zhongke. Affiliations: Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China; Univ Plymouth, Sch Comp & Math, Plymouth PL4 8AA, Devon, England; Beijing Inst Technol, Sch Automat, Beijing 100086, Peoples R China

In this brief, a novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems. The controller design is based on the transformed predictor form, and the actor-critic NN control architecture includes two NNs: the critic NN approximates the strategic utility function, and the action NN minimizes both the strategic utility function and the tracking error. A deterministic learning technique is employed to guarantee that the partial persistent excitation condition of internal states is satisfied during tracking control to a periodic reference orbit. The uniform ultimate boundedness of closed-loop signals is shown via Lyapunov stability analysis. Simulation results are presented to demonstrate the effectiveness of the proposed control.

Keywords: Approximate dynamic programming; discrete-time system; output feedback control; pure-feedback system; radial basis function neural network (RBF NN)
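
The deterministic-learning argument in this abstract relies on a property of Gaussian radial basis function (RBF) networks: along a periodic orbit, only neurons whose centers lie near the orbit receive significant activation, so persistent excitation holds for that subvector of the regressor (partial PE) rather than for the whole network. The sketch below checks this numerically; the one-dimensional lattice of centers, the width, the orbit, and the activation threshold are illustrative assumptions, not values from the paper.

```python
import math


def rbf_features(z, centers, width):
    """Gaussian RBF regressor S(z), with S_j(z) = exp(-(z - c_j)^2 / width^2)."""
    return [math.exp(-((z - c) ** 2) / width ** 2) for c in centers]


def nn_output(weights, z, centers, width):
    """Network output W^T S(z), the form used to approximate unknown functions."""
    return sum(w * s for w, s in zip(weights, rbf_features(z, centers, width)))


def excited_neurons(orbit, centers, width, threshold=0.01):
    """Indices of neurons whose activation exceeds threshold somewhere on the orbit."""
    peak = [0.0] * len(centers)
    for z in orbit:
        for j, s in enumerate(rbf_features(z, centers, width)):
            peak[j] = max(peak[j], s)
    return [j for j, p in enumerate(peak) if p > threshold]


if __name__ == "__main__":
    centers = [-3.0 + 0.5 * j for j in range(13)]               # lattice on [-3, 3]
    orbit = [math.sin(2 * math.pi * k / 50) for k in range(50)]  # one period, |z| < 1
    active = excited_neurons(orbit, centers, width=0.5)
    print([centers[j] for j in active])  # only centers near [-1, 1] are excited
```

Neurons far from the orbit barely contribute to W^T S(z) along it, so their weights are neither excited nor needed to approximate the function on the orbit.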

Finite-Horizon Optimal Control Design for Uncertain Linear Discrete-time Systems

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Zhao, Qiming; Xu, Hao; Jagannathan, S. Affiliation: Missouri Univ S&T, Dept Elect & Comp Engn, Rolla, MO 65409, USA

ISBN (print): 9781467359252

In this paper, a finite-horizon optimal adaptive control design for linear discrete-time systems with unknown system dynamics is presented using adaptive dynamic programming (ADP). Under full state feedback, the terminal state constraint is incorporated when solving for the optimal feedback control via the Bellman equation. The optimal regulation of the uncertain linear system is solved forward in time and online, without using value and/or policy iterations. Because the horizon is finite, establishing closed-loop stability is more involved; it is verified using Lyapunov theory. The effectiveness of the proposed method is verified by simulation results.

Keywords: Finite-horizon optimal control; Q-learning; Optimal control; Adaptive estimator; Linear system
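
The Q-learning keyword points to a standard idea for linear-quadratic problems: the stage-k Q-function is a quadratic form in (x, u) whose kernel determines the optimal gain, so an estimate of that kernel yields the control without an explicit model. A scalar sketch of this relationship follows. It assumes known a and b and a given next-step value weight s_{k+1}, used only to construct the kernel; the paper's adaptive estimator, which learns such a kernel online, and its terminal-constraint handling are not reproduced, and the helper names are hypothetical.

```python
def q_kernel(a, b, q, r, s_next):
    """Kernel (h_xx, h_xu, h_uu) with Q_k(x, u) = h_xx x^2 + 2 h_xu x u + h_uu u^2.

    Built from q x^2 + r u^2 + s_next (a x + b u)^2 for x_{k+1} = a x + b u.
    In a learning scheme these entries would be estimated from data instead.
    """
    return q + a * a * s_next, a * b * s_next, r + b * b * s_next


def q_value(h, x, u):
    h_xx, h_xu, h_uu = h
    return h_xx * x * x + 2.0 * h_xu * x * u + h_uu * u * u


def gain_from_kernel(h):
    """Setting dQ/du = 2 h_xu x + 2 h_uu u = 0 gives u = -(h_xu / h_uu) x."""
    _, h_xu, h_uu = h
    return h_xu / h_uu


if __name__ == "__main__":
    h = q_kernel(a=1.2, b=1.0, q=1.0, r=1.0, s_next=2.0)
    k = gain_from_kernel(h)
    print(k, q_value(h, 1.0, -k))  # optimal u = -k x and the minimized Q at x = 1
```

Only the kernel entries appear in the gain formula, which is why an online estimate of h_xu and h_uu can replace knowledge of a and b.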

Impact of signal transmission delays on power system damping control using heuristic dynamic programming

IEEE Symposium on Computational Intelligence Applications in Smart Grid (CIASG)

Authors: Yufei Tang; Xiangnan Zhong; Zhen Ni; Jun Yan; Haibo He. Affiliation: Department of Electrical, University of Rhode Island, Kingston, RI, USA

ISBN (print): 9781479945450

In this paper, the impact of signal transmission delays on static VAR compensator (SVC)-based power system damping control using reinforcement learning is investigated. The SVC is used to damp low-frequency oscillations between interconnected power systems under fault conditions; measured signals from remote areas are first collected and then transmitted to the controller as inputs. Such a design inevitably introduces signal transmission delays, which degrade the dynamic performance of the SVC and, in the worst case, cause system instability. The adopted reinforcement learning algorithm, called goal representation heuristic dynamic programming (GrHDP), is employed to design the SVC controller. The impact of signal transmission delays on the adopted controller is investigated through time-domain simulation based on a full transient model in the Matlab/Simulink environment. Simulation results on a four-machine two-area benchmark system with an SVC demonstrate the effectiveness of the adopted algorithm for damping control, as well as the impact of signal transmission delays.

Keywords: Delays; Static VAr compensators; Power system stability; Rotors; Damping; Benchmark testing; Control systems
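
Why transmission delay can destabilize a loop that is stable without it can be seen in a deliberately simple scalar example. It is an assumption for illustration, not the SVC model or the GrHDP controller: x_{k+1} = a x_k + b u_k with static feedback u_k = −K x_{k−d}, where the remote measurement arrives d samples late. A FIFO queue stands in for the communication channel; the function and parameter names are hypothetical.

```python
from collections import deque


def closed_loop_with_delay(a, b, k_gain, delay, x0, steps):
    """Simulate x_{k+1} = a x_k + b u_k with u_k = -k_gain * x_{k-delay}.

    The deque models the channel holding measurements in transit; states
    before time 0 are assumed equal to x0. Returns the trajectory x_0..x_steps.
    """
    channel = deque([x0] * (delay + 1), maxlen=delay + 1)
    x = x0
    trajectory = [x]
    for _ in range(steps):
        channel.append(x)        # newest measurement enters the channel
        measured = channel[0]    # oldest one, `delay` samples old, reaches the controller
        u = -k_gain * measured
        x = a * x + b * u
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    for d in (0, 1):
        traj = closed_loop_with_delay(1.0, 1.0, 1.5, d, 1.0, 40)
        print(d, max(abs(v) for v in traj[-10:]))
```

With a = b = 1 and K = 1.5, the delay-free loop has closed-loop pole 1 − 1.5 = −0.5 and decays. A one-sample delay gives the characteristic equation z² − z + 1.5 = 0, whose complex roots have modulus √1.5 ≈ 1.22, so the oscillation grows.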

Finite Horizon Stochastic Optimal Control of Uncertain Linear Networked Control System

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xu, Hao; Jagannathan, S. Affiliation: Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409, USA

ISBN (print): 9781467359252

In this paper, the finite-horizon stochastic optimal control problem is studied for linear networked control systems (LNCS) in the presence of network imperfections, such as network-induced delays and packet losses, using an adaptive dynamic programming (ADP) approach. Because network imperfections make the system dynamics uncertain, the stochastic optimal control design uses a novel adaptive estimator (AE) to solve the optimal regulation of the uncertain LNCS forward in time, in contrast with backward-in-time Riccati-equation-based optimal control, which requires known system dynamics. A tuning law for the unknown parameters of the AE is derived. Lyapunov theory is used to show that all signals are uniformly ultimately bounded (UUB), with ultimate bounds that are functions of the initial values and the final time. In addition, the estimated control input converges to the optimal control input within the finite horizon. Simulation results are included to show the effectiveness of the proposed scheme.

Keywords: Networked control system; Adaptive dynamic programming and reinforcement learning; Finite horizon; Stochastic optimal control; Adaptive estimator
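
For contrast with the forward-in-time scheme in this abstract, the conventional backward-in-time solution that requires known dynamics can be sketched for a scalar, deterministic system with no network-induced delays, packet losses, or noise (simplifying assumptions). Starting from the terminal weight S_N, the Riccati recursion runs from k = N − 1 down to 0, so every gain must be computed before the system starts. Names and numbers are hypothetical.

```python
def finite_horizon_riccati(a, b, q, r, s_final, horizon):
    """Backward-in-time scalar Riccati recursion for x_{k+1} = a x_k + b u_k.

    Cost: s_final x_N^2 + sum_{k<N} (q x_k^2 + r u_k^2). Requires known a and b.
    Returns gains K_0..K_{N-1} (u_k = -K_k x_k) and weights S_0..S_N.
    """
    s = [0.0] * (horizon + 1)
    gains = [0.0] * horizon
    s[horizon] = s_final                      # terminal condition at k = N
    for k in range(horizon - 1, -1, -1):      # sweep from k = N-1 down to 0
        gains[k] = a * b * s[k + 1] / (r + b * b * s[k + 1])
        s[k] = q + a * a * s[k + 1] - a * b * s[k + 1] * gains[k]
    return gains, s


if __name__ == "__main__":
    gains, s = finite_horizon_riccati(1.2, 1.0, 1.0, 1.0, s_final=5.0, horizon=50)
    # Gains are time-varying near the final time and settle far from it.
    print(gains[0], gains[-2], gains[-1])
```

The dependence on a, b, and the final time N is exactly what a forward-in-time adaptive design has to avoid when the dynamics are uncertain.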

Adaptive dynamic programming for terminally constrained finite-horizon optimal control problems

IEEE Annual Conference on Decision and Control

Authors: L. Andrews; J. R. Klotz; R. Kamalapurkar; W. E. Dixon. Affiliation: Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL, USA

ISBN (print): 9781467360890

Adaptive dynamic programming is applied to control-affine nonlinear systems with uncertain drift dynamics to obtain a near-optimal solution to a finite-horizon optimal control problem with hard terminal constraints. A reinforcement learning-based actor-critic framework is used to approximately solve the Hamilton-Jacobi-Bellman equation, wherein critic and actor neural networks (NN) are used for approximate learning of the optimal value function and control policy, while enforcing the optimality condition resulting from the hard terminal constraint. Concurrent learning-based update laws relax the restrictive persistence of excitation requirement. A Lyapunov-based stability analysis guarantees uniformly ultimately bounded convergence of the enacted control policy to the optimal control policy.

Keywords: control strategy; dynamic programming; Optimal control; critic; Best value; Self tuning
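
The point of concurrent learning, that recorded data can stand in for persistent excitation, can be seen on a linear-in-parameters regression, used here as a hypothetical stand-in for the critic's weight update (the paper's Bellman-error-based laws are not reproduced). The current regressor is constant and excites only the first parameter, but a small stack of recorded regressors that spans the parameter space lets gradient updates recover both parameters. The function names, step size, and data are assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def concurrent_learning(theta_true, current_phi, history, gamma=0.1, steps=300):
    """Gradient updates on y = theta^T phi using the current sample plus a
    recorded stack of (phi_j, y_j) pairs. Targets are generated from
    theta_true so the result can be checked."""
    theta = [0.0] * len(theta_true)
    samples = [(current_phi, dot(theta_true, current_phi))]
    samples += [(phi, dot(theta_true, phi)) for phi in history]
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for phi, y in samples:
            err = y - dot(theta, phi)           # prediction error on this sample
            for j, pj in enumerate(phi):
                grad[j] += pj * err
        theta = [t + gamma * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    truth = [2.0, -1.0]
    phi_now = [1.0, 0.0]  # constant: not persistently exciting
    print(concurrent_learning(truth, phi_now, history=[]))                        # 2nd entry stays 0
    print(concurrent_learning(truth, phi_now, history=[[1.0, 0.0], [0.0, 1.0]]))  # near [2, -1]
```

The recorded stack only needs to be rich enough (here, full rank) once; it does not need to be excited continuously over time, which is the relaxation the abstract refers to.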

Free Energy based Policy Gradients

4th IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Theodorou, Evangelos A.; Najemnik, Jiri; Todorov, Emo. Affiliations: Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195, USA; Univ Washington, Dept Appl Math, Seattle, WA 98195, USA

ISBN (print): 9781467359252

Despite the plethora of reinforcement learning algorithms in machine learning and control, the majority of the work in this area relies on discrete-time formulations of stochastic dynamics. In this work, we present a new policy gradient algorithm for reinforcement learning in continuous state-action spaces and continuous time for free-energy-like cost functions. The derivation is based on successive application of Girsanov's theorem and the use of the Radon-Nikodym derivative as formulated for Markov diffusion processes. The resulting policy gradient is reward weighted. The use of the Radon-Nikodym derivative extends the analysis and results to more general models of stochasticity in which jump diffusion processes are considered. We apply the resulting algorithm to two simple examples of learning attractor landscapes in rhythmic and discrete movements.

Keywords: Free energy; Learning (artificial intelligence); Diffusion processes; Policies; Stochasticity; Plethora; Radon; Derivation; Reward learning
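
Why a free-energy cost leads to a reward-weighted gradient can be shown in a parameter-space sketch. It assumes a Gaussian search distribution x ~ N(θ, σ²) over a scalar parameter and an invented quadratic cost J(x); it is not the paper's continuous-time derivation via Girsanov's theorem. For F(θ) = −λ log E[exp(−J(x)/λ)], the likelihood-ratio gradient is ∇F = −(λ/σ²) E_w[x − θ], where E_w weights samples by exp(−J/λ). A gradient step of size σ²/λ therefore moves θ exactly to the weighted mean E_w[x], which the code estimates from samples. Function names and hyperparameters are hypothetical.

```python
import math
import random


def free_energy_step(theta, cost, sigma, lam, n_samples, rng):
    """One sampled gradient step on F(theta) = -lam * log E[exp(-cost(x) / lam)],
    x ~ N(theta, sigma^2), taken with step size sigma^2 / lam.

    With that step size the update equals the exp(-cost/lam)-weighted mean of
    the samples, i.e. a reward-weighted average.
    """
    xs = [rng.gauss(theta, sigma) for _ in range(n_samples)]
    costs = [cost(x) for x in xs]
    c_min = min(costs)                       # shift costs for numerical stability
    ws = [math.exp(-(c - c_min) / lam) for c in costs]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


def optimize(theta0=0.0, iters=100, sigma=0.5, lam=0.1, n_samples=50, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(iters):
        theta = free_energy_step(theta, lambda x: (x - 3.0) ** 2, sigma, lam, n_samples, rng)
    return theta


if __name__ == "__main__":
    print(optimize())  # should land near the minimizer 3.0 of the toy cost
```

Subtracting the minimum cost leaves the normalized weights unchanged but keeps exp() from underflowing when all costs are large.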
