An event-triggered adaptive dynamic programming (ADP) algorithm is developed in this article to solve the tracking control problem for partially unknown constrained uncertain systems. First, an augmented system is constructed, and the solution of the optimal tracking control problem of the uncertain system is transformed into an optimal regulation of the nominal augmented system with a discounted value function. Integral reinforcement learning is employed to avoid the requirement of the augmented drift dynamics. Second, the event-triggered ADP is adopted for its implementation, where the learning of neural network weights not only relaxes the need for an initial admissible control but also executes only when the predefined execution rule is violated. Third, the tracking error and the weight estimation error are proven to be uniformly ultimately bounded, and the existence of a lower bound on the inter-execution times is analyzed. Finally, simulation results demonstrate the effectiveness of the presented event-triggered ADP method.
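As a rough, hypothetical sketch (not the paper's exact rule), an execution rule of this kind typically compares the gap between the state sampled at the last trigger and the current state against a state-dependent threshold; the names `should_trigger`, `alpha`, `eps`, and `actor` below are illustrative assumptions:

```python
import numpy as np

# Hypothetical event-triggering check: the controller is updated only when
# the predefined execution rule is violated.  alpha and eps are assumed
# design constants, not values from the paper.
def should_trigger(x, x_last, alpha=0.5, eps=1e-3):
    gap = np.linalg.norm(x - x_last)             # event-gap error
    threshold = alpha * np.linalg.norm(x) + eps  # state-dependent bound
    return gap > threshold

# Sketch of use inside a simulation loop:
# if should_trigger(x, x_k):
#     x_k = x.copy()          # sample the state at the triggering instant
#     u = actor(x_k)          # recompute the control from the actor network
```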
ISBN (print): 9781479945528
This paper presents a novel on-policy Q-learning approach for finding the optimal control policy online for continuous-time linear time-invariant (LTI) systems with completely unknown dynamics. The proposed method estimates the unknown parameters of the optimal control policy based on a fixed-point equation involving the Q-function. Gradient-based update laws, derived from minimizing the Bellman error, are used to adapt the parameters online under a persistence of excitation condition. A novel asymptotically convergent state-derivative estimator is presented to ensure that the proposed result is independent of knowledge of the system dynamics. Simulation results are presented to validate the theoretical development.
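A minimal sketch of such a gradient-based update, assuming a Q-function that is linear in the parameters, Q = theta' phi, and a normalized step (all variable names are illustrative, not the paper's notation):

```python
import numpy as np

# Hypothetical normalized-gradient step that decreases the squared Bellman
# residual for a linearly parameterized Q-function.
def gradient_update(theta, phi, target, lr=1.0):
    e = float(theta @ phi - target)   # Bellman residual at this data point
    # Normalization bounds the step size; persistence of excitation in phi
    # is what guarantees convergence of the parameter estimate.
    return theta - lr * e * phi / (1.0 + phi @ phi) ** 2
```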
ISBN (print): 9781424407064
Since the 1960s, I have proposed that we could understand and replicate the highest level of intelligence seen in the brain by building ever more capable and general systems for adaptive dynamic programming (ADP) - like "reinforcement learning" but based on approximating the Bellman equation and allowing the controller to know its utility function. Growing empirical evidence on the brain supports this approach. Adaptive critic systems now meet tough engineering challenges and provide a kind of first-generation model of the brain. Lewis, Prokhorov, and I have early second-generation work. Mammal brains possess three core capabilities - creativity/imagination and ways to manage spatial and temporal complexity - even beyond the second generation. This paper reviews previous progress and describes new tools and approaches to overcome the spatial complexity gap.
ISBN (print): 9781479945528
In this paper, a novel finite-horizon optimal control scheme is introduced for linear continuous-time systems by using adaptive dynamic programming (ADP). First, a new time-varying Q-function parameterization and its estimator are introduced. Subsequently, the Q-function estimator is tuned online by using both the Bellman equation in integral form and the terminal cost. Eventually, a near-optimal control gain is obtained from the Q-function estimator. All the closed-loop signals are shown to be bounded by using Lyapunov stability analysis, where the bounds are functions of the initial conditions and the final time, while the estimated control signal converges close to the optimal value. The simulation results illustrate the effectiveness of the proposed scheme.
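As an illustrative sketch under assumed notation (the exact parameterization is in the paper), tuning such an estimator amounts to driving two residuals toward zero: an integral-form Bellman residual over each interval and a terminal-cost residual at the final time:

```python
import numpy as np

def finite_horizon_residuals(theta, phi_t, phi_next, integral_cost,
                             phi_final, terminal_cost):
    """Residuals for a time-varying Q-function estimate theta (assumed
    linear in the parameters).

    e_bellman : integral-form Bellman residual over one interval,
                theta'(phi(t) - phi(t+T)) minus the measured running cost
    e_terminal: boundary residual enforcing the terminal cost at t_f
    """
    e_bellman = theta @ (phi_t - phi_next) - integral_cost
    e_terminal = theta @ phi_final - terminal_cost
    return e_bellman, e_terminal

def tune(theta, residual, grad, lr=0.1):
    # Normalized gradient step on the squared residual, as is standard
    # in ADP parameter tuning.
    return theta - lr * residual * grad / (1.0 + grad @ grad)
```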
ISBN (print): 9781424427611
Recent developments in multiagent reinforcement learning mostly concentrate on normal form games or restrictive hierarchical form games. In this paper, we use the well-known Q-learning in extensive form games in which agents have a fixed priority in action selection. We also introduce a new concept called associative Q-values, which not only can be used in action selection, leading to a subgame perfect equilibrium, but also can be used in the update rule, which is proved to be convergent. The associative Q-value is the expected utility of an agent in a game situation, which is an estimate of the value of the subgame perfect equilibrium point.
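A hypothetical sketch of how fixed-priority selection can resolve one stage by backward induction, with each agent maximizing its own Q-value while assuming all lower-priority agents best-respond (the data layout `Q[i][joint_action]` is an assumption, not the paper's):

```python
def fixed_priority_selection(Q, agent_order, action_sets):
    """Pick a joint action by backward induction under a fixed priority.

    Q[i] maps a full joint-action tuple to agent i's Q-value; agents in
    agent_order choose in turn, each maximizing its value assuming all
    later agents do likewise, yielding the subgame perfect choice.
    """
    def solve(prefix, remaining):
        if not remaining:
            return prefix
        i = remaining[0]
        best = max(action_sets[i],
                   key=lambda a: Q[i][solve(prefix + (a,), remaining[1:])])
        return solve(prefix + (best,), remaining[1:])
    return solve((), list(agent_order))

# Two agents, two actions each; agent 0 moves first.
Q = {0: {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 2.0},
     1: {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 3.0}}
print(fixed_priority_selection(Q, [0, 1], {0: [0, 1], 1: [0, 1]}))  # (1, 1)
```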
ISBN (print): 9781479945528
We present two nonparametric approaches to Kullback-Leibler (KL) control, or the linearly solvable Markov decision problem (LMDP), based on Gaussian processes (GPs) and the Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient online operations. Theoretically, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities, for the infinite-time-horizon case. Algorithmically, we give explicit optimal control policies in nonparametric form and propose online update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.
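For context, a minimal tabular sketch of what makes the LMDP "linearly solvable": the desirability z = exp(-v) satisfies a linear (eigenvalue) equation in z, solvable by power iteration. This is the generic average-cost construction, not the paper's GP/Nyström method:

```python
import numpy as np

def solve_lmdp(P, q, iters=500, tol=1e-10):
    """Average-cost LMDP via power iteration on the linear operator G.

    P : (n, n) passive (uncontrolled) transition matrix, rows sum to 1
    q : (n,) state cost
    The desirability z = exp(-v) is the principal eigenvector of
    G = diag(exp(-q)) @ P, and u*(x'|x) is proportional to P[x, x'] z[x'].
    """
    G = np.exp(-q)[:, None] * P           # the Bellman equation is linear in z
    z = np.ones(len(q))
    for _ in range(iters):
        z_new = G @ z
        z_new /= np.linalg.norm(z_new)    # eigenvalue normalization
        if np.max(np.abs(z_new - z)) < tol:
            break
        z = z_new
    U = P * z[None, :]                    # reweight passive dynamics by z
    return z, U / U.sum(axis=1, keepdims=True)
```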
This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law that optimizes the iterative performance index function. The main contribution of this paper is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems for the first time. It is shown that the iterative performance index function converges nonincreasingly to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear system. Neural networks are used to approximate the performance index function and compute the optimal control law, respectively, to facilitate the implementation of the iterative ADP algorithm, where the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
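The backbone of such a method is the standard policy iteration loop, sketched here in generic form; the `evaluate` and `improve` callables stand in for the paper's neural-network critic and actor and are assumptions:

```python
def policy_iteration(V0, evaluate, improve, states, tol=1e-6, max_iters=100):
    """Generic discrete-time policy iteration skeleton.

    evaluate(policy) -> V : solves V(x) = U(x, policy(x)) + V(f(x, policy(x)))
    improve(V) -> policy  : returns x -> argmin_u [U(x, u) + V(f(x, u))]
    Values are dicts keyed by state.  Under the paper's assumptions the
    value sequence is nonincreasing and each control law is stabilizing.
    """
    policy = improve(V0)                  # initial control law from V0
    V_prev = None
    for _ in range(max_iters):
        V = evaluate(policy)              # policy evaluation
        policy = improve(V)               # policy improvement
        if V_prev is not None and max(abs(V[s] - V_prev[s]) for s in states) < tol:
            break
        V_prev = V
    return policy, V
```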
ISBN (print): 9781479945528
In this paper, an optimal tracking control approach based on an adaptive dynamic programming (ADP) algorithm is proposed to solve linear quadratic regulation (LQR) problems for unknown discrete-time systems in an online fashion. First, we convert the optimal tracking problem into the design of an infinite-horizon optimal regulator for the tracking error dynamics based on a system transformation. Then we expand the error state equation by the history data of control and state. The iterative ADP algorithms of policy iteration (PI) and value iteration (VI) are introduced to solve the value function of the controlled system. It is shown that the proposed ADP algorithm solves the LQR problem without requiring any knowledge of the system dynamics. The simulation results show the convergence and effectiveness of the proposed control scheme.
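For reference, the model-based fixed point that VI approximates in the LQR case is the discrete-time Riccati recursion below; the ADP variant estimates the same P from input/state data instead of (A, B). This is a generic sketch, not the paper's data-driven implementation:

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, iters=500):
    """Iterate P_{k+1} = Q + A'P_k(A - B K_k) with
    K_k = (R + B'P_k B)^{-1} B'P_k A, starting from P_0 = 0."""
    P = np.zeros_like(Q)
    K = None
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)   # gain at this iterate
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.allclose(P, P_next, atol=1e-10):
            break
        P = P_next
    return P, K                                     # u_k = -K x_k
```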
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP converges to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed using control policies generated by MsHDP. Moreover, a general stability criterion is designed to determine the admissibility of the current control policy; that is, the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to accelerate learning efficiency. An actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks serve as the parametric structures used to evaluate and improve the iterative policy. Finally, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
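A tabular sketch of the multi-step idea: each sweep applies an h-step Bellman backup rather than a one-step backup, so h = 1 reduces to ordinary HDP/value iteration while larger h propagates cost information further per sweep. `step` and `cost` are assumed black-box models, not the paper's neural-network implementation:

```python
def ms_hdp_backup(V, h, states, actions, step, cost):
    """One h-step backup: V'(x) is the minimum over h-step action
    sequences of the accumulated stage cost plus V at the reached state."""
    def lookahead(x, depth):
        if depth == 0:
            return V[x]
        return min(cost(x, u) + lookahead(step(x, u), depth - 1)
                   for u in actions)
    return {x: lookahead(x, h) for x in states}
```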
This note studies the adaptive optimal output regulation problem for continuous-time linear systems, which aims to achieve asymptotic tracking and disturbance rejection while minimizing some predefined costs. Reinforcement learning and adaptive dynamic programming techniques are employed to compute an approximate optimal controller using input/partial-state data despite unknown system dynamics and an unmeasurable disturbance. Rigorous stability analysis shows that the proposed controller exponentially stabilizes the closed-loop system and that the output of the plant asymptotically tracks the given reference signal. Simulation results on an LCL-coupled inverter-based distributed generation system demonstrate the effectiveness of the proposed approach.
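For orientation, the model-based problem that the adaptive design solves without model knowledge combines the linear regulator equations with an optimal stabilizing feedback. Below is a generic sketch under one common convention (exosystem v' = Sv, plant x' = Ax + Bu + Ev, error e = Cx + Fv); all names are assumptions, and the ADP method estimates these quantities from data instead:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def output_regulator(A, B, C, E, S, F, Q, R):
    """Solve the regulator equations XS = AX + BU + E, 0 = CX + F for
    (X, U), then add an optimal feedback gain from the continuous-time
    ARE; the resulting control is u = -K x + (U + K X) v."""
    n, m = B.shape
    nq = S.shape[0]
    I_q = np.eye(nq)
    # Vectorize the Sylvester-type regulator equations into M @ sol = rhs.
    M = np.block([
        [np.kron(S.T, np.eye(n)) - np.kron(I_q, A), -np.kron(I_q, B)],
        [np.kron(I_q, C), np.zeros((C.shape[0] * nq, m * nq))],
    ])
    rhs = np.concatenate([E.flatten(order='F'), -F.flatten(order='F')])
    sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    X = sol[: n * nq].reshape((n, nq), order='F')
    U = sol[n * nq:].reshape((m, nq), order='F')
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)   # optimal stabilizing feedback gain
    L = U + K @ X                     # feedforward gain
    return K, L
```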