Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Heydari, Ali (South Dakota Sch Mines & Technol, Dept Mech Engn, Rapid City, SD 57701, USA)

ISBN: (print) 9781479945528

This paper briefly proposes a reinforcement-learning-based scheme for optimal switching with an infinite-horizon cost function. Several theoretical questions arise regarding its convergence, the optimality of the result, and the continuity of the limit function, which is to be uniformly approximated using parametric function approximators. The main contribution of the paper is to provide rigorous answers to these questions in the form of sufficient conditions for convergence, optimality, and continuity.

Keywords: function approximation; learning (artificial intelligence); infinite-horizon cost function; optimal switching; parametric function approximators; reinforcement learning based switching scheme; Approximation methods; Artificial neural networks; Convergence; Cost function; Optimal control; Schedules; Switches; Cost functions; Approximation method; Converge; Theoretical analysis

Neural-Network-Based Adaptive Dynamic Surface Control for MIMO Systems with Unknown Hysteresis

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Liu, Lei; Wang, Zhanshan; Shen, Zhengwei (Northeastern Univ, Coll Informat Sci & Engn, Shenyang, Liaoning, Peoples R China)

ISBN: (print) 9781479945528

This paper focuses on the composite adaptive tracking control for a class of nonlinear multiple-input-multiple-output (MIMO) systems with unknown backlash-like hysteresis nonlinearities. A dynamic surface control method is incorporated into the proposed control strategy to eliminate the problem of explosion of complexity. Compared with some existing methods, the prediction error between system state and serial-parallel estimation model is combined with compensated tracking error to construct the adaptive laws for neural network (NN) weights. It is shown that the proposed control approach can guarantee that all the signals of the resulting closed-loop systems are semi-globally uniformly ultimately bounded and the tracking error converges to a small neighborhood. Finally, simulation results are provided to confirm the effectiveness of the proposed approaches.

Keywords: dynamic surface control; prediction error; backlash-like hysteresis; adaptive neural network control

Data-Driven Partially Observable Dynamic Processes Using Adaptive Dynamic Programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Zhong, Xiangnan; Ni, Zhen; Tang, Yufei; He, Haibo (Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA)

ISBN: (print) 9781479945528

Adaptive dynamic programming (ADP) has been widely recognized as one of the "core methodologies" for achieving optimal control of intelligent systems modeled as Markov decision processes (MDPs). Generally, ADP control design requires all the information of the system dynamics. However, in many practical situations, the measured input and output data can represent only part of the system states. This means that complete information about the system is unavailable in many real-world cases, which narrows the range of application of the ADP design. In this paper, we propose a data-driven ADP method, based on neural network techniques, to stabilize systems with partially observable dynamics. A state network is integrated into the typical actor-critic architecture to provide an estimated state from the measured input/output sequences. The theoretical analysis and the stability discussion of this data-driven ADP method are also provided. Two examples are studied to verify the proposed method.

Keywords: Markov processes

Adaptive dynamic programming boundary control of uncertain coupled semi-linear parabolic PDE

IEEE International Symposium on Intelligent Control (ISIC)

Authors: B. Talaei; S. Jagannathan; J. Singler (Bohai University, Jinzhou, CN; Western Sydney University, NSW)

This paper develops an adaptive dynamic programming (ADP) based near-optimal boundary control of distributed parameter systems (DPS) governed by uncertain coupled semi-linear parabolic partial differential equations (PDE) under a Neumann boundary control condition. First, the Hamilton-Jacobi-Bellman (HJB) equation is formulated without any model reduction and the optimal control policy is derived. Subsequently, a novel identifier is developed to estimate the unknown nonlinearity in the PDE dynamics. Accordingly, the sub-optimal control policy is obtained by forward-in-time estimation of the value functional using a neural network (NN) online approximator and the identifier. Adaptive tuning laws are proposed for learning the value functional online. Local ultimate boundedness (UB) of the closed-loop system is verified by using Lyapunov theory. The performance of the proposed controller is verified via simulation on an unstable coupled diffusion reaction process.

Keywords: Optimal control; Mathematical model; Approximation methods; Tuning; Aerospace electronics; Estimation error; State estimation

Model-free Q-learning over Finite Horizon for Uncertain Linear Continuous-time Systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Xu, Hao; Jagannathan, S. (Texas A&M Univ, Coll Sci & Engn, Corpus Christi, TX 78412, USA; Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO, USA)

ISBN: (print) 9781479945528

In this paper, a novel finite-horizon optimal control scheme is introduced for linear continuous-time systems by using adaptive dynamic programming (ADP). First, a new time-varying Q-function parameterization and its estimator are introduced. Subsequently, the Q-function estimator is tuned online by using both the Bellman equation in integral form and the terminal cost. Eventually, a near-optimal control gain is obtained by using the Q-function estimator. All the closed-loop signals are shown to be bounded by using Lyapunov stability analysis, where the bounds are functions of the initial conditions and the final time, while the estimated control signal converges close to the optimal value. The simulation results illustrate the effectiveness of the proposed scheme.

Keywords: Adaptive dynamic programming (ADP); Q-learning; Optimal Control; Riccati Equation; Forward-in-time

Continuous-Time Differential Dynamic Programming with Terminal Constraints

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Sun, Wei; Theodorou, Evangelos A.; Tsiotras, Panagiotis

ISBN: (print) 9781479945528

In this work, we revisit the continuous-time differential dynamic programming (DDP) approach for solving optimal control problems with terminal state constraints. We derive two algorithms, each for a different order of expansion of the system dynamics, and we investigate their performance in terms of convergence speed. Compared to previous work, we provide a set of backward differential equations for the value function expansion by relaxing the assumption that the initial nominal control must be very close to the optimal control solution. We apply the derived algorithms to two classical optimal control problems, namely the inverted pendulum and the Dreyfus rocket problem, and show the benefit of the second-order expansion.

Keywords: Inverted pendulum

Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Pan, Yunpeng; Theodorou, Evangelos A. (Georgia Inst Technol, Daniel Guggenheim Sch Aerosp Engn, Atlanta, GA 30332, USA)

ISBN: (print) 9781479945528

We present two nonparametric approaches to Kullback-Leibler (KL) control, also known as the linearly-solvable Markov decision problem (LMDP), based on Gaussian processes (GP) and the Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operations. Theoretically, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities for the infinite time horizon case. Algorithmically, we give explicit optimal control policies in nonparametric forms, and propose on-line update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.

Keywords: dynamic programming

Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems

IEEE Transactions on Neural Networks and Learning Systems, 2014, vol. 25, no. 3, pp. 621-634

Authors: Liu, Derong; Wei, Qinglai (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative performance index function converges nonincreasingly to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear system. Neural networks are used to approximate the performance index function and to compute the optimal control law, respectively, to facilitate the implementation of the iterative ADP algorithm, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; discrete-time; policy iteration; neural networks; neurodynamic programming; nonlinear systems; optimal control; reinforcement learning
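
The two properties claimed in the abstract above (a nonincreasing performance index and stabilizing iterates) can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a scalar linear system with a quadratic cost as a stand-in for the nonlinear case, uses closed-form policy evaluation in place of neural networks, and all names and numbers (`evaluate_policy`, `improve_policy`, the system `1.2*x + u`) are hypothetical.

```python
def evaluate_policy(a, b, q, r, k):
    """Cost coefficient p of the policy u = -k*x, so that J(x0) = p * x0**2.

    Solves the scalar policy-evaluation equation
    p = q + r*k**2 + (a - b*k)**2 * p, valid only if |a - b*k| < 1.
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy is not admissible (closed loop unstable)")
    return (q + r * k * k) / (1.0 - closed_loop * closed_loop)


def improve_policy(a, b, r, p):
    """Greedy gain minimizing r*u**2 + p*(a*x + b*u)**2 over u = -k*x."""
    return a * b * p / (r + b * b * p)


def policy_iteration(a, b, q, r, k0, iterations=20):
    """Alternate evaluation and improvement, recording every iterate."""
    gains, costs = [k0], []
    k = k0
    for _ in range(iterations):
        p = evaluate_policy(a, b, q, r, k)
        costs.append(p)
        k = improve_policy(a, b, r, p)
        gains.append(k)
    return gains, costs


# Hypothetical example: x_{k+1} = 1.2*x_k + u_k with stage cost x^2 + u^2.
# The initial gain k0 = 1.0 is admissible because |1.2 - 1.0| < 1.
A, B, Q, R = 1.2, 1.0, 1.0, 1.0
gains, costs = policy_iteration(A, B, Q, R, k0=1.0)
```

In this sketch `costs` plays the role of the iterative performance index and `gains` the iterative control laws; the starting gain must be admissible, which is the requirement policy iteration places on its initial control.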

A Novel Iterative θ-Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems

IEEE Transactions on Automation Science and Engineering, 2014, vol. 11, no. 4, pp. 1176-1190

Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

This paper is concerned with a new iterative theta-adaptive dynamic programming (ADP) technique for solving optimal control problems of infinite-horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law, which optimizes the iterative performance index function. In the present iterative theta-ADP algorithm, the condition of an initial admissible control required by the policy iteration algorithm is avoided. It is proved that all the iterative controls obtained in the iterative theta-ADP algorithm can stabilize the nonlinear system, which means that the iterative theta-ADP algorithm is feasible for both online and offline implementation. A convergence analysis of the performance index function is presented to guarantee that the iterative performance index function converges to the optimum monotonically. Neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, to facilitate the implementation of the iterative theta-ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the established method.

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; neural networks; neuro-dynamic programming; nonlinear systems; optimal control; policy iteration; reinforcement learning; value iteration
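
To illustrate how an iteration can avoid an initial admissible control and still converge monotonically, here is a minimal sketch under stated assumptions. It is not the paper's theta-ADP algorithm: it assumes a scalar linear-quadratic problem, and it assumes the value function is initialized as `theta * x**2` with a large `theta`, which the abstract does not specify. The function name and all numbers are hypothetical.

```python
def value_iteration_from_large_init(a, b, q, r, theta, iterations=200):
    """Value iteration V_{i+1}(x) = min_u [q*x^2 + r*u^2 + V_i(a*x + b*u)].

    V_i(x) is represented by a coefficient p_i with V_i(x) = p_i * x**2,
    and the iteration starts from p_0 = theta (assumed large).
    No stabilizing initial policy is needed.
    """
    p = theta
    costs, gains = [p], []
    for _ in range(iterations):
        k = a * b * p / (r + b * b * p)           # greedy gain for V_i
        gains.append(k)
        p = q + r * k * k + (a - b * k) ** 2 * p  # Bellman backup
        costs.append(p)
    return gains, costs


# Hypothetical example: x_{k+1} = 1.2*x_k + u_k with stage cost x^2 + u^2.
A2, B2, Q2, R2 = 1.2, 1.0, 1.0, 1.0
vi_gains, vi_costs = value_iteration_from_large_init(A2, B2, Q2, R2, theta=100.0)
```

For this toy system the coefficients in `vi_costs` decrease from `theta` toward the fixed point of the backup, and every gain in `vi_gains` gives a stable closed loop, mirroring the monotone convergence and stability claims in the abstract.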

Optimal Self-Learning Battery Control in Smart Residential Grids by Iterative Q-learning Algorithm

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Wei, Qinglai; Liu, Derong; Shi, Guang; Liu, Yu; Guan, Qiang (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100864, Peoples R China; Chinese Acad Sci, Inst Automat, Beijing 100864, Peoples R China)

ISBN: (print) 9781479945528

In this paper, a novel dual iterative Q-learning algorithm is developed to solve optimal battery management and control problems in smart residential environments. The main idea is to use an adaptive dynamic programming (ADP) technique to obtain the optimal battery management and control scheme iteratively for residential energy systems. The developed dual iterative Q-learning algorithm introduces two iterations, external and internal: the internal iteration minimizes the total cost of power loads in each period, and the external iteration makes the iterative Q function converge to the optimum. For the first time, the convergence of the iterative Q-learning method is proven, which guarantees convergence of the iterative Q function. Finally, numerical results are given to illustrate the performance of the developed algorithm.

Keywords: Iterative methods
