Search Results - Inner Mongolia University Library

Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints

IEEE Transactions on Cybernetics, 2015, vol. 45, no. 7, pp. 1372-1385

Authors: Liu, Derong; Yang, Xiong; Wang, Ding; Wei, Qinglai

Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

The design of a stabilizing controller for uncertain nonlinear systems with control constraints is a challenging problem. The constrained input, coupled with the inability to accurately identify the uncertainties, motivates the design of a stabilizing controller based on reinforcement-learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted to a constrained optimal control problem by appropriately selecting value functions for the nominal system. Distinct from the typical action-critic dual networks employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Meanwhile, unlike the initial stabilizing control often indispensable in RL, there is no special requirement imposed on the initial control. By utilizing Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is verified to guarantee the uncertain nonlinear system to be stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the present approach.
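
The stability notion used in this abstract, uniform ultimate boundedness, can be stated compactly. The block below gives the usual textbook form of the definition; the symbols x, t_0, a, b, c and T are notation chosen here for illustration and are not taken from the paper.

```latex
% Solutions x(t) are uniformly ultimately bounded with ultimate bound b if
\exists\, b > 0,\ c > 0 \ \text{(independent of } t_0 \ge 0\text{)}:\quad
\forall\, a \in (0, c)\ \ \exists\, T = T(a, b) \ge 0 \ \text{such that}
\qquad
\|x(t_0)\| \le a \ \Longrightarrow\ \|x(t)\| \le b \quad \forall\, t \ge t_0 + T .
```

In words: every trajectory starting close enough to the origin enters a ball of radius b after a finite time T and stays there, which is weaker than asymptotic convergence to the origin.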

Keywords: Approximate dynamic programming (ADP); neural networks (NNs); neuro-dynamic programming; nonlinear systems; optimal control; reinforcement learning (RL); robust control

Adaptive Optimal Control of Continuous-Time Linear Systems via Hybrid Iteration

IEEE Symposium Series on Computational Intelligence (IEEE SSCI)

Authors: Qasem, Omar; Gao, Weinan; Bian, Tao

Affiliations: Florida Inst Technol, Mech & Civil Engn, Melbourne, FL 32901, USA; WorldQuant LLC, Old Greenwich, CT 06870, USA

ISBN: 9781728190488 (print)

In this paper, we propose a novel dynamic programming (DP) algorithm, under the name of hybrid iteration (HI), for continuous-time linear systems. The proposed HI approach combines the advantages of two well-known DP algorithms, i.e., policy iteration (PI) and value iteration (VI). In particular, HI removes the need for the initial stabilizing control policy required in PI, and at the same time it maintains a faster convergence rate than VI. Based on the proposed HI algorithm, a data-driven adaptive optimal controller design is also proposed. Simulation results for randomly generated continuous-time linear systems with different system orders demonstrate that the proposed HI approach can reduce CPU time by up to 73% and the number of iterations to convergence by up to 98% compared with the VI approach.
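
To make the PI requirement mentioned in this abstract concrete, here is a minimal sketch of classical model-based policy iteration (Kleinman-style) for a scalar continuous-time LQR problem. It is not the HI algorithm from the paper, whose details the abstract does not give. The scalar system, the function names and the numeric values are all assumptions chosen for illustration.

```python
import math


def policy_iteration_scalar(a, b, q, r, k0, iters=20):
    """Policy iteration for dx/dt = a*x + b*u with cost integral(q*x^2 + r*u^2) dt.

    The policy is u = -k*x. The initial gain k0 must be stabilizing,
    which is exactly the requirement that PI imposes.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must stabilize the system (a - b*k0 < 0)")
    k = k0
    p = None
    for _ in range(iters):
        # Policy evaluation: solve the scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k^2 = 0 for the cost coefficient p.
        p = (q + r * k * k) / (-2.0 * (a - b * k))
        # Policy improvement: new feedback gain from the evaluated cost.
        k = b * p / r
    return p, k


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    p, k = policy_iteration_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    print(p, riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

With an unstable open-loop system (a > 0), starting from a non-stabilizing gain such as k0 = 0.5 makes the evaluation step meaningless, so the sketch rejects it. That is the restriction the abstract says HI avoids.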

Keywords: Hybrid iteration; adaptive dynamic programming (ADP); adaptive optimal control; reinforcement learning

Optimal Consensus Control Design for Multiagent Systems With Multiple Time Delay Using Adaptive Dynamic Programming

IEEE Transactions on Cybernetics, 2022, vol. 52, no. 12, pp. 12832-12842

Authors: Zhang, Huaguang; Ren, He; Mu, Yunfei; Han, Ji

Affiliations: Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China; Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110819, Peoples R China

In this article, a novel data-based adaptive dynamic programming (ADP) method is presented to solve the optimal consensus tracking control problem for discrete-time (DT) multiagent systems (MASs) with multiple time delays. Necessary and sufficient conditions of the corresponding equivalent time-delay system are provided on the basis of the causal transformations. Benefitting from the construction of tracking error dynamics, the optimal tracking problem can be transformed into settling the Nash-equilibrium in the graphical game, which can be completed by solving the coupled Hamilton-Jacobi (HJ) equations. An error estimator is introduced to construct the tracking error of the MASs only using the input and output (I/O) data. Therefore, the designed data-based ADP algorithm can minimize the cost functions and ensure the consensus of MASs without the knowledge of system dynamics. Finally, a numerical example is given to demonstrate the effectiveness of the proposed method.

Keywords: Games; Delay effects; Delays; Consensus control; Synchronization; Optimal control; System dynamics; adaptive dynamic programming (ADP); data-based optimal control; multiagent systems (MASs); reinforcement learning (RL); time delay

Safe Reinforcement Learning and Adaptive Optimal Control With Applications to Obstacle Avoidance Problem

IEEE Transactions on Automation Science and Engineering, 2024, vol. 21, no. 3, pp. 4599-4612

Authors: Wang, Ke; Mu, Chaoxu; Ni, Zhen; Liu, Derong

Affiliations: Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China; Florida Atlantic Univ, Dept Elect Engn & Comp Sci, Boca Raton, FL 33431, USA; Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

This paper presents a novel composite obstacle avoidance control method to generate safe motion trajectories for autonomous systems in an adaptive manner. First, system safety is described using forward invariance, and the barrier function is encoded into the cost function such that the obstacle avoidance problem can be characterized by an infinite-horizon optimal control problem. Next, a safe reinforcement learning framework is proposed by combining model-based policy iteration and state-following-based approximation. Using real-time data and extrapolated experience data, this learning design is implemented through the actor-critic structure, in which critic networks are tuned by gradient-descent adaptation and actor networks produce adaptive control policies via gradient projection. Then, system stability and weight convergence are theoretically analyzed using the Lyapunov method. Finally, the proposed learning-based controller is demonstrated on a two-dimensional single integrator system and a nonlinear unicycle kinematic system. Simulation results reveal that the system or agent can smoothly reach the target point while keeping a safe distance from each obstacle; at the same time, three other avoidance control methods are used to provide side-by-side comparisons and to verify some claimed advantages of the present method.

Keywords: adaptive dynamic programming; actor-critic reinforcement learning; safe reinforcement learning; obstacle avoidance; optimal control; neural networks

A Reinforcement Learning Approach to Price Cloud Resources With Provable Convergence Guarantees

IEEE Transactions on Neural Networks and Learning Systems, 2022, vol. 33, no. 12, pp. 7448-7460

Authors: Xie, Hong; Lui, John C. S.

Affiliations: Chongqing Univ, Coll Comp Sci, Chongqing 400044, Peoples R China; Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China

How to generate more revenue is crucial to cloud providers. Evidence from the Amazon cloud system indicates that "dynamic pricing" would be more profitable than "static pricing." The challenges are: How to set the price in real time so as to maximize revenue? How to estimate the price-dependent demand so as to optimize the pricing decision? We first design a discrete-time dynamic pricing scheme and formulate a Markov decision process to characterize the evolving dynamics of the price-dependent demand. We formulate a revenue maximization framework to determine the optimal price and theoretically characterize the "structure" of the optimal revenue and optimal price. We apply Q-learning to infer the optimal price from historical transaction data and derive sufficient conditions on the model to guarantee its convergence to the optimal price, but it converges slowly. To speed up the convergence, we incorporate the structure of the optimal revenue that we obtained earlier, leading to the VpQ-learning (Q-learning with value projection) algorithm. We derive sufficient conditions under which the VpQ-learning algorithm converges to the optimal policy. Experiments on a real-world dataset show that the VpQ-learning algorithm outperforms a variety of baselines, i.e., it improves the revenue by as much as 50% over Q-learning, speedy Q-learning, and adaptive real-time dynamic programming (ARTDP), and by as much as 20% over the fixed pricing scheme.
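
As a rough illustration of the plain Q-learning baseline named in this abstract, the sketch below runs tabular Q-learning on a tiny pricing MDP. It does not implement VpQ-learning or the paper's demand model. The price menu, the two demand states, the alternating transition rule and all hyperparameters are hypothetical choices made only to keep the example small.

```python
import random

PRICES = [1, 2, 3]      # hypothetical price menu (actions)
BASE_DEMAND = [4, 6]    # hypothetical demand level for state 0 (low) and 1 (high)


def step(state, action):
    """Toy environment: demand falls linearly with price, states alternate."""
    price = PRICES[action]
    demand = max(0, BASE_DEMAND[state] - price)
    reward = price * demand      # revenue earned in this period
    next_state = 1 - state       # assumed dynamics, independent of the price
    return reward, next_state


def q_learning(steps=5000, alpha=0.5, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(PRICES) for _ in BASE_DEMAND]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(PRICES))                       # explore
        else:
            action = max(range(len(PRICES)), key=lambda a: q[state][a])  # exploit
        reward, next_state = step(state, action)
        # Standard Q-learning target: reward plus discounted best next value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_prices(q):
    """Price chosen by the greedy policy in each demand state."""
    return [PRICES[max(range(len(PRICES)), key=lambda a: row[a])] for row in q]


if __name__ == "__main__":
    print(greedy_prices(q_learning()))
```

Because the toy transition ignores the chosen price, the best policy simply maximizes one-period revenue in each state: price 2 when demand is low and price 3 when it is high. That gives a direct check on what the learned table should recommend.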

Keywords: Cloud computing; Pricing; Heuristic algorithms; Convergence; Mathematical model; Dynamic scheduling; Computational modeling; Cloud resources pricing; reinforcement learning (RL); value projection

Adaptive Dynamic Programming for Robust Regulation and Its Application to Power Systems

IEEE Transactions on Industrial Electronics, 2018, vol. 65, no. 7, pp. 5722-5732

Authors: Yang, Xiong; He, Haibo; Zhong, Xiangnan

Affiliations: Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China; Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA; Univ North Texas, Dept Elect Engn, Denton, TX 76207, USA

This paper presents a novel robust regulation method for a class of continuous-time nonlinear systems subject to unmatched perturbations. To begin with, the robust regulation problem is transformed into an optimal regulation problem by constructing a value function for the auxiliary system. Then, a simultaneous policy iteration (SPI) algorithm is developed to solve the optimal regulation problem within the framework of adaptive dynamic programming. To implement the SPI algorithm, actor and critic networks are employed to approximate the optimal control and the optimal value function, respectively, and the Monte Carlo integration method is applied to obtain the unknown weight parameters. Finally, two examples, including a power system, are provided to demonstrate the applicability of the developed approach.

Keywords: adaptive dynamic programming (ADP); neural network; optimal control; reinforcement learning; robust regulation; unmatched perturbation

Event-Triggered Robust Adaptive Dynamic Programming for Multiplayer Stackelberg-Nash Games of Uncertain Nonlinear Systems

IEEE Transactions on Cybernetics, 2024, vol. 54, no. 1, pp. 273-286

Authors: Lin, Mingduo; Zhao, Bo; Liu, Derong

Affiliations: Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China; Chongqing Univ Posts & Telecommun, Minist Educ, Key Lab Ind Internet Things & Networked Control, Chongqing 400065, Peoples R China; Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

In this article, an event-triggered robust adaptive dynamic programming (ETRADP) algorithm is developed to solve a class of multiplayer Stackelberg-Nash games (MSNGs) for uncertain nonlinear continuous-time systems. Considering the different roles of players in the MSNG, the hierarchical decision-making process is described by the designed value functions for the leader and all followers, which help transform the robust control problem of the uncertain nonlinear system into an optimal regulation problem of the nominal system. Then, an online policy iteration algorithm is formulated to solve the derived coupled Hamilton-Jacobi equation. Meanwhile, an event-triggered mechanism is designed to alleviate computational and communication burdens. Moreover, critic neural networks (NNs) are constructed to obtain the event-triggered approximate optimal control policies for all players, which constitute the Stackelberg-Nash equilibrium of the MSNG. By using Lyapunov's direct method, the stability of the closed-loop uncertain nonlinear system is guaranteed under the ETRADP-based control scheme in the sense of uniform ultimate boundedness. Finally, a numerical simulation is provided to demonstrate the effectiveness of the present ETRADP-based control scheme.

Keywords: Games; Optimal control; Heuristic algorithms; Uncertainty; Robust control; Dynamic programming; Process control; adaptive dynamic programming; event-triggered control; neural networks (NNs); reinforcement learning (RL); robust control; Stackelberg-Nash games

A Comparison of Learning Speed and Ability to Cope Without Exploration between DHP and TD(0)

International Joint Conference on Neural Networks (IJCNN)

Authors: Fairbank, Michael; Alonso, Eduardo

Affiliations: City Univ London, Dept Comp, Sch Informat, London EC1V 0HB, England

ISBN: 9781467314909 (print)

This paper demonstrates the principal motivations for Dual Heuristic Dynamic Programming (DHP) learning methods for use in adaptive dynamic programming and reinforcement learning, in continuous state spaces: that of automatic local exploration, improved learning speed and the ability to work without stochastic exploration in deterministic environments. In a simple experiment, the learning speed of DHP is shown to be around 1700 times faster than TD(0). DHP solves the problem without any exploration, whereas TD(0) cannot solve it without explicit exploration. DHP requires knowledge of, and differentiability of, the environment's model functions. This paper aims to illustrate the advantages of DHP when these two requirements are satisfied.
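
For reference, the TD(0) baseline named in this abstract updates a state-value estimate from one observed transition at a time. The sketch below shows only that update rule, applied to evaluating a fixed policy on a small deterministic chain. It is not the paper's experiment and does not implement DHP, which learns the value gradient and needs a differentiable model. The chain, the reward of -1 per step and the step size are assumptions for illustration.

```python
def td0_chain(n_states=4, episodes=200, alpha=0.5, gamma=1.0):
    """TD(0) policy evaluation on a deterministic chain 0 -> 1 -> ... -> terminal.

    The fixed policy always moves one state to the right and receives
    reward -1 per step; the last state is terminal with value 0.
    """
    v = [0.0] * n_states
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            s_next = s + 1
            reward = -1.0
            # TD(0) update: move v[s] toward the one-step bootstrapped target.
            v[s] += alpha * (reward + gamma * v[s_next] - v[s])
            s = s_next
    return v


if __name__ == "__main__":
    print(td0_chain())
```

With gamma = 1, the true value of each state is minus the number of steps remaining to the terminal state, so the estimates can be checked against that directly.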

Keywords: Dual Heuristic Dynamic Programming; DHP; adaptive dynamic programming; reinforcement learning

Towards Enabling Deep Learning Techniques for Adaptive Dynamic Programming

International Joint Conference on Neural Networks (IJCNN)

Authors: Ni, Zhen; Malla, Naresh; Zhong, Xiangnan

Affiliations: South Dakota State Univ, Elect Engn & Comp Sci Dept, Brookings, SD 57007, USA; Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA

ISBN: 9781509061822 (print)

Human-level control through deep learning and deep reinforcement learning has revealed unique and powerful potential in the very complex game of Go. AlphaGo, developed by Google DeepMind, beat the top Go player early this year. The scientific and technological advancement behind the success of AlphaGo attracted researchers from multiple areas, including machine learning, artificial intelligence, computational intelligence and so on. Adaptive dynamic programming (ADP) methods share a similar fundamental principle with reinforcement learning, and show strong performance for continuous-time and continuous-state systems. Deep learning techniques can also be integrated into ADP designs. In this paper, we discuss the key techniques and components in deep reinforcement learning and then present successful applications to computer games and maze navigation. Future opportunities for deep-learning-enabled ADP are discussed at the end.

Keywords: Deep learning; deep reinforcement learning (DRL); adaptive dynamic programming (ADP); experience replay; computational intelligence; Markov decision process

Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2007, vol. 37, no. 2, pp. 425-436

Authors: He, Pingan; Jagannathan, S.

Affiliations: Univ Missouri, Dept Elect & Comp Engn, Rolla, MO 65409, USA

A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The constraints of the actuator are treated in the controller design as the saturation nonlinearity. The adaptive critic NN controller architecture based on state feedback includes two NNs: the critic NN is used to approximate the "strategic" utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing certain quadratic performance indexes. Using the Lyapunov approach and with novel weight updates, the uniformly ultimate boundedness of the closed-loop tracking error and weight estimates is shown in the presence of NN approximation errors and bounded unknown disturbances. The proposed NN controller works in the presence of multiple nonlinearities, unlike other schemes that normally approximate one nonlinearity. Moreover, the adaptive critic NN controller does not require an explicit offline training phase, and the NN weights can be initialized at zero or random. Simulation results justify the theoretical analysis.

Keywords: approximate dynamic programming; neural network control; optimal control; reinforcement learning
