Search Results - Inner Mongolia University Library

Adaptive Optimal Control via Q-Learning for Ito Fuzzy Stochastic Nonlinear Continuous-Time Systems With Stackelberg Game

IEEE Transactions on Fuzzy Systems, 2024, vol. 32, no. 4, pp. 2029-2038

Authors: Ming, Zhongyang; Zhang, Huaguang; Yan, Ying; Yang, Liu

Affiliation: Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Peoples R China

To solve the two-player Stackelberg game for continuous-time nonlinear stochastic systems, this paper uses the Takagi-Sugeno (T-S) fuzzy stochastic model, defines novel Q-functions, and proposes an adaptive dynamic programming (ADP)-based approach that is completely model-free. First, based on the T-S fuzzy model, the overall fuzzy control policies with corresponding cost functions are designed, where coupled penalty functions are considered. Subsequently, a novel two-level algorithm based on integral reinforcement learning is created, with a proof of convergence, to overcome the challenge of computing the optimal cost functions analytically. On this basis, to achieve entirely model-free learning, which is the first attempt at solving fuzzy stochastic nonlinear continuous-time systems with the Stackelberg game problem, innovative action-dependent Q-functions are developed. The fuzzy linearization technique and the Q-learning algorithm are combined in this article so that each addresses the other's difficulties. In addition, the Lyapunov approach under the ADP-based control scheme ensures that the closed-loop nonlinear stochastic system based on fuzzy approximation is asymptotically stable. Finally, a numerical simulation is offered to show the efficacy of the presented ADP-based control technique.

Keywords: Stochastic processes; Games; Q-learning; Optimal control; Stochastic systems; Cost function; Fuzzy systems; Adaptive dynamic programming (ADP); fuzzy stochastic nonlinear system; Stackelberg game

Adaptive Multi-Step Evaluation Design With Stability Guarantee for Discrete-Time Optimal Learning Control

IEEE/CAA Journal of Automatica Sinica, 2023, vol. 10, no. 9, pp. 1797-1809

Authors: Ding Wang; Jiangyu Wang; Mingming Zhao; Peng Xin; Junfei Qiao

Affiliation: Faculty of Information Technology, the Beijing Key Laboratory of Computational Intelligence and Intelligent System, the Beijing Laboratory of Smart Environmental Protection, and the Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China

This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control *** is shown that, initialized by the zero cost function, MsHDP can converge to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) ***, the stability of the system is analyzed using control policies generated by ***, a general stability criterion is designed to determine the admissibility of the current control *** is, the criterion is applicable not only to traditional value iteration and policy iteration but also to ***, based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to accelerate learning efficiency ***, actor-critic is utilized to implement the integrated MsHDP scheme, where neural networks are used to evaluate and improve the iterative policy as the parameter ***, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.

Keywords: Adaptive critic; artificial neural networks; Hamilton-Jacobi-Bellman (HJB) equation; multi-step heuristic dynamic programming; multi-step reinforcement learning; optimal control
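
As a rough illustration of the convergence claim in this abstract (an iteration started from the zero cost function approaching the optimal solution), the sketch below runs conventional one-step value iteration on a scalar linear-quadratic problem. It is not the MsHDP algorithm from the paper: the system x_next = a*x + b*u, the stage cost q*x**2 + r*u**2, the function name, and all parameter values are assumptions chosen only for illustration.

```python
import math


def value_iteration_from_zero(a, b, q, r, iterations=50):
    """One-step value iteration for V_i(x) = p_i * x**2, starting at p_0 = 0.

    Assumed (hypothetical) setup: x_next = a*x + b*u, stage cost q*x**2 + r*u**2.
    Minimizing over u in the Bellman update gives the scalar Riccati recursion.
    """
    p = 0.0  # zero initial cost function
    history = [p]
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        history.append(p)
    return history


history = value_iteration_from_zero(a=1.0, b=1.0, q=1.0, r=1.0)
p_limit = history[-1]
# For a = b = q = r = 1 the fixed point solves p**2 - p - 1 = 0.
p_exact = (1.0 + math.sqrt(5.0)) / 2.0
print(f"p after {len(history) - 1} iterations: {p_limit:.6f} (exact {p_exact:.6f})")
```

In this toy case the sequence p_i increases from zero toward the positive root of the scalar Riccati equation, which mirrors, in the simplest possible setting, the kind of convergence result the abstract describes.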

Reinforcement Learning-Based 3D Trajectory Tracking Control of Hypersonic Gliding Vehicles With Time-Varying Uncertainties

IEEE Transactions on Automation Science and Engineering, 2025, vol. 22, pp. 8187-8199

Authors: Luo, Biao; Sun, Jingyi; Tang, Rui; Xu, Xiaodong

Affiliation: Cent South Univ, Sch Automat, Changsha 410083, Peoples R China

In this paper, a robust three-dimensional trajectory tracking control scheme based on reinforcement learning is proposed for the glide phase of a hypersonic gliding vehicle (HGV) with time-varying uncertainties. First, the non-affine nonlinear full-state kinematics and dynamics model of the HGV glide phase is constructed. Then, without linearizing the system, the desired multiplanar reference trajectories for HGVs are planned based on the pseudo-spectral theory under the input constraints, initial conditions, and terminal conditions. Subsequently, the full-state error system is generated by subtracting the reference system state from the actual state of the HGV system with time-varying uncertainty. For the full-state HGV error system with time-varying uncertainty and input constraints, we design a reinforcement learning-based optimal control scheme for its nominal system and establish the equivalence between this optimal control and the robust control of the original HGV error system. A single-evaluation network structure is used in the concrete implementation to reduce the computational cost. A rigorous theory is given to demonstrate the uniform ultimate boundedness of the closed-loop system and the weight error. Finally, we perform simulation traces for reference trajectories with different optimization performances to verify the effectiveness of the proposed method. Note to Practitioners-There are various constraints and uncertainties in the glide phase of HGVs, which is the hinge connecting the initial descent phase and the terminal management phase. How to design robust trajectory tracking controllers for the glide phase of HGVs with complex environments and large span of flight parameters is of great significance to aerial guidance practitioners. In this paper, an RL-based three-dimensional trajectory robust tracking guidance method is proposed for the HGV glide phase system, which can resist time-varying uncertainties and satisfy flight constraints. The unifo

Keywords: Adaptive dynamic programming; hypersonic gliding vehicle; tracking control; standard trajectory

Integral Reinforcement Learning-Based Dynamic Event-Triggered Nonzero-Sum Games of USVs

IEEE Transactions on Cybernetics, 2025, vol. 55, no. 4, pp. 1706-1716

Authors: Xue, Shan; Zhang, Weidong; Luo, Biao; Liu, Derong

Affiliations: Hainan Univ, Sch Informat & Commun Engn, Haikou 570228, Peoples R China; Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China; Cent South Univ, Sch Automat, Changsha 410083, Peoples R China; Southern Univ Sci & Technol, Sch Automat & Intelligent Mfg, Shenzhen 518055, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

In this article, an integral reinforcement learning (IRL) method is developed for dynamic event-triggered nonzero-sum (NZS) games to achieve the Nash equilibrium of unmanned surface vehicles (USVs) with state and input constraints. Initially, a mapping function is designed to map the state and control of the USV into a safe environment. Subsequently, IRL-based coupled Hamilton-Jacobi equations, which avoid dependence on system dynamics, are derived to solve the Nash equilibrium. To conserve computational resources and reduce network transmission burdens, a static event-triggered control is initially designed, followed by the development of a more flexible dynamic form. Finally, a critic neural network is designed for each player to approximate its value function and control policy. Rigorous proofs are provided for the uniform ultimate boundedness of the state and the weight estimation errors. The effectiveness of the present method is demonstrated through simulation experiments.

Keywords: Vehicle dynamics; Event detection; Games; Mathematical models; Nash equilibrium; Heuristic algorithms; Neural networks; Electronic mail; Computational modeling; Reinforcement learning; Adaptive dynamic programming; event-triggered control; integral reinforcement learning (IRL); neural network; unmanned surface vehicle (USV)

Approximate Dynamic Programming for Constrained Piecewise Affine Systems With Stability and Safety Guarantees

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2025, vol. 55, no. 3, pp. 1722-1734

Authors: He, Kanghui; Shi, Shengling; van den Boom, Ton; de Schutter, Bart

Affiliations: Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands; MIT, Dept Chem Engn, Cambridge, MA 02139, USA

Infinite-horizon optimal control of constrained piecewise affine (PWA) systems has been approximately addressed by hybrid model predictive control (MPC), which, however, has computational limitations, both in offline design and online implementation. In this article, we consider an alternative approach based on approximate dynamic programming (ADP), an important class of methods in reinforcement learning. We accommodate nonconvex union-of-polyhedra state constraints and linear input constraints into ADP by designing PWA penalty functions. PWA function approximation is used, which allows for a mixed-integer encoding to implement ADP. The main advantage of the proposed ADP method is its online computational efficiency. Particularly, we propose two control policies, which lead to solving a smaller-scale mixed-integer linear program than conventional hybrid MPC, or a single convex quadratic program, depending on whether the policy is implicitly determined online or explicitly computed offline. We characterize the stability and safety properties of the closed-loop systems, as well as the suboptimality of the proposed policies, by quantifying the approximation errors of value functions and policies. We also develop an offline mixed-integer-linear-programming-based method to certify the reliability of the proposed method. Simulation results on an inverted pendulum with elastic walls and on an adaptive cruise control problem validate the control performance in terms of constraint satisfaction and CPU time.

Keywords: Safety; Costs; Dynamic programming; Control systems; Asymptotic stability; Systematics; Stability criteria; Reliability; Predictive control; Optimal control; Approximate dynamic programming (ADP); constrained control; piecewise affine (PWA) systems; reinforcement learning (RL)

Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013

2013 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2013

ISBN: (print) 9781467359252

The proceedings contain 28 papers. The topics discussed include: local stability analysis of high-order recurrent neural networks with multi-step piecewise linear activation functions; finite-horizon optimal control design for uncertain linear discrete-time systems; adaptive optimal control for nonlinear discrete-time systems; optimal control for a class of nonlinear system with controller constraints based on finite-approximation-errors ADP algorithm; finite horizon stochastic optimal control of uncertain linear networked control system; real-time tracking on adaptive critic design with uniformly ultimately bounded condition; a novel approach for constructing basis functions in approximate dynamic programming for feedback control; and a combined hierarchical reinforcement learning based approach for multi-robot cooperative target searching in complex unknown environments.

Safe Reinforcement Learning and Adaptive Optimal Control With Applications to Obstacle Avoidance Problem

IEEE Transactions on Automation Science and Engineering, 2024, vol. 21, no. 3, pp. 4599-4612

Authors: Wang, Ke; Mu, Chaoxu; Ni, Zhen; Liu, Derong

Affiliations: Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China; Florida Atlantic Univ, Dept Elect Engn & Comp Sci, Boca Raton, FL 33431, USA; Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

This paper presents a novel composite obstacle avoidance control method to generate safe motion trajectories for autonomous systems in an adaptive manner. First, system safety is described using forward invariance, and the barrier function is encoded into the cost function such that the obstacle avoidance problem can be characterized by an infinite-horizon optimal control problem. Next, a safe reinforcement learning framework is proposed by combining model-based policy iteration and state-following-based approximation. Using real-time data and extrapolated experience data, this learning design is implemented through the actor-critic structure, in which critic networks are tuned by gradient-descent adaptation and actor networks produce adaptive control policies via gradient projection. Then, system stability and weight convergence are theoretically analyzed using the Lyapunov method. Finally, the proposed learning-based controller is demonstrated on a two-dimensional single integrator system and a nonlinear unicycle kinematic system. Simulation results reveal that the system or agent can smoothly reach the target point while keeping a safe distance from each obstacle; at the same time, three other avoidance control methods are used to provide side-by-side comparisons and to verify some claimed advantages of the present method.

Keywords: Adaptive dynamic programming; actor-critic; reinforcement learning; safe reinforcement learning; obstacle avoidance; optimal control; neural networks

Self-Triggered Approximate Optimal Neuro-Control for Nonlinear Systems Through Adaptive Dynamic Programming

IEEE Transactions on Neural Networks and Learning Systems, 2024, vol. 36, no. 3, pp. 4713-4723

Authors: Zhao, Bo; Zhang, Shunchao; Liu, Derong

Affiliations: Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China; Chongqing Univ Posts & Telecommun, Key Lab Ind Internet Things & Networked Control, Minist Educ, Chongqing 400065, Peoples R China; Guangdong Univ Finance, Sch Internet Finance & Informat Engn, Guangzhou 510521, Peoples R China; Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

In this article, a novel self-triggered approximate optimal neuro-control scheme is presented for nonlinear systems by utilizing adaptive dynamic programming (ADP). According to the Bellman principle of optimality, the cost function of the general nonlinear system is approximated by building a critic neural network with a nested updating weight vector. Thus, the Hamilton-Jacobi-Bellman equation is solved to indirectly obtain the approximate optimal neuro-control input. In order to reduce the computation, the communication bandwidth, and the energy consumption, an appropriate self-triggering condition is designed as an alternative way to predict the updating time instants of the approximate optimal neuro-control policy. On the basis of Lyapunov's direct method, the stability of the closed-loop nonlinear system is analyzed and guaranteed to be uniformly ultimately bounded. Simulation results of two practical systems illustrate the present ADP-based self-triggered approximate optimal neuro-control scheme to be reasonable and effective.

Keywords: Adaptive dynamic programming (ADP); neural networks (NNs); optimal control; reinforcement learning; self-triggered control

Optimal Control of Nonlinear Systems Using Experience Inference Human-Behavior Learning

IEEE/CAA Journal of Automatica Sinica, 2023, vol. 10, no. 1, pp. 90-102

Authors: Adolfo Perrusquía; Weisi Guo

Affiliation: the School of Aerospace, Transport and Manufacturing, Cranfield University, Bedford, UK

Safety critical control is often trained in a simulated environment to mitigate *** migration of the biased controller requires further *** this paper, an experience inference human-behavior learning is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear *** approach is inspired in the complementary properties that exhibits the hippocampus, the neocortex, and the striatum learning systems located in the *** hippocampus defines a physics informed reference model of the real-world nonlinear system for experience inference and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference *** optimal performance is inferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference *** and convergence of the proposed approach is analyzed using Lyapunov stability *** studies are carried out to verify the approach.

Keywords: Experience inference; hippocampus learning system; linear time-variant (LTV) systems; neocortex/striatum learning systems; nonlinear systems; optimal control

Novel Discounted Adaptive Critic Control Designs With Accelerated Learning Formulation

IEEE Transactions on Cybernetics, 2024, vol. 54, no. 5, pp. 3003-3016

Authors: Ha, Mingming; Wang, Ding; Liu, Derong

Affiliations: Ant Grp, MYbank, Beijing 100020, Peoples R China; Univ Sci & Technol Beijing, Sch Automation & Elect Engn, Beijing 100083, Peoples R China; Beijing Univ Technol, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China; Southern Univ Sci & Technol, Sch Syst Design & Intelligent Mfg, Shenzhen 518055, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

Inspired by the successive relaxation method, a novel discounted iterative adaptive dynamic programming framework is developed, in which the iterative value function sequence possesses an adjustable convergence rate. The different convergence properties of the value function sequence and the stability of the closed-loop systems under the new discounted value iteration (VI) are investigated. Based on the properties of the given VI scheme, an accelerated learning algorithm with convergence guarantee is presented. Moreover, the implementations of the new VI scheme and its accelerated learning design are elaborated, which involve value function approximation and policy improvement. A nonlinear fourth-order ball-and-beam balancing plant is used to verify the performance of the developed approaches. Compared with the traditional VI, the present discounted iterative adaptive critic designs greatly accelerate the convergence rate of the value function and reduce the computational cost simultaneously.

Keywords: Iterative methods; Convergence; Power system stability; Optimal control; Stability criteria; Cost function; Closed loop systems; Adaptive critic designs; adaptive dynamic programming (ADP); discrete-time nonlinear systems; fast convergence rate; reinforcement learning; value iteration (VI)
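
The abstract mentions a discounted value iteration whose convergence rate is adjustable and which is inspired by the successive relaxation method, but it does not state the update rule. The sketch below therefore shows only a generic relaxed Bellman update on a scalar discounted linear-quadratic problem, not the authors' algorithm. The relaxation factor `omega`, the system x_next = a*x + b*u, the stage cost q*x**2 + r*u**2, the discount `gamma`, and all function names are assumptions made for illustration.

```python
def bellman_update(p, a, b, q, r, gamma):
    """Discounted Bellman update for V(x) = p * x**2 (scalar LQ problem)."""
    return q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)


def relaxed_value_iteration(omega, iterations, a=1.0, b=1.0, q=1.0, r=1.0, gamma=0.95):
    """Relaxed iteration p <- (1 - omega) * p + omega * T(p), starting from p = 0."""
    p = 0.0
    for _ in range(iterations):
        p = (1.0 - omega) * p + omega * bellman_update(p, a, b, q, r, gamma)
    return p


# Reference fixed point from a long run of plain value iteration (omega = 1).
p_reference = relaxed_value_iteration(omega=1.0, iterations=200)
error_plain = abs(relaxed_value_iteration(omega=1.0, iterations=5) - p_reference)
error_relaxed = abs(relaxed_value_iteration(omega=1.1, iterations=5) - p_reference)
print(f"error after 5 iterations: plain {error_plain:.2e}, relaxed {error_relaxed:.2e}")
```

Here `omega = 1` recovers plain value iteration. For this particular toy problem a mild over-relaxation (`omega = 1.1`) leaves a smaller error after the same number of iterations, while both variants share the same fixed point; much larger values of `omega` can overshoot or diverge.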
