Search Results - Inner Mongolia University Library

Adaptive Critic Control With Knowledge Transfer for Uncertain Nonlinear Dynamical Systems: A Reinforcement Learning Approach

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, Vol. 22, pp. 6752-6761

Authors: Zhang, Liangju; Zhang, Kun; Xie, Xiang Peng; Chadli, Mohammed

Affiliations: Nanjing Univ Posts & Telecommun Coll Automat Nanjing 210023 Jiangsu Peoples R China; Nanjing Univ Posts & Telecommun Coll Artificial Intelligence Nanjing 210023 Jiangsu Peoples R China; Beihang Univ Sch Astronaut Beijing Peoples R China; Nanjing Univ Posts & Telecommun Sch Internet Things Nanjing Peoples R China; Univ Paris Saclay IBISC Lab F-91000 Evry France

This paper presents an online transfer heuristic dynamic programming (THDP) control approach for a class of nonlinear discrete systems. The proposed approach integrates transfer learning with adaptive critic control. To design a robust optimal control strategy for the nonlinear discrete systems, we utilize sample data collected from a source task to acquire prior knowledge. This prior knowledge is subsequently used to guide the online control process of nonlinear systems of target tasks. To avoid negative transfer effects and conserve computational resources, we introduce a novel attenuation function with a truncation mechanism. Additionally, we develop a disturbance compensation control mechanism to address uncertainties. Furthermore, we demonstrate that the properties of the uncertain nonlinear systems under robust optimal control, as well as the weight error of neural networks, are ultimately uniformly bounded given certain conditions. Finally, two simulations are conducted to verify the performance of the proposed algorithm. Note to Practitioners-Adaptive dynamic programming (ADP) is one of the main methods to solve the Hamilton-Jacobi-Bellman (HJB) equation. However, when using neural network approximation, it often requires a long time of iteration and a large amount of computational process, wasting a lot of computational resources. For this reason, we propose an ADP control scheme with enhanced detection speed: that is, by learning a class of similar tasks to obtain prior knowledge to assist in the online control of our actual system. At the same time, this paper considers system disturbances, which means that they are more universal and robust. After simulation experiments, it has been proven that this scheme has good performance.

Keywords: adaptive dynamic programming (ADP); robust optimal control; transfer reinforcement learning; neural networks


Parallel Control for Nonzero-Sum Games With Completely Unknown Nonlinear Dynamics via Reinforcement Learning

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2025, Vol. 55, No. 4, pp. 2884-2896

Authors: Lu, Jingwei; Wei, Qinglai; Wang, Fei-Yue

Affiliations: Tsinghua Univ Dept Ind Engn Beijing 100084 Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Multimodal Artificial Intelligence S Beijing 100190 Peoples R China; Macau Univ Sci & Technol Inst Syst Engn Macau Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Macau Univ Sci & Technol Fac Innovat Engn Macau Peoples R China

This article utilizes parallel control to investigate the problem of continuous-time (CT) nonzero-sum games (NZSGs) for completely unknown nonlinear systems via reinforcement learning (RL), and a parallel control-based NZSG (PNZSG) method is developed without reconstructing unknown dynamics or employing off-policy integral RL (IRL). First, novel dynamic control policies (DCPs) are developed for NZSGs by introducing controls into feedback, and an augmented system with augmented performance indices is constructed to derive the DCPs. Then, we theoretically analyze the effect of the DCPs on the control stability and performance indices, and the optimality of PNZSG is proven to be equivalent to the optimality of the original NZSGs. Subsequently, an IRL technique is employed to achieve the developed PNZSG method, and we show that no prior knowledge of the dynamics of NZSGs is needed to deploy the developed PNZSG method because of the augmented system and performance indices. Finally, numerical examples, including cooperative adaptive cruise control (CACC) of a vehicular platoon, demonstrate the correctness of the developed PNZSG method. The associated code is available at: https://***/lujingweihh/adaptive-dynamic-programming-algorithms/tree/main/model_free_nonzero_sum_games.

Keywords: adaptive dynamic programming; cooperative adaptive cruise control (CACC); nonzero-sum games (NZSGs); parallel control; reinforcement learning (RL); unknown nonlinear dynamics


Data-Driven Combined Longitudinal and Lateral Control for the Car Following Problem

IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2025, Vol. 33, No. 3, pp. 991-1005

Authors: Cui, Leilei; Chakraborty, Sayan; Ozbay, Kaan; Jiang, Zhong-Ping

Affiliations: MIT Cambridge MA 02139 USA; NYU Tandon Sch Engn Dept Elect & Comp Engn Control & Networks Lab Brooklyn NY 11201 USA; NYU C2SMARTER Ctr Tandon Sch Engn Dept Civil & Urban Engn Brooklyn NY 11201 USA; NYU Dept Elect & Comp Engn Dept Civil & Urban Engn Control & Networks Lab Tandon Sch Engn Brooklyn NY 11201 USA

This article studies the problem of data-driven combined longitudinal and lateral control of autonomous vehicles (AVs) such that the AV can stay within a safe but minimum distance from its leading vehicle and, at the same time, in the lane. Most of the existing methods for combined longitudinal and lateral control are either model-based or developed by purely data-driven methods such as reinforcement learning. Traditional model-based control approaches are insufficient to address the adaptive optimal control design issue for AVs in dynamically changing environments and are subject to model uncertainty. Moreover, the conventional reinforcement learning approaches require a large volume of data, and cannot guarantee the stability of the vehicle. These limitations are addressed by integrating the advanced control theory with reinforcement learning techniques. To be more specific, by utilizing adaptive dynamic programming (ADP) techniques and using the motion data collected from the vehicles, a policy iteration algorithm is proposed such that the control policy is iteratively optimized in the absence of the precise knowledge of the AV's dynamical model. Furthermore, the stability of the AV is guaranteed with the control policy generated at each iteration of the algorithm. The efficiency of the proposed approach is validated by the integrated simulation of SUMO and CommonRoad.

Keywords: Adaptation models; Vehicle dynamics; Mathematical models; Transportation; Roads; Reinforcement learning; Nonlinear dynamical systems; Electronic mail; Dynamic programming; Accuracy; Adaptive dynamic programming (ADP); combined longitudinal and lateral control; connected vehicles


Reinforcement Learning to Stabilize Singularly Perturbed DC-Side Dynamics of Grid-Connected Voltage-Source Converters in Modern AC-DC Grids Using Singular Perturbation Theory and Adaptive Dynamic Programming

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2025, Vol. 72, No. 3, pp. 2914-2926

Authors: Davari, Masoud; Zhao, Jianguo; Yang, Chunyu; Gao, Weinan; Chai, Tianyou

Affiliations: Georgia Southern Univ Dept Elect & Comp Engn Statesboro Campus Statesboro GA 30460 USA; China Univ Min & Technol Sch Informat & Control Engn Xuzhou 221116 Peoples R China; Northeastern Univ State Key Lab Synthet Automat Proc Ind Shenyang 110819 Peoples R China

The stability and performance of ac-dc systems in grid modernization heavily rely on the rectification mode of grid-connected voltage-source converters (GC-VSCs). Being considered as the heart of the system, its impact is significant. The current-controlled GC-VSC based on the cascade control using a pulsewidth modulation approach is commonly deployed in the smart grid paradigm. This article discusses how the dynamics induced by that type of GC-VSC control structure can be regarded as singularly perturbed systems in modern ac-dc grids. As a result, it proposes a novel optimal control strategy for the voltage control problem with uncertain dynamics using reinforcement learning (RL) via the adaptive (or approximate) dynamic programming method and the singular perturbation theory (SPT). First, by means of SPT, the original optimal control problem is decomposed into two optimal problems with respect to an unknown slow time-scale subsystem and a known fast time-scale subsystem. Second, for the slow subsystem with unmeasurable states, an output-feedback-based off-policy RL algorithm with a guaranteed convergence is given in order to learn the optimal controller in terms of measurement data. Third, a composite controller is established in terms of the obtained fast-slow controllers; its optimality and closed-loop stability are rigorously proved. Unlike the direct full-order design, not only does the proposed decomposition composite design framework bypass the numerical stiffness, but it also alleviates the high dimensionality in the control synthesis. Comparative experiments using testing based on power hardware-in-the-loop simulations and rapid control prototyping methodology reveal the superiority and effectiveness of the proposed method.

Keywords: Power system dynamics; Voltage control; Optimal control; Reinforcement learning; Heuristic algorithms; Dynamic programming; Adaptation models; Adaptive (or approximate) dynamic programming (ADP); dc-voltage dynamics; grid-connected voltage-source converters (GC-VSCs); modern ac-dc grids; reinforcement learning (RL); singular perturbation theory (SPT); singularly perturbed dynamics


An Unknown Multiplayer Nonzero-Sum Game: Prescribed-Time Dynamic Event-Triggered Control via Adaptive Dynamic Programming

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, Vol. 22, pp. 8317-8328

Authors: Zhang, Kun; Zhang, Zhi-Xuan; Xie, Xiang Peng; Rubio, Jose de Jesus

Affiliations: Beihang Univ Sch Astronaut Beijing 100191 Peoples R China; Nanjing Univ Posts & Telecommun Coll Automat Nanjing 210023 Peoples R China; Nanjing Univ Posts & Telecommun Coll Artificial Intelligence Nanjing 210023 Peoples R China; Nanjing Univ Posts & Telecommun Sch Internet Things Nanjing 210023 Peoples R China; Inst Politcn Nacl Seccin Estudios Posgrad Invest Esime Azcapotzalco Ciudad De Mexico 02250 Mexico

In this paper, the novel prescribed-time dynamic event-triggered control method of an unknown multiplayer nonzero-sum game (MP-NZSG) is designed by using adaptive dynamic programming (ADP). Firstly, a neural network-based identifier is constructed to estimate the unknown system dynamics. Subsequently, a novel ADP-based dynamic event-triggered control approach is advanced to ensure optimality and prescribed-time stability. A critic neural network (NN) is established for each player to approximate the Nash equilibrium solution of the dynamic event-triggered Hamilton-Jacobi-Isaacs (HJI) equation. This network employs a novel weight updating law, based on the experience replay technique, to alleviate the persistence of excitation condition. Furthermore, using the Lyapunov method, the uniform limit boundedness analysis of the neural network approximation error and multiplayer system is validated. Additionally, minimum inter-event time (MIET) is conclusively established to mitigate the notorious Zeno behaviour. Ultimately, the efficacy of the proposed method is rigorously substantiated through comprehensive simulation results. Note to Practitioners-Our research addresses the challenges of multi-component coordinated control, particularly in spacecraft attitude control. To handle these complexities, we propose an innovative adaptive dynamic event-triggered control approach. By integrating adaptive dynamic programming and neural networks, we effectively model and manage unknown system dynamics, enhancing the controller's adaptability and robustness. Dynamic event-triggered policies are introduced to optimize system performance and reduce computational costs. The ADP-based prescribed time optimal control scheme prioritizes steady-state performance of nonlinear nonaffine systems, ensuring precise task completion within specified timeframes. Additionally, experience replay technology further fortifies the controller's learning and adaptability to dynamic environments.

Keywords: adaptive dynamic programming (ADP); experience replay; reinforcement learning (RL); prescribed-time dynamic event-triggered control (PT-DETC)


Data-Model Hybrid-Driven Safe Reinforcement Learning for Adaptive Avoidance Control Against Unsafe Moving Zones

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, Vol. PP, pp. PP

Authors: Wang, Ke; Mu, Chaoxu; Zhang, Anguo; Sun, Changyin

Affiliations: Tianjin Univ Sch Elect & Informat Engn Tianjin 300072 Peoples R China; Anhui Univ Sch Artificial Intelligence Hefei 230026 Peoples R China; Southeast Univ Sch Automat Nanjing 210096 Peoples R China

With the gradual application of reinforcement learning (RL), safety has emerged as a paramount concern. This article presents a novel data-model hybrid-driven safe RL (SRL) scheme to address the challenge of avoidance control in the operation domain containing multiple moving unsafe zones. First, the avoidance problem is transformed into the optimal control problem of an augmented system by encoding a barrier function (BF) term into the cost function. Then, using the idea of integral RL (IRL), an adaptive learning algorithm is proposed for generating safe control policies, in which the actor-critic neural network (NN) structure is established with the aid of state-following (StaF) kernel function. The policy iteration process is executed by this structure; specifically, the critic network undergoes gradient-descent adaptation, while the actor network employs gradient projection updating. Particularly, via a state extrapolation technique, both real-time experience and simulated experience are utilized in the learning process. Next, closed-loop stability and weight convergence are theoretically substantiated. Finally, the effectiveness of the proposed scheme is demonstrated on a single integrator system, a nonlinear numerical system, and a unicycle kinematic system; besides, its advantages over the existing control methods are illustrated by comparisons.

Keywords: Safety; Optimal control; Cost function; Artificial neural networks; Kernel; Adaptation models; Reinforcement learning; Mathematical models; Extrapolation; Data models; Actor-critic neural network (NN); adaptive dynamic programming (ADP); avoidance control; policy iteration; safe reinforcement learning (SRL); state extrapolation


Adaptive Robust Stochastic Configuration Networks for Near-Infrared Multivariate Analysis

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, Vol. PP, pp. PP

Authors: Li, Yuqiang; Du, Wenli; Wang, Xinjie; Yang, Minglei; Zhao, Yunmeng

Affiliations: East China Univ Sci & Technol Minist Educ Key Lab Smart Mfg Energy Chem Proc Shanghai 200237 Peoples R China; Qingyuan Innovat Lab Quanzhou 362801 Peoples R China; Hangzhou Normal Univ Sch Informat Sci & Technol Hangzhou 311121 Peoples R China

Near-infrared (NIR) technology has gained wide acceptance in practical processes and is now the measurement of choice in many sectors. However, with increasing spectral dimensionality, it is challenging to establish a prediction model with satisfactory stability and generalization. Stochastic configuration networks (SCNs) based on supervisory learning mechanism have demonstrated significant advantages in developing nonlinear learners. However, existing incremental learning strategies make it difficult to achieve fast convergence while obtaining a suitable-scale network in high-dimensional spectra modeling. In addition, the linear or regularization weight estimation methods are vulnerable to outliers and noise in NIR analysis. To accelerate model construction and improve model performance in high-dimensional spectra analysis, the adaptive robust SCN (AR-SCN) algorithm is proposed in this work, which can perform adaptive incremental learning according to the prediction residual and robustly estimate the output weights by the global-local shrinkage strategy. Comparison results on three benchmark NIR datasets and real-world gasoline blending process verify the effectiveness of the proposed method. Compared with the state-of-the-art SCNs, the AR-SCN method can simultaneously improve the construction efficiency and robustness of SCNs.

Keywords: Adaptation models; Incremental learning; Stochastic processes; Robustness; Predictive models; Bayes methods; Analytical models; Learning systems; Convergence; Vectors; Adaptive incremental learning; multivariate modeling; near-infrared (NIR) spectroscopy; sparse Bayesian learning (SBL); stochastic configuration networks (SCNs)


Approximate Dynamic Programming for Constrained Piecewise Affine Systems With Stability and Safety Guarantees

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2025, Vol. 55, No. 3, pp. 1722-1734

Authors: He, Kanghui; Shi, Shengling; van den Boom, Ton; de Schutter, Bart

Affiliations: Delft Univ Technol Delft Ctr Syst & Control NL-2628 CD Delft Netherlands; MIT Dept Chem Engn Cambridge MA 02139 USA

Infinite-horizon optimal control of constrained piecewise affine (PWA) systems has been approximately addressed by hybrid model predictive control (MPC), which, however, has computational limitations, both in offline design and online implementation. In this article, we consider an alternative approach based on approximate dynamic programming (ADP), an important class of methods in reinforcement learning. We accommodate nonconvex union-of-polyhedra state constraints and linear input constraints into ADP by designing PWA penalty functions. PWA function approximation is used, which allows for a mixed-integer encoding to implement ADP. The main advantage of the proposed ADP method is its online computational efficiency. Particularly, we propose two control policies, which lead to solving a smaller-scale mixed-integer linear program than conventional hybrid MPC, or a single convex quadratic program, depending on whether the policy is implicitly determined online or explicitly computed offline. We characterize the stability and safety properties of the closed-loop systems, as well as the suboptimality of the proposed policies, by quantifying the approximation errors of value functions and policies. We also develop an offline mixed-integer-linear-programming-based method to certify the reliability of the proposed method. Simulation results on an inverted pendulum with elastic walls and on an adaptive cruise control problem validate the control performance in terms of constraint satisfaction and CPU time.

Keywords: Safety; Costs; Dynamic programming; Control systems; Asymptotic stability; Systematics; Stability criteria; Reliability; Predictive control; Optimal control; Approximate dynamic programming (ADP); constrained control; piecewise affine (PWA) systems; reinforcement learning (RL)


Integral Reinforcement Learning-Based Dynamic Event-Triggered Nonzero-Sum Games of USVs

IEEE TRANSACTIONS ON CYBERNETICS, 2025, Vol. 55, No. 4, pp. 1706-1716

Authors: Xue, Shan; Zhang, Weidong; Luo, Biao; Liu, Derong

Affiliations: Hainan Univ Sch Informat & Commun Engn Haikou 570228 Peoples R China; Shanghai Jiao Tong Univ Dept Automat Shanghai 200240 Peoples R China; Cent South Univ Sch Automat Changsha 410083 Peoples R China; Southern Univ Sci & Technol Sch Automat & Intelligent Mfg Shenzhen 518055 Peoples R China; Univ Illinois Dept Elect & Comp Engn Chicago IL 60607 USA

In this article, an integral reinforcement learning (IRL) method is developed for dynamic event-triggered nonzero-sum (NZS) games to achieve the Nash equilibrium of unmanned surface vehicles (USVs) with state and input constraints. Initially, a mapping function is designed to map the state and control of the USV into a safe environment. Subsequently, IRL-based coupled Hamilton-Jacobi equations, which avoid dependence on system dynamics, are derived to solve the Nash equilibrium. To conserve computational resources and reduce network transmission burdens, a static event-triggered control is initially designed, followed by the development of a more flexible dynamic form. Finally, a critic neural network is designed for each player to approximate its value function and control policy. Rigorous proofs are provided for the uniform ultimate boundedness of the state and the weight estimation errors. The effectiveness of the present method is demonstrated through simulation experiments.

Keywords: Vehicle dynamics; Event detection; Games; Mathematical models; Nash equilibrium; Heuristic algorithms; Neural networks; Electronic mail; Computational modeling; Reinforcement learning; Adaptive dynamic programming; event-triggered control; integral reinforcement learning (IRL); neural network; unmanned surface vehicle (USV)


Reinforcement Learning-Based 3D Trajectory Tracking Control of Hypersonic Gliding Vehicles With Time-Varying Uncertainties

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, Vol. 22, pp. 8187-8199

Authors: Luo, Biao; Sun, Jingyi; Tang, Rui; Xu, Xiaodong

Affiliations: Cent South Univ Sch Automat Changsha 410083 Peoples R China

In this paper, a robust three-dimensional trajectory tracking control scheme based on reinforcement learning is proposed for the glide phase of a hypersonic gliding vehicle (HGV) with time-varying uncertainties. First, the non-affine nonlinear full-state kinematics and dynamics model of the HGV glide phase is constructed. Then, without linearizing the system, the desired multiplanar reference trajectories for HGVs are planned based on the pseudo-spectral theory under the input constraints, initial conditions, and terminal conditions. Subsequently, the full-state error system is generated by subtracting the reference system state from the actual state of the HGV system with time-varying uncertainty. For the full-state HGV error system with time-varying uncertainty and input constraints, we design a reinforcement learning-based optimal control scheme for its nominal system and establish the equivalence between this optimal control and the robust control of the original HGV error system. A single-evaluation network structure is used in the concrete implementation to reduce the computational cost. A rigorous theory is given to demonstrate the uniform ultimate boundedness of the closed-loop system and the weight error. Finally, we perform simulation traces for reference trajectories with different optimization performances to verify the effectiveness of the proposed method. Note to Practitioners-There are various constraints and uncertainties in the glide phase of HGVs, which is the hinge connecting the initial descent phase and the terminal management phase. How to design robust trajectory tracking controllers for the glide phase of HGVs with complex environments and large span of flight parameters is of great significance to aerial guidance practitioners. In this paper, an RL-based three-dimensional trajectory robust tracking guidance method is proposed for the HGV glide phase system, which can resist time-varying uncertainties and satisfy flight constraints. The unifo

Keywords: adaptive dynamic programming; hypersonic gliding vehicle; tracking control; standard trajectory
