Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Al-Talabi, Ahmad A.; Schwartz, Howard M. (Carleton Univ, Dept Syst & Comp Engn, 1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada; Univ Baghdad, Al Khwarizmi Coll Engn, Mechatron Engn Dept, Baghdad, Iraq)

ISBN: (print) 9781479945528

This paper addresses dual learning in the pursuit-evasion (PE) differential game and examines how fast the players can learn their default control strategies. The players learn their default control strategies simultaneously by interacting with each other, and each player's learning process depends on the rewards it receives from its environment. Learning is implemented with a two-stage algorithm that combines the particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with the Q-learning fuzzy inference system (QFIS) algorithm: PSO serves as a global optimizer that autonomously tunes the parameters of a fuzzy logic controller, while QFIS serves as a local optimizer. Through simulation, the two-stage learning algorithm is compared with the default control strategy, the PSO-based FLC algorithm, and the QFIS algorithm. The simulation results show that the players are able to learn their default control strategies, and that the two-stage learning algorithm outperforms both the PSO-based FLC algorithm and the QFIS algorithm in learning time.

Keywords: control system analysis computing; fuzzy control; fuzzy reasoning; game theory; learning (artificial intelligence); particle swarm optimisation; FLC; PE; PSO; Q-learning fuzzy inference system algorithm; QFIS; default control strategies; dual learning; fuzzy logic controller; global optimizer; particle swarm optimization based fuzzy logic control algorithm; pursuit-evasion differential game; two stage learning technique; Approximation algorithms; Fuzzy logic; Games; Inference algorithms; Sociology; Statistics; Tuning; Particle swarm optimization; parametric subharmonic oscillator; Polyethylenes; Players
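
The abstract uses PSO as the global stage that tunes fuzzy-controller parameters. As a rough illustration of that stage only, the sketch below is a generic global-best PSO minimizer; it is not the paper's PSO-based FLC or its QFIS stage. The function name `pso_minimize`, the inertia weight `w = 0.7`, the acceleration constants `c1 = c2 = 1.5`, and the quadratic test cost are all hypothetical choices; in the paper's setting the cost would instead score a candidate fuzzy-controller parameter vector in the pursuit-evasion game.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(cost: Callable[[List[float]], float],
                 bounds: List[Tuple[float, float]],
                 n_particles: int = 20, n_iters: int = 100,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Hypothetical global-best PSO: returns (best position, best cost)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the search box; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move and clip the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost
```

For example, `pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])` should return a point close to `(1, -2)`. In a two-stage scheme of the kind the abstract describes, a coarse global result like this would then be handed to a local learner for refinement.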

Adaptive Critic Control Design with Knowledge Transfer for Wastewater Treatment Applications

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, Vol. 20, No. 2, pp. 1488-1497

Authors: Wang, Ding; Li, Xin; Zhao, Mingming; Qiao, Junfei (Beijing Univ Technol, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing 100124, Peoples R China; Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China)

The wastewater treatment process (WWTP) is of great significance to environmental protection. To improve the efficiency of the WWTP, it is crucial that the dissolved oxygen (DO) concentration track its set value efficiently. Because WWTP dynamics are nonlinear and time-varying, traditional control methods cannot control the DO concentration accurately. To overcome these challenges, this paper proposes an online transferred heuristic dynamic programming (TrHDP) control design that combines transfer learning with adaptive critic design. First, historical sample data are used to construct a mathematical model of the WWTP, and prior knowledge is learned from this model. The prior knowledge is then used to guide the online control of the DO concentration. To avoid negative transfer and save computing resources, a novel decay function with a truncation mechanism is designed. In addition, the stability of the TrHDP control scheme is proved by constructing a Lyapunov function. Finally, the performance of the TrHDP scheme is verified on Benchmark Simulation Model No. 1. Compared with other methods, the TrHDP method achieves higher control accuracy for the DO concentration and overcomes the low learning efficiency of general online methods.

Keywords: adaptive dynamic programming; neural networks; reinforcement learning; transfer learning; wastewater treatment applications

Event-Driven H∞-Constrained Control Using Adaptive Critic Learning

IEEE TRANSACTIONS ON CYBERNETICS, 2021, Vol. 51, No. 10, pp. 4860-4872

Authors: Yang, Xiong; He, Haibo (Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China; Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA)

This article considers an event-driven H∞ control problem for continuous-time nonlinear systems with asymmetric input constraints. First, the H∞-constrained control problem is converted into a two-person zero-sum game with a discounted nonquadratic cost function. Then, the event-driven Hamilton-Jacobi-Isaacs equation (HJIE) associated with this game is presented. Meanwhile, a novel event-triggering condition that excludes Zeno behavior is developed. Unlike conditions in the existing literature, it keeps the triggering threshold non-negative without requiring the prescribed level of disturbance attenuation to be carefully selected. After that, within the framework of adaptive critic learning, a single critic network is used to solve the event-driven HJIE, and its weights are tuned using historical and instantaneous state data simultaneously. Using the Lyapunov approach, it is shown that all signals in the closed-loop system are uniformly ultimately bounded. Finally, simulations of a nonlinear plant validate the developed event-driven H∞ control strategy.

Keywords: adaptive critic learning (ACL); adaptive dynamic programming (ADP); asymmetric constraints; event-driven H-infinity control; reinforcement learning (RL)

Computing and Communication Cost-Aware Service Migration Enabled by Transfer Reinforcement Learning for Dynamic Vehicular Edge Computing Networks

IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, Vol. 23, No. 1, pp. 257-269

Authors: Peng, Yan; Tang, Xiaogang; Zhou, Yiqing; Li, Jintao; Qi, Yanli; Liu, Ling; Lin, Hai (Inst Comp Technol, State Key Lab Processors, Beijing 100190, Peoples R China; Beijing Key Lab Mobile Comp & Pervas Device, Beijing 100190, Peoples R China; Space Engn Univ, Sch Aerosp Informat, Beijing 100015, Peoples R China)

Due to the high mobility of vehicles, service migration is inevitable in vehicular edge computing (VEC) networks. Frequent service migrations incur prohibitive migration costs, including computing cost (e.g., increased computing delay) and communication cost (e.g., occupied backhaul bandwidth). Yet existing service migration schemes are usually designed without considering the impact of computing cost. This paper considers computing and communication costs jointly and proposes a computing and communication cost-aware service migration scheme for VEC networks (CA-migration). Taking service delay as the QoS metric, the paper formulates a migration optimization problem that maximizes the services' satisfaction degree of delay (i.e., the probability that the service delay is smaller than the service delay requirement), where both communication and computing costs affect the satisfaction degree. The optimization problem is a constrained nonlinear integer programming problem and is therefore difficult to solve; moreover, VEC networks are highly dynamic. A fast transfer reinforcement learning (fast-TRL) method that combines transfer learning and reinforcement learning is therefore proposed to provide an adaptive service migration scheme in dynamic VEC networks. Simulation results show that, compared with existing schemes, the proposed CA-migration scheme increases the satisfaction degree by up to 30% and needs 25% less training time to obtain the optimal service migration policy.

Keywords: Vehicular edge computing; service migration; computing cost; services' satisfaction degree; fast transfer reinforcement learning

Event-Triggered ADP for Nonzero-Sum Games of Unknown Nonlinear Systems

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, Vol. 33, No. 5, pp. 1905-1913

Authors: Zhao, Qingtao; Sun, Jian; Wang, Gang; Chen, Jie (Beijing Inst Technol, Key Lab Intelligent Control & Decis Complex Syst, Beijing 100081, Peoples R China; Chongqing Innovat Ctr, Beijing Inst Technol, Chongqing 401120, Peoples R China; Tongji Univ, Sch Elect & Informat Engn, Shanghai 200092, Peoples R China; Beijing Inst Technol, State Key Lab Intelligent Control & Decis Complex, Beijing 100081, Peoples R China)

For nonzero-sum (NZS) games of nonlinear systems, reinforcement learning (RL), or adaptive dynamic programming (ADP), has shown its capability to iteratively approximate the desired performance index and the optimal input policy. In this article, an event-triggered ADP method is proposed for NZS games of continuous-time nonlinear systems with completely unknown dynamics. To approximate the Nash equilibrium solution, critic neural networks and actor neural networks are used to estimate the value functions and the control policies, respectively. Compared with the traditional time-triggered mechanism, the proposed algorithm updates the neural network weights and the players' inputs only when a state-based event-triggering condition is violated. System stability and weight convergence are shown to remain guaranteed under mild assumptions, while the use of communication and computation resources is considerably reduced. Meanwhile, the infamous Zeno behavior is excluded by proving the existence of a minimum inter-event time (MIET), which ensures the feasibility of the closed-loop event-triggered continuous-time system. Finally, a numerical example illustrates the effectiveness of the proposed approach.

Keywords: Games; Neural networks; Optimal control; Nonlinear dynamical systems; Approximation algorithms; Nash equilibrium; Heuristic algorithms; adaptive dynamic programming (ADP); event-triggered; nonzero-sum (NZS) games; reinforcement learning (RL)
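
To make the event-triggered idea in the abstract concrete, the sketch below shows only the triggering mechanism on a toy problem: the controller holds the last sampled state and is updated only when a state-based condition is violated. It does not include the paper's critic/actor networks, the NZS game, or unknown dynamics. The scalar plant `dx/dt = a x + b u`, the gain `k`, the relative threshold rule `|x - x_hat| <= sigma * |x|`, the Euler step, and the helper name `simulate_event_triggered` are all hypothetical assumptions for illustration.

```python
from typing import Tuple


def simulate_event_triggered(a: float = 1.0, b: float = 1.0, k: float = 2.0,
                             sigma: float = 0.2, x0: float = 1.0,
                             dt: float = 1e-3,
                             t_final: float = 5.0) -> Tuple[float, int, int]:
    """Hypothetical toy: returns (final state, number of events, number of steps)."""
    x = x0
    x_hat = x0      # state held by the controller since the last event
    events = 1      # the initial sample counts as the first event
    steps = int(round(t_final / dt))
    for _ in range(steps):
        # State-based trigger: resample when |x - x_hat| <= sigma * |x| is violated.
        if abs(x - x_hat) > sigma * abs(x):
            x_hat = x
            events += 1
        u = -k * x_hat                  # control input changes only at events
        x = x + dt * (a * x + b * u)    # forward-Euler step of dx/dt = a x + b u
    return x, events, steps
```

With the default parameters the open-loop plant is unstable, yet the held-state controller still drives `x` toward zero while updating the input at only a few dozen of the 5000 simulation steps, which is the resource saving the abstract refers to. A strictly positive spacing between events in this toy plays the role that the minimum inter-event time plays in the paper's analysis.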

Adaptive Dynamic Programming Based Control Scheme for Uncertain Two-Wheel Robots

IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC)

Authors: Thien Van Nguyen; Hai Xuan Le; Hoang Viet Tran; Duc Anh Nguyen; Minh Ngoc Nguyen; Linh Nguyen (Hanoi Univ Ind, Hanoi, Vietnam; Hanoi Univ Sci & Technol, Hanoi, Vietnam; Federat Univ Australia, Melbourne, Vic, Australia)

ISBN: (print) 9781665431989

This paper addresses the problem of effectively controlling a two-wheel robot despite its inherent nonlinearity and parameter uncertainties. To deal with the robot's unknown and uncertain dynamics, adaptive dynamic programming, a reinforcement-learning-based technique, is employed to develop an optimal control law. Notably, the proposed algorithm does not require the kinematic parameters, yet it is still guaranteed to find the optimal state controller. Moreover, convergence of the optimal control scheme is proved theoretically. The proposed approach was implemented on a synthetic two-wheel robot, and the results demonstrate its effectiveness.

Keywords: reinforcement learning; two-wheel robot; adaptive control; adaptive dynamic programming

Reinforcement Learning Control of Robotic Knee With Human-in-the-Loop by Flexible Policy Iteration

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, Vol. 33, No. 10, pp. 5873-5887

Authors: Gao, Xiang; Si, Jennie; Wen, Yue; Li, Minhan; Huang, He (Arizona State Univ, Dept Elect Comp & Energy Engn, Tempe, AZ 85287, USA; North Carolina State Univ, Dept Biomed Engn, Raleigh, NC 27695, USA; Univ N Carolina, Chapel Hill, NC 27599, USA)

We are motivated by the real challenges of a human-robot system to develop new designs that are efficient at the data level and that carry performance guarantees, such as stability and optimality, at the system level. Existing approximate/adaptive dynamic programming (ADP) results that address system performance theoretically do not readily yield practically useful learning control algorithms for this problem, and reinforcement learning (RL) algorithms that address data efficiency usually lack performance guarantees for the controlled system. This study fills these gaps by introducing new features into the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly and organically integrate experience replay and supplemental values from prior experience into the RL controller. We show system-level performance properties, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of FPI through realistic simulations of the human-robot system. The problem considered here may be difficult to address with design methods based on classical control theory, because it is nearly impossible to obtain a customized mathematical model of a human-robot system either online or offline. Our results also indicate the great potential of RL control for solving realistic and challenging problems with high-dimensional control inputs.

Keywords: Robots; Impedance; Tuning; Prosthetics; Knee; Erbium; Legged locomotion; adaptive optimal control; data- and time-efficient learning; flexible policy iteration (FPI); human-in-the-loop; reinforcement learning (RL); robotic knee

Control of Nonaffine Nonlinear Discrete-Time Systems Using Reinforcement-Learning-Based Linearly Parameterized Neural Networks

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, Vol. 38, No. 4, pp. 994-1001

Authors: Yang, Qinmin; Vance, Jonathan Blake; Jagannathan, S. (Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409, USA)

A nonaffine discrete-time system with unknown nonlinear dynamics, represented in nonlinear autoregressive moving average with exogenous input (NARMAX) form, is considered. An equivalent affine-like representation in terms of the tracking error dynamics is first obtained from the original nonaffine nonlinear discrete-time system so that a reinforcement-learning-based near-optimal neural network (NN) controller can be developed. The control scheme consists of two linearly parameterized NNs: a critic NN, which approximates a predefined long-term cost function, and an action NN, which derives a near-optimal control signal so that the system tracks a desired trajectory while simultaneously minimizing the cost function. The NN weights are tuned online. Stability of the closed-loop system is shown using the standard Lyapunov approach. The net result is a supervised actor-critic NN controller scheme that can be applied to a general nonaffine nonlinear discrete-time system without needing the affine-like representation. Simulation results demonstrate satisfactory performance of the controller.

Keywords: adaptive critic; adaptive dynamic programming; Lyapunov stability; neural network control; reinforcement learning control

Online adaptive learning of optimal control solutions using integral reinforcement learning

Authors: Vamvoudakis, Kyriakos G.; Vrabie, Draguna; Lewis, Frank L. (Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX 76118, United States)

ISBN: (print) 9781424498888

In this paper we introduce an online algorithm that uses integral reinforcement knowledge to learn the continuous-time optimal control solution for nonlinear systems with infinite-horizon costs and partial knowledge of the system dynamics. The algorithm is a data-based approach to solving the Hamilton-Jacobi-Bellman equation and does not require explicit knowledge of the system's drift dynamics. The adaptive algorithm is based on policy iteration and is implemented on an actor/critic structure. Both actor and critic neural networks are adapted simultaneously, and a persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law required to guarantee closed-loop dynamical stability. Convergence to the optimal controller is proven, and stability of the system is guaranteed. Simulation examples support the theoretical results. © 2011 IEEE.

Keywords: adaptive algorithms
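
The claim that integral reinforcement learning does not need the drift dynamics can be seen from the standard integral-reinforcement form of the Bellman equation. The block below uses the common notation of that literature rather than this paper's exact notation; the input-affine dynamics and the running cost r(x,u) = Q(x) + u^T R u are assumptions for illustration. Splitting the infinite-horizon value at a reinforcement interval T gives a policy-evaluation equation that involves only measured trajectory data and V, with no f(x); the improvement step uses only the input dynamics g(x), which matches the "partial knowledge" in the abstract.

```latex
% Assumed setting: \dot{x} = f(x) + g(x)u, running cost r(x,u) = Q(x) + u^\top R u
V\big(x(t)\big) = \int_{t}^{\infty} r\big(x(\tau),u(\tau)\big)\,d\tau
% Integral reinforcement (policy evaluation) over an interval T > 0; f(x) does not appear:
V\big(x(t)\big) = \int_{t}^{t+T} r\big(x(\tau),u(\tau)\big)\,d\tau + V\big(x(t+T)\big)
% Policy improvement; requires only the input dynamics g(x):
u(x) = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V(x)
```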

Discrete-Time Non-Zero-Sum Games With Completely Unknown Dynamics

IEEE TRANSACTIONS ON CYBERNETICS, 2021, Vol. 51, No. 6, pp. 2929-2943

Authors: Song, Ruizhuo; Wei, Qinglai; Zhang, Huaguang; Lewis, Frank L. (Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China; Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China; Univ Texas Arlington, UTA Res Inst, Ft Worth, TX 76118, USA)

In this article, an off-policy reinforcement learning (RL) algorithm is established to solve discrete-time N-player nonzero-sum (NZS) games with completely unknown dynamics. The N-coupled generalized algebraic Riccati equations (GARE) are derived, and a policy iteration (PI) algorithm is then used to obtain the N-tuple of iterative controls and iterative value functions. Because the PI algorithm requires the system dynamics, an off-policy RL method is developed for discrete-time N-player NZS games. The off-policy N-coupled Hamilton-Jacobi (HJ) equation is derived based on quadratic value functions. Using the Kronecker product, the N-coupled HJ equation is decomposed into an unknown-parameter part and a system-operation-data part, so that it can be solved without knowledge of the system dynamics. Least squares is used to calculate the iterative value functions and the N-tuple of iterative controls. The existence of the Nash equilibrium is proved. Simulation examples illustrate the results of the proposed method for discrete-time NZS games with unknown dynamics.

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; discrete-time; nonzero-sum (NZS); off-policy; reinforcement learning (RL)
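
The policy iteration (PI) step that the abstract starts from alternates policy evaluation and policy improvement on a quadratic value function. The sketch below shows that loop in the simplest possible case, a single player with a scalar linear plant `x_{k+1} = a x_k + b u_k`, cost `sum(q x_k^2 + r u_k^2)`, and linear policy `u = -k x`. Unlike the paper, it is model-based: it uses `a` and `b` directly, whereas the paper's off-policy method removes that requirement. It does so by writing quadratic values as x^T P x = (x ⊗ x)^T vec(P) and fitting the unknown parameters by least squares from data. The function name `policy_iteration_scalar_lqr` and the numbers in the usage example are hypothetical.

```python
from typing import List, Tuple


def policy_iteration_scalar_lqr(a: float, b: float, q: float, r: float,
                                k0: float,
                                iters: int = 30) -> Tuple[float, float, List[float]]:
    """Hypothetical model-based PI for x_{k+1} = a x + b u with policy u = -k x.

    Returns (improved gain, last evaluated value coefficient P, history of P),
    where the value of the current policy is V(x) = P x^2.
    """
    k = k0
    P = 0.0
    history: List[float] = []
    for _ in range(iters):
        a_cl = a - b * k
        if abs(a_cl) >= 1.0:
            raise ValueError("PI needs a stabilizing gain (|a - b k| < 1)")
        # Policy evaluation: P solves the scalar Lyapunov equation
        # P = q + r k^2 + a_cl^2 P.
        P = (q + r * k * k) / (1.0 - a_cl * a_cl)
        history.append(P)
        # Policy improvement: minimize r u^2 + P (a x + b u)^2 over u,
        # which gives u = -(a b P / (r + b^2 P)) x.
        k = a * b * P / (r + b * b * P)
    return k, P, history
```

For example, with `a = 1.2`, `b = 1`, `q = r = 1` and the stabilizing initial gain `k0 = 1`, the sequence of `P` values decreases monotonically and converges to the positive root of the scalar discrete-time Riccati equation, P^2 - 1.44 P - 1 = 0. Starting from a stabilizing policy is the usual requirement for PI, which is why the function rejects gains with `|a - b k| >= 1`.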
