Search Results - Inner Mongolia University Library

Decentralized Optimal Neurocontroller Design for Mismatched Interconnected Systems via Integral Policy Iteration

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, Vol. 71, No. 2, pp. 687-691

Authors: Wang, Ding; Fan, Wenqian; Liu, Ao; Qiao, Junfei

Affiliations: Beijing Univ Technol Fac Informat Technol Beijing 100124 Peoples R China; Beijing Key Lab Computat Intelligence & Intelligen Beijing Univ Technol Beijing 100124 Peoples R China; Beijing Univ Technol Beijing Lab Smart Environm Protect Beijing 100124 Peoples R China; Beijing Univ Technol Beijing Inst Artificial Intelligence Beijing 100124 Peoples R China

In this brief, the decentralized optimal control problem of continuous-time input-affine nonlinear systems with mismatched interconnections is investigated by utilizing data-based integral policy iteration. Initially, the decentralized mismatched subsystems are converted into nominal auxiliary subsystems. Then, we derive the optimal controllers of the nominal auxiliary subsystems with a well-defined discounted cost function under the framework of adaptive dynamic programming. In the implementation process, the integral reinforcement learning algorithm is employed to explore the partially or completely unknown system dynamics. It is worth mentioning that a neural-network-based actor-critic structure is adopted to evaluate the control policy and the performance of the control system. In addition, the least-squares method is used in this online learning process. Finally, a simulation example is provided to illustrate the validity of the developed algorithm.
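
The integral reinforcement learning (IRL) and least-squares steps mentioned in this abstract can be illustrated, in a heavily simplified form, on a single scalar linear system rather than the decentralized mismatched subsystems the paper treats. The sketch below is not the authors' algorithm. It assumes dx/dt = a*x + b*u, a quadratic critic V(x) = w*x^2, and an undiscounted cost integrand q*x^2 + r*u^2 (the paper uses a discounted cost). All function and parameter names are hypothetical. The critic weight w is fitted by least squares from the integral Bellman equation w*(x(t)^2 - x(t+T)^2) = integral of (q*x^2 + r*u^2) over [t, t+T]. That fit uses measured trajectory data and never reads the drift coefficient a; only the input gain b enters the policy-improvement step.

```python
def simulate_interval(x, k, a, b, q, r, period, dt):
    """Euler-simulate dx/dt = a*x + b*u under u = -k*x over one interval.

    Returns the end state and the left Riemann sum of q*x^2 + r*u^2.
    The drift `a` is only used to generate data; the learner never reads it.
    """
    cost = 0.0
    for _ in range(round(period / dt)):
        u = -k * x
        cost += (q * x * x + r * u * u) * dt
        x += (a * x + b * u) * dt
    return x, cost


def irl_policy_iteration(k0, a, b, q, r, period=0.05, dt=1e-3,
                         n_samples=20, n_iter=10):
    """Hypothetical on-policy IRL loop; k0 must be stabilizing (a - b*k0 < 0)."""
    k, w = k0, 0.0
    for _ in range(n_iter):
        num = den = 0.0
        x = 1.0
        for _ in range(n_samples):
            x_next, rho = simulate_interval(x, k, a, b, q, r, period, dt)
            d = x * x - x_next * x_next  # phi(x(t)) - phi(x(t+T)), phi(x) = x^2
            num += d * rho
            den += d * d
            x = x_next
        w = num / den      # policy evaluation: least-squares fit of w*d = rho
        k = b * w / r      # policy improvement: u = -(b/r) * w * x
    return k, w


k, w = irl_policy_iteration(k0=2.0, a=1.0, b=1.0, q=1.0, r=1.0)
print(f"learned gain k = {k:.3f}")
```

For a = b = q = r = 1, the Riccati equation 2ap - (b^2/r)p^2 + q = 0 gives p = 1 + sqrt(2), so the learned gain should land close to 2.414. The small remaining offset comes from the Euler discretization, not from the learning rule.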

Keywords: Optimal control; Integrated circuit interconnections; Interconnected systems; Cost function; Reinforcement learning; Heuristic algorithms; Dynamic programming; adaptive dynamic programming; data-based online control; decentralized control; integral policy iteration; mismatched interconnections; neural networks; optimal control

Convergence of Value Iterations for Total-Cost MDPs and POMDPs with General State and Action Sets

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Feinberg, Eugene A.; Kasyanov, Pavlo O.; Zgurovsky, Michael Z.

Affiliations: SUNY Stony Brook Dept Appl Math & Stat Stony Brook NY 11794 USA; Natl Tech Univ Ukraine Kyiv Polytech Inst Inst Appl Syst Anal UA-03056 Kiev Ukraine; Natl Tech Univ Ukraine Kyiv Polytech Inst UA-03056 Kiev Ukraine

ISBN (print): 9781479945528

This paper describes conditions for convergence to optimal values of the dynamic programming algorithm applied to total-cost Markov Decision Processes (MDPs) with Borel state and action sets and with possibly unbounded one-step cost functions. It also studies applications of these results to Partially Observable MDPs (POMDPs). It is well known that POMDPs can be reduced to special MDPs, called Completely Observable MDPs (COMDPs), whose state spaces are sets of probability distributions over the original states. This paper describes conditions on POMDPs under which optimal policies for COMDPs can be found by value iteration. In other words, this paper provides sufficient conditions for solving total-cost POMDPs with infinite state, observation, and action sets by dynamic programming. Examples of applications to filtration, identification, and inventory control are provided.
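
In the finite case, the value iteration whose convergence this paper studies reduces to repeatedly applying the Bellman operator V_{k+1}(s) = min_a [c(s,a) + sum_t P(t|s,a) V_k(t)], starting from V_0 = 0. The sketch below assumes finitely many states and actions and nonnegative one-step costs, a far simpler setting than the Borel state and action sets with possibly unbounded costs treated in the paper; the function name and the toy example are illustrative only.

```python
def value_iteration(cost, trans, tol=1e-10, max_iter=10_000):
    """Total-cost value iteration for a finite MDP.

    cost[s][a]     -- nonnegative one-step cost of action a in state s
    trans[s][a][t] -- probability of moving from s to t under a
    """
    n = len(cost)
    v = [0.0] * n  # V_0 = 0; with nonnegative costs the iterates increase monotonically
    for _ in range(max_iter):
        new_v = [
            min(cost[s][a] + sum(p * v[t] for t, p in enumerate(trans[s][a]))
                for a in range(len(cost[s])))
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# State 1 is absorbing and cost-free. In state 0, action 0 pays 1.0 and stops;
# action 1 pays 0.4 and stops only with probability 0.5.
cost = [[1.0, 0.4], [0.0, 0.0]]
trans = [[[0.0, 1.0], [0.5, 0.5]], [[0.0, 1.0], [0.0, 1.0]]]
print(value_iteration(cost, trans))
```

In this toy example, repeatedly choosing action 1 has cost V satisfying V = 0.4 + 0.5V, that is V = 0.8, which beats the 1.0 of stopping immediately; the iterates for state 0 rise monotonically toward 0.8.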

Keywords: Markov processes; convergence of numerical methods; decision making; dynamic programming; iterative methods; Borel state; COMDPs; Markov decision processes; POMDPs; action sets; completely observable MDPs; dynamic programming algorithm; general state; infinite state; partially observable MDPs; sufficient condition; total-cost MDPs; unbounded one-step cost functions; value iterations convergence; Convergence; Cost function; Equations; Extraterrestrial measurements; Kernel; Markov chain; Converge; Cost functions; SETTING; Sufficient conditions

Discrete-Time Self-learning Parallel Control

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, Vol. 52, No. 1, pp. 192-204

Authors: Wei, Qinglai; Wang, Lingxiao; Lu, Jingwei; Wang, Fei-Yue

Affiliations: Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Univ Chinese Acad Sci Sch Artificial Intelligence Beijing 100049 Peoples R China; Qingdao Acad Intelligent Ind Parallel Intelligence Innovat Ctr Qingdao 266109 Peoples R China

In this article, a new self-learning parallel control method, based on the adaptive dynamic programming (ADP) technique, is developed for solving the optimal control problem of discrete-time time-varying nonlinear systems. It aims to obtain an approximate optimal control law sequence while guaranteeing the convergence of the value function. By establishing a time-varying artificial system with neural networks over a given time horizon, a control-sequence-improvement ADP algorithm is developed to obtain the control law sequence. For the first time, criteria for parallel execution are presented under which the value function is proven to converge to a finite neighborhood of the optimal performance index function. Finally, numerical results and analysis are presented to demonstrate the effectiveness of the parallel control method.

Keywords: Optimal control; Nonlinear systems; Time-varying systems; Performance analysis; Complex systems; Biological neural networks; ACP; adaptive dynamic programming (ADP); approximate dynamic programming; nonlinear systems; optimal control; parallel control; reinforcement learning

Reinforcement Learning-based Optimal Control Considering L Computation Time Delay of Linear Discrete-time Systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Fujita, Taishi; Ushio, Toshimitsu

ISBN (print): 9781479945528

In embedded control systems, the control input is computed by a processor from sensing data of the plant, and there is a delay, called the computation time delay, due to the computation and the data transmission. When we design an optimal controller, we need to take this delay into account to achieve optimality. Moreover, when it is difficult to identify a mathematical model of the plant, a model-free approach is useful. In particular, reinforcement learning-based approaches have received much attention in the design of adaptive optimal controllers. In this paper, we assume that the plant is a linear system whose parameters are unknown. Then, we apply reinforcement learning to the design of an adaptive optimal digital controller that takes the computation time delay into consideration. First, we consider the case where all states of the plant are observed and it takes L time steps to update the control input. An optimal feedback gain is learned from sequences of state and control-input pairs. Next, we consider the case where the control input is determined from outputs of the plant. We cannot use an observer to estimate the state of the plant since the parameters of the plant are unknown, so we use a data-based control approach for the estimation. Finally, we apply the proposed adaptive optimal controller to attitude control of a quadrotor in the hovering state and show its efficiency by simulation.

Keywords: adaptive control; control engineering computing; control system synthesis; data communication; delays; discrete time systems; embedded systems; feedback; learning (artificial intelligence); linear systems; optimal control; parameter estimation; state estimation; L-computation time delay; adaptive optimal digital controller; attitude control; data transmission; data-based control approach; embedded control systems; linear discrete-time systems; linear system; mathematical model; model free approach; optimal feedback gain; reinforcement learning; Adaptation models; Delay effects; Optimal control; Output feedback; Propellers; State feedback; Linear system; Parameter estimation; PROPELLER; control input; plants

Off-Policy Model-Free Learning for Multi-Player Non-Zero-Sum Games With Constrained Inputs

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, Vol. 70, No. 2, pp. 910-920

Authors: Huo, Yu; Wang, Ding; Qiao, Junfei; Li, Menghua

Affiliations: Beijing Univ Technol Beijing Inst Artificial Intelligence Fac Informat Technol Beijing Key Lab Computat Intelligence & Intelligen Beijing 100124 Peoples R China; Beijing Univ Technol Beijing Inst Artificial Intelligence Fac Informat Technol Beijing Lab Smart Environm Protect Beijing 100124 Peoples R China

In this paper, multi-player non-zero-sum games with control constraints are studied by utilizing a novel model-free approach based on the adaptive dynamic programming framework. First, the model-based policy iteration (PI) method, which requires the system dynamics, is provided, and its convergence is demonstrated. Then, to eliminate the need for the system dynamics, a model-free iterative method is obtained by using the off-policy integral reinforcement learning (IRL) scheme based on the PI approach. System data are collected in order to construct the model-free approach. In addition, we analyze the convergence of the off-policy IRL approach by proving the equivalence between the model-free iterative approach and the model-based iterative approach. In the implementation of the scheme, the control policy and cost function are approximated by actor-critic networks, and the least-squares algorithm is used to learn the actor-critic network weights from the collected data sets. Finally, two cases are provided to demonstrate the effectiveness of the established framework.

Keywords: adaptive dynamic programming; approximate dynamic programming; continuous-time nonlinear systems; input constraints; integral reinforcement learning; non-zero-sum games; off-policy

Learning Without External Reward

IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2018, Vol. 13, No. 3, pp. 48-54

Authors: He, Haibo; Zhong, Xiangnan

Affiliations: Univ Rhode Isl Dept Elect Comp & Biomed Engn Kingston RI 02881 USA; Univ North Texas Dept Elect Engn Denton TX 76203 USA

In the traditional reinforcement learning paradigm, a reward signal is applied to define the goal of the task. Usually, the reward signal is a "hand-crafted" numerical value or a pre-defined function: it tells the agent how good or bad a specific action is. However, we believe there exist situations in which the environment cannot directly provide such a reward signal to the agent. The question, therefore, is whether an agent can still learn without an external reward signal. To this end, this article develops a self-learning approach which enables the agent to adaptively develop an internal reward signal based on a given ultimate goal, without requiring an explicit external reward signal from the environment. In this article, we aim to convey the self-learning idea in a broad sense, so that it can be used in a wide range of existing reinforcement learning and adaptive dynamic programming algorithms and architectures. We describe the idealized forms of this method mathematically and demonstrate its effectiveness through a triple-link inverted pendulum case study.

Keywords: Neural networks; Robot learning; learning (artificial intelligence); Task analysis; Dynamic programming; Machine learning

Adaptive Critic Learning and Experience Replay for Decentralized Event-Triggered Control of Nonlinear Interconnected Systems

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, Vol. 50, No. 11, pp. 4043-4055

Authors: Yang, Xiong; He, Haibo

Affiliations: Tianjin Univ Sch Elect & Informat Engn Tianjin 300072 Peoples R China; Univ Rhode Isl Dept Elect Comp & Biomed Engn Kingston RI 02881 USA

In this paper, we develop a decentralized event-triggered control (ETC) strategy for a class of nonlinear systems with uncertain interconnections. To begin with, we show that the decentralized ETC policy for the whole system can be represented by a group of optimal ETC laws of auxiliary subsystems. Then, under the framework of adaptive critic learning, we construct critic networks to solve the event-triggered Hamilton-Jacobi-Bellman equations related to these optimal ETC laws. The weight vectors of the critic networks are updated by combining the gradient descent approach with the experience replay (ER) technique. With the aid of the ER technique, we can overcome the difficulty of satisfying the persistence of excitation condition. Meanwhile, by using classic Lyapunov approaches, we prove that the estimated weight vectors of the critic networks are uniformly ultimately bounded. Moreover, we demonstrate that the obtained decentralized ETC can force the overall system to be asymptotically stable. Finally, we present an interconnected nonlinear plant to validate the proposed decentralized ETC scheme.

Keywords: Erbium; Interconnected systems; Optimal control; Artificial neural networks; Adaptive systems; Nonlinear systems; Dynamic programming; adaptive critic learning (ACL); adaptive dynamic programming (ADP); event-triggered control (ETC); experience replay (ER); interconnected systems; reinforcement learning (RL)

Policy Optimization Adaptive Dynamic Programming for Optimal Control of Input-Affine Discrete-Time Nonlinear Systems

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, Vol. 53, No. 7, pp. 4339-4350

Authors: Lin, Mingduo; Zhao, Bo

Affiliations: Beijing Normal Univ Sch Syst Sci Beijing 100875 Peoples R China; Chongqing Univ Posts & Telecommun Key Lab Ind Internet Things & Networked Control Minist Educ Chongqing 400065 Peoples R China

In this article, a policy optimization adaptive dynamic programming (POADP) method is developed for optimal control of discrete-time unknown nonlinear systems, where the iterative control policy is parameterized to optimize the iterative Q-function directly. A relaxed condition on the learning rate is given to guarantee the convergence of the present algorithm. Furthermore, the Polyak-Lojasiewicz inequality is introduced to analyze optimality, i.e., the iterative Q-function converges to the optimum within a given computational threshold after a finite number of iterations, and the rate of convergence (i.e., the required minimum number of iterations) of the developed POADP method is also given. To facilitate practical implementation, the iterative Q-function and the iterative control policy are approximated by an actor-critic structure. Then, an experiment-based method is developed to obtain the initial weights of the actor-critic structure. Finally, numerical simulation results of two examples are provided to validate the effectiveness of the POADP algorithm.

Keywords: Optimal control; Convergence; Optimization; Dynamic programming; Nonlinear systems; Performance analysis; Iterative algorithms; adaptive dynamic programming; data based optimal control; policy optimization (PO); reinforcement learning (RL)

Learning continuous-action control policies

2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009

Authors: Pazis, Jason; Lagoudakis, Michail G.

Affiliations: Department of Electronic and Computer Engineering, Technical University of Crete, Chania, Crete, Greece

ISBN (print): 9781424427611

Reinforcement learning for control in stochastic processes has received significant attention in the last few years. Several data-efficient methods, even for continuous state spaces, have been proposed; however, most of them assume a small, discrete action space. While continuous action spaces are quite common in real-world problems, the most common approach still employed in practice is coarse discretization of the action space. This paper presents a novel, computationally efficient method, called Adaptive Action Modification, for realizing continuous-action policies, using binary decisions corresponding to adaptive increment or decrement changes in the values of the continuous action variables. The proposed approach essentially approximates any continuous action space to arbitrary resolution and can be combined with any discrete-action reinforcement learning algorithm for learning continuous-action policies. Our approach is coupled with three well-known reinforcement learning algorithms (Q-learning, Fitted Q-Iteration, and Least-Squares Policy Iteration), and its use and properties are thoroughly investigated and demonstrated on the continuous state-action Inverted Pendulum and Bicycle Balancing and Riding domains. © 2009 IEEE.
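
The idea of realizing a continuous action through a stream of binary up/down decisions can be sketched as follows. The abstract does not say how the increments adapt, so the rule used here (keep the step size while the direction repeats, halve it when the direction reverses) is an assumption for illustration only and should not be read as the paper's Adaptive Action Modification. In the paper the up/down choice comes from a discrete-action reinforcement learning algorithm; here a fixed rule stands in for that agent. The class and parameter names are hypothetical.

```python
class BinaryActionModifier:
    """Hypothetical continuous action variable driven by up/down decisions."""

    def __init__(self, low, high, value, step, min_step=1e-6):
        self.low, self.high = low, high
        self.value, self.step, self.min_step = value, step, min_step
        self.last_direction = 0  # 0 means "no decision taken yet"

    def apply(self, up):
        """Apply one binary decision and return the new continuous action."""
        direction = 1 if up else -1
        if self.last_direction and direction != self.last_direction:
            # Assumed adaptation rule: a reversal halves the increment.
            self.step = max(self.step / 2, self.min_step)
        self.last_direction = direction
        # Keep the action inside its admissible range [low, high].
        self.value = min(max(self.value + direction * self.step, self.low), self.high)
        return self.value


# A stand-in "agent" that pushes the action toward 0.3 in [-1, 1].
act = BinaryActionModifier(low=-1.0, high=1.0, value=0.0, step=0.5)
for _ in range(60):
    act.apply(up=act.value < 0.3)
print(round(act.value, 4))
```

Because each reversal halves the increment, the action settles near the point where the binary decisions start alternating, which is how two discrete choices can reach a continuous value to fine resolution.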

Keywords: Inverted pendulum

Event-Triggered Decentralized Tracking Control of Modular Reconfigurable Robots Through Adaptive Dynamic Programming

引用

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2020, Vol. 67, No. 4, pp. 3054-3064

Authors: Zhao, Bo; Liu, Derong

Affiliations: Beijing Normal Univ Sch Syst Sci Beijing 100875 Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Guangdong Univ Technol Sch Automat Guangzhou 510006 Peoples R China

This paper develops an event-triggered decentralized tracking control (DTC) approach for modular reconfigurable robots (MRRs) by using adaptive dynamic programming. By establishing a decentralized neural network (NN) observer, which uses local input-output data and the desired states of coupled subsystems, the local dynamics of each MRR subsystem can be obtained. To obtain the DTC, the tracking error subsystem is augmented by the exosystem with the desired trajectory. Based on the event-triggered mechanism and a modified local cost function, the DTC is derived by solving the local Hamilton-Jacobi-Bellman equation via a local critic NN with an asymptotically stable structure. The stability of the entire closed-loop MRR system is analyzed by Lyapunov's direct method. Simulation of a two-degree-of-freedom MRR system verifies that the developed event-triggered DTC scheme is effective.

Keywords: Decentralized control; Couplings; Optimal control; Robots; Dynamic programming; Artificial neural networks; Trajectory; adaptive dynamic programming; decentralized tracking control; event-triggered mechanism; modular reconfigurable robots; optimal control; reinforcement learning
