Search Results - Inner Mongolia University Library

Model-Free Value Iteration Solution for Dynamic Graphical Games

IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA)

Authors: Abouheaf, Mohammed; Gueaieb, Wail (Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON, Canada)

ISBN: (print) 9781538646182

The dynamic graphical game is a special class of games in which agents interact within a communication graph. This paper introduces an online model-free adaptive learning solution for dynamic graphical games. Reinforcement learning is applied in the form of solutions to a set of modified coupled Bellman equations. The technique is implemented in a distributed fashion using local neighborhood information, without a priori knowledge of the agents' dynamics. This is accomplished by means of adaptive critics, where a multi-layer perceptron neural network is applied to approximate the online solution. To this end, a novel coupled Riccati equation is developed for the graphical game. The validity of the proposed online adaptive learning solution is tested using a graphical example, where follower agents learn to synchronize their behavior to follow a leader.

Keywords: Mathematical model; Games; Dynamic programming; Vehicle dynamics; Synchronization; Heuristic algorithms; Learning (artificial intelligence)

H∞ Control of Constrained-Input Nonlinear Systems with Unknown Model Based on Adaptive Dynamic Programming

30th Chinese Control and Decision Conference (CCDC)

Authors: Pu, Jun; Ma, Qingliang; Gu, Fan; Yu, Zexiang (Xian Res Inst High Tech, Dept Control Engn, Xian 710025, Peoples R China)

ISBN: (print) 9781538612446

An adaptive dynamic programming (ADP) algorithm with two phases, online measurement and off-policy learning, is proposed to solve the H-infinity control problem for continuous-time nonlinear systems with constrained input and an unknown model, based only on online data. The model-free Hamilton-Jacobi-Isaacs (HJI) equation is derived by policy iteration (PI) and the model-free iteration reinforcement learning (IRL) method. Three neural networks (NNs) are constructed; once the online system data have been collected, the off-policy learning method is used to approximately solve the model-free HJI equation. The value function, control strategy, and disturbance strategy are obtained from the NNs. The weights of the neural networks are solved by the least-squares method. The simulation results verify the feasibility of the algorithm.

Keywords: Adaptive dynamic programming; H-infinity control; Constrained-input; Neural network

Neural Network Tracking Control of Unknown Servo System with Approximate Dynamic Programming

38th Chinese Control Conference

Authors: Yongfeng Lv; Xuemei Ren; Tianyi Zeng; Linwei Li; Jing Na (School of Automation, Beijing Institute of Technology; IEEE; Faculty of Mechanical & Electrical Engineering, Kunming University of Science & Technology)

Although the adaptive dynamic programming (ADP) scheme has been widely researched for optimal control problems in recent years, it has not been applied to the servo system. In this paper, a simplified reinforcement learning (RL) based ADP scheme is developed to obtain the optimal tracking control of the servo system, where the unknown system dynamics are approximated with a three-layer neural network (NN) identifier. First, the servo system model is constructed and a three-layer NN identifier is used to approximate the unknown servo system. The NN weights of both the hidden layer and the output layer are tuned synchronously with an adaptive gradient law. An RL-based critic NN is then used to learn the optimal cost function, and its weights are updated by minimizing the squared Hamilton-Jacobi-Bellman (HJB) error. The optimal tracking control of the servomechanism is obtained from the three-layer NN identifier and the RL scheme, so that the motor speed tracks the predefined command. Moreover, the convergence of the identifier and NN weights is proved. Finally, a servomechanism model is provided to illustrate the proposed methods.

Keywords: Reinforcement learning; Adaptive dynamic programming; Optimal control; Neural networks; Servomechanisms

Adaptive Dynamic Programming Based Motion Control of Autonomous Underwater Vehicles

5th International Conference on Control, Decision and Information Technologies (CoDIT)

Author: Vibhute, Siddhant (VJTI, Dept Elect Engn, Mumbai, Maharashtra, India)

ISBN: (print) 9781538650653

In this paper, the adaptive dynamic programming (ADP) technique is utilized to achieve optimal motion control of an Autonomous Underwater Vehicle (AUV) system. The paper proposes a model-free method that takes into consideration the actuator input and obstacle position while tracing an optimal path. Machine learning makes it possible to develop a path planner that aims to avoid collisions with static obstacles. The ADP approach is used to approximate the solution of the cost functional for optimization, so that the positions of locally situated obstacles need not be known a priori until they are within a designed approximation safety envelope. The methodology is implemented to achieve the path-planning objective using the dynamic programming technique. The least-squares policy method serves as a recursive algorithm to approximate the value function for the domain, providing an approach for the finite-space discrete control system. The concept behind the design of an obstacle-free path finder is to generate an optimal action that minimizes the local cost, defined by a functional, under constrained optimization. The optimal value function is described by the Hamilton-Jacobi-Bellman (HJB) equation, which is impractical to solve using analytical methods. To overcome the complex calculations associated with the HJB equation, a method based on reinforcement learning (RL), called ADP, is implemented. This paper outlines the use of machine learning to realize a real-time obstacle avoidance system.

Keywords: Mathematical model; Dynamic programming; Vehicle dynamics; Kinematics; Optimization; Approximation algorithms; Control systems

Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure

IEEE Transactions on Neural Networks and Learning Systems, 2018, vol. 29, no. 6, pp. 2099-2111

Authors: Luo, Biao; Liu, Derong; Wu, Huai-Ning (Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China; Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Guangdong, Peoples R China; Beihang Univ, Sci & Technol Aircraft Control Lab, Beijing 100191, Peoples R China)

Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem of nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed using value iteration-based Q-learning (VIQL) with a critic-only structure. Most existing constrained control methods require a particular performance index and are only suitable for linear or affine nonlinear systems, which is unreasonable in practice. To overcome this problem, a system transformation is first introduced with a general performance index. Then, the constrained optimal control problem is converted into an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established with an easy-to-realize initial condition $Q^{(0)}(x, a) \geq 0$. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on the gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples with computer simulation.

Keywords: Adaptive control; Adaptive dynamic programming; Constraints; Critic-only; Data-based; Optimal control; Q-learning
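
The value-iteration Q-learning recursion described in this abstract can be illustrated with a small tabular sketch. It is not the authors' VIQL implementation: the paper uses a neural-network critic on nonaffine nonlinear systems, whereas the five-state toy problem, the `step` and `cost` functions, and the discount factor `GAMMA` below are all assumptions introduced purely for illustration. The sketch only shows an update of the form Q^(i+1)(x, a) = U(x, a) + GAMMA * min_b Q^(i)(x', b), started from Q^(0) = 0 and driven by recorded transition tuples rather than by a model.

```python
# Hypothetical toy problem (not from the paper): states 0..4 on a line,
# actions move one step left, stay, or move one step right.
STATES = range(5)
ACTIONS = (-1, 0, 1)
GAMMA = 0.9  # assumed discount factor


def step(x, a):
    """Deterministic transition, clipped to the state range."""
    return min(max(x + a, 0), 4)


def cost(x, a):
    """Stage cost: squared distance from state 0 plus an action penalty."""
    return float(x * x + abs(a))


# "Data-based" setting: the learner only sees recorded tuples
# (state, action, stage cost, next state), never step() or cost() directly.
DATA = [(x, a, cost(x, a), step(x, a)) for x in STATES for a in ACTIONS]


def value_iteration_q(data, n_iter=200):
    """Tabular value-iteration Q-learning from recorded transitions."""
    # Initial condition Q^(0)(x, a) = 0, which satisfies Q^(0) >= 0.
    q = {(x, a): 0.0 for x, a, _, _ in data}
    for _ in range(n_iter):
        # Build Q^(i+1) entirely from Q^(i) (synchronous update).
        q = {
            (x, a): u + GAMMA * min(q[(x_next, b)] for b in ACTIONS)
            for x, a, u, x_next in data
        }
    return q


def greedy_action(q, x):
    """Action minimizing the learned Q-function at state x."""
    return min(ACTIONS, key=lambda a: q[(x, a)])


q_star = value_iteration_q(DATA)
policy = {x: greedy_action(q_star, x) for x in STATES}
```

On this toy problem the greedy policy moves every nonzero state one step toward state 0 and then stays there.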

Adaptive Dynamic Programming for Cooperative Control with Incomplete Information

IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Authors: Koepf, Florian; Ebbert, Sebastian; Flad, Michael; Hohmann, Soeren (Karlsruhe Inst Technol, Inst Control Syst IRS, Karlsruhe, Germany)

ISBN: (print) 9781538666500

There is a trend towards interconnected and complex dynamical systems that are controlled by more than one controller. Because the controllers are coupled through the system, these interacting controllers need to consider not only the system dynamics but also each other's influence. However, in realistic scenarios, they usually do not exchange all the information concerning their parameters and control laws, and an exact model of the system dynamics is often hard to obtain. This is why we consider the challenging setting where the controllers have access neither to each other's parameters nor to the system dynamics. Controller design is quite difficult in this scenario, as the final system configuration is not known during the design process. For this complex scenario, we propose algorithms in which each controller uses adaptive dynamic programming to adapt its control law. Here, each controller strives to reach its individual control objectives, a setting which can be formulated as a coupled optimization problem or, more precisely, a dynamic game. As an example, we consider a vehicle model with two lateral controllers. With our proposed algorithms, the controllers converge successfully to a solution of the coupled optimization problem without knowing each other's parameters or the system dynamics.

Keywords: Cooperative control; Adaptive dynamic programming; Adaptive optimal control; Game theory; Reinforcement learning

Reinforcement Learning for Adaptive Periodic Linear Quadratic Control

IEEE Annual Conference on Decision and Control

Authors: Bo Pang; Zhong-Ping Jiang; Iven Mareels (Control and Networks Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, USA; IBM Research - Australia, Melbourne, Vic, Australia)

ISBN: (digital) 9781728113982

ISBN: (print) 9781728113999

This paper presents a first solution to the problem of adaptive LQR for continuous-time linear periodic systems. Specifically, reinforcement learning and adaptive dynamic programming (ADP) techniques are used to develop two algorithms to obtain near-optimal controllers. Firstly, the policy iteration (PI) and value iteration (VI) methods are proposed when the model is known. Then, PI-based and VI-based off-policy ADP algorithms are derived to find near-optimal solutions directly from input/state data collected along the system trajectories, without the exact knowledge of system dynamics. The effectiveness of the derived algorithms is validated using the well-known lossy Mathieu equation.

Keywords: Heuristic algorithms; Optimal control; Approximation algorithms; Mathematical model; Learning (artificial intelligence); System dynamics; Convergence
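
For the model-based policy iteration step mentioned in this abstract, a much simpler time-invariant scalar analogue may help fix ideas. The sketch below is an illustration under stated assumptions, not the paper's algorithm: the paper treats continuous-time linear periodic systems, while this example assumes a constant scalar system dx/dt = a*x + b*u with cost integral of (q*x^2 + r*u^2), and the function names `policy_iteration_lqr` and `riccati_solution` are hypothetical. It alternates policy evaluation (a scalar Lyapunov equation) with policy improvement, starting from a stabilizing gain, in the style of Kleinman's iteration.

```python
import math


def policy_iteration_lqr(a, b, q, r, k0, n_iter=20):
    """Scalar policy iteration for dx/dt = a*x + b*u with u = -k*x.

    Assumes the model (a, b) is known and k0 is stabilizing (a - b*k0 < 0).
    Returns the final cost coefficient p and feedback gain k.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must be stabilizing")
    k = k0
    p = 0.0
    for _ in range(n_iter):
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k**2 = 0 for p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: gain that minimizes the Hamiltonian for this p.
        k = b * p / r
    return p, k


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b**2/r)*p**2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Usage with assumed example values: an unstable plant and a stabilizing start.
p, k = policy_iteration_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
p_star = riccati_solution(1.0, 1.0, 1.0, 1.0)
```

With these example values the iterates approach the closed-form Riccati solution within a few steps. A value iteration variant would instead update p directly and would not need a stabilizing initial gain.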

Optimal Fault-Tolerant Control for Discrete-Time Nonlinear Strict-Feedback Systems Based on Adaptive Critic Design

IEEE Transactions on Neural Networks and Learning Systems, 2018, vol. 29, no. 6, pp. 2179-2191

Authors: Wang, Zhanshan; Liu, Lei; Wu, Yanming; Zhang, Huaguang (Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110004, Liaoning, Peoples R China; State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Liaoning, Peoples R China; Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Liaoning, Peoples R China; Liaoning Univ Technol, Coll Sci, Jinzhou 121001, Peoples R China)

This paper investigates the problem of optimal fault-tolerant control (FTC) for a class of unknown nonlinear discrete-time systems with actuator fault in the framework of adaptive critic design (ACD). A pivotal highlight is the adaptive auxiliary signal of the actuator fault, which is designed to offset the effect of the fault. The considered systems are in strict-feedback form and involve unknown nonlinear functions, which results in the causal problem. To solve this problem, the original nonlinear systems are transformed into a novel system by employing diffeomorphism theory. In addition, action neural networks (ANNs) are utilized to approximate a predefined unknown function in the backstepping design procedure. Combining the strategic utility function and the ACD technique, a reinforcement learning algorithm is proposed to set up an optimal FTC, in which critic neural networks (CNNs) provide an approximate structure of the cost function. The resulting design not only guarantees the stability of the systems but also achieves optimal control performance. Finally, two simulation examples are used to show the effectiveness of the proposed optimal FTC strategy.

Keywords: Adaptive critic design (ACD); Approximate dynamic programming (ADP); Neural networks; Optimal fault-tolerant control (FTC); Strict-feedback systems

Development of Reinforcement Learning Algorithm for 2-DOF Helicopter Model

27th IEEE International Symposium on Industrial Electronics, ISIE 2018

Authors: Fandel, Andrew; Birge, Anthony; Miah, Suruz (Electrical and Computer Engineering Department, Bradley University, Peoria, IL, United States)

ISBN: (print) 9781538637050

This paper examines a reinforcement learning strategy for controlling a two degree-of-freedom (2-DOF) helicopter. The pitch and yaw angles are regulated to their corresponding reference angles by applying appropriate actuator commands (input voltages) to the main and tail rotors of a 2-DOF helicopter using the proposed reinforcement learning [herein called the approximate dynamic programming (ADP)] strategy. Furthermore, the proposed strategy has the ability to configure the 2-DOF helicopter to track time-varying reference angles. The proposed ADP technique is capable of dealing with coupling effects between the rigid body structure and propeller dynamics associated with the 2-DOF helicopter model considered in this work. A set of computer simulations is conducted to evaluate the performance of the proposed algorithm. The performance of the proposed algorithm is also compared to that of a conventional linear-quadratic regulator (LQR). © 2018 IEEE.

Keywords: Reinforcement learning

Reinforcement Learning Solution with Costate Approximation for a Flexible Wing Aircraft

IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA)

Authors: Abouheaf, Mohammed; Gueaieb, Wail (Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON, Canada)

ISBN: (print) 9781538646182

An online adaptive learning approach based on costate function approximation is developed to solve an optimal control problem in real time. The proposed approach tackles the main concerns associated with the classical Dual Heuristic Dynamic Programming techniques in uncertain dynamical environments. It employs a policy iteration paradigm along with adaptive critics to implement the adaptive learning solution. The resultant framework does not require prior knowledge of the system dynamics, which makes it suitable for systems with high modeling uncertainties. As a proof of concept, the suggested structure is applied to the auto-pilot control of a flexible wing aircraft with unknown dynamics that vary continuously at each trim speed condition. Numerical simulations showed that the adaptive control technique was able to learn the system's dynamics and regulate its states as desired in a relatively short time.

Keywords: Mathematical model; Aircraft; Optimal control; Aerodynamics; Learning (artificial intelligence); Heuristic algorithms; Aerospace control
