This paper investigates the problem of optimal fault-tolerant control (FTC) for a class of unknown nonlinear discrete-time systems with actuator fault in the framework of adaptive critic design (ACD). A pivotal highlight is the adaptive auxiliary signal, which is designed to offset the effect of the actuator fault. The considered systems are in strict-feedback form and involve unknown nonlinear functions, which leads to a causality problem. To solve this problem, the original nonlinear systems are transformed into a novel system by employing diffeomorphism theory. In addition, action neural networks (ANNs) are utilized to approximate a predefined unknown function in the backstepping design procedure. Combining the strategic utility function and the ACD technique, a reinforcement learning algorithm is proposed to construct an optimal FTC, in which critic neural networks (CNNs) provide an approximate structure of the cost function. In this way, the scheme not only guarantees the stability of the systems but also achieves optimal control performance. In the end, two simulation examples are used to show the effectiveness of the proposed optimal FTC strategy.
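The abstract gives no equations, but the critic side of an ACD design like this one typically trains a parameterized cost approximation against a Bellman target. Below is a minimal numpy sketch of that idea, assuming a hypothetical polynomial basis and a plain quadratic utility standing in for the paper's strategic utility function; it illustrates the direction of a CNN weight update, not the paper's actual algorithm.

```python
import numpy as np

def phi(x):
    """Polynomial basis for the critic (hypothetical choice)."""
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2])

def utility(x, u):
    """Quadratic utility, a stand-in for the strategic utility function."""
    return x @ x + 0.1 * u**2

gamma, alpha = 0.95, 0.05   # discount factor, critic learning rate
W = np.zeros(3)             # critic weights: J(x) ~ W @ phi(x)

def critic_update(W, x, u, x_next):
    """One gradient step on the squared temporal-difference error
    e = W@phi(x) - (U(x,u) + gamma * W@phi(x_next))."""
    e = W @ phi(x) - (utility(x, u) + gamma * W @ phi(x_next))
    return W - alpha * e * phi(x)

# Toy rollout on a stable linear system just to exercise the update.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
b = np.array([0.0, 0.1])
x = np.array([1.0, -1.0])
for k in range(200):
    u = -0.5 * x[1]                      # fixed (suboptimal) policy
    x_next = A @ x + b * u
    W = critic_update(W, x, u, x_next)
    x = x_next
print("critic weights:", W)
```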
This paper examines a reinforcement learning strategy for controlling a two degree-of-freedom (2-DOF) helicopter. The pitch and yaw angles are regulated to their corresponding reference angles by applying appropriate ...
ISBN: (Print) 9781538646182
An online adaptive learning approach based on costate function approximation is developed to solve an optimal control problem in real time. The proposed approach tackles the main concerns associated with classical dual heuristic dynamic programming (DHP) techniques in uncertain dynamical environments. It employs a policy iteration paradigm along with adaptive critics to implement the adaptive learning solution. The resulting framework does not require prior knowledge of the system dynamics, which makes it suitable for systems with high modeling uncertainties. As a proof of concept, the suggested structure is applied to the autopilot control of a flexible-wing aircraft with unknown dynamics that vary continuously across trim speed conditions. Numerical simulations showed that the adaptive control technique was able to learn the system's dynamics and regulate its states as desired in a relatively short time.
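For context on the costate-based idea: a DHP-style critic approximates the cost gradient lambda(x) = dJ/dx and is trained against a target assembled from the utility gradient and the closed-loop state Jacobian. The sketch below evaluates one fixed policy on a toy linear plant whose Jacobian is read off the known model; in the paper's model-free setting that Jacobian would instead come from an online-identified model. All matrices and gains here are hypothetical.

```python
import numpy as np

A = np.array([[1.0, 0.1], [-0.2, 0.9]])   # toy plant (illustrative)
B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), np.array([[0.1]]), 0.95
K = np.array([[0.3, 0.5]])                # fixed evaluation policy u = -K x

W = np.zeros((2, 2))                      # linear costate critic: lam(x) = W @ x
alpha = 0.01

x = np.array([1.0, -0.5])
for k in range(500):
    u = -K @ x
    x_next = A @ x + (B @ u).ravel()
    Acl = A - B @ K                       # closed-loop Jacobian d x_{k+1}/d x_k
    # DHP target: dU/dx + (du/dx)' dU/du + gamma * Acl' lam(x_next),
    # with U = x'Qx + u'Ru and du/dx = -K
    dUdx = 2.0 * (Q @ x) + (-K.T) @ (2.0 * R @ u)
    err = W @ x - (dUdx + gamma * Acl.T @ (W @ x_next))
    W -= alpha * np.outer(err, x)         # gradient step on 0.5*||err||^2
    x = x_next
print("costate critic weights:\n", W)
```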
In the first issue of Nature 2015, Google DeepMind published the paper “Human-level control through deep reinforcement learning.” Furthermore, in the first issue of Nature 2016, it published the cover paper “Mastering the game of Go with deep neural networks and tree search” and proposed the computer Go program AlphaGo. In March 2016, AlphaGo beat the world's top Go player Lee Sedol by 4:1. This became a new milestone in the history of artificial intelligence, the core of which is the algorithm of deep reinforcement learning (RL).
Deep reinforcement learning is a focal research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive ...
ISBN: (Print) 9789881563958
This paper sums up four typical schemes of adaptive dynamic programming (ADP). The diagrams are provided and the algorithms of the various schemes are described, which is convenient for comparison. Some schemes in this paper belong to the group of action-dependent (AD) adaptive critic designs, whose distinguishing feature is the absence of a model network in the design. For simplicity of notation, we do not use the prefix AD. The learning process of ADP is accomplished by updating the weights of the networks. The weight-updating processes of some networks in the GDHP scheme are introduced.
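The defining trait of the action-dependent schemes mentioned above is that the critic takes the control as an input alongside the state, so no model network is needed to pass derivative information through the plant. A minimal sketch of that critic weight update, with a hypothetical quadratic basis and toy system:

```python
import numpy as np

gamma, alpha = 0.95, 0.02

def features(x, u):
    """Quadratic basis over the joint state-action vector z = [x; u]."""
    z = np.append(x, u)
    return np.outer(z, z)[np.triu_indices(len(z))]

def utility(x, u):
    return x @ x + 0.1 * u * u

W = np.zeros(len(features(np.zeros(2), 0.0)))   # Q(x,u) ~ W @ features(x,u)

def ad_critic_update(W, x, u, x_next, u_next):
    """TD step: Q(x,u) should match U(x,u) + gamma * Q(x',u')."""
    e = W @ features(x, u) - (utility(x, u)
                              + gamma * W @ features(x_next, u_next))
    return W - alpha * e * features(x, u)

# Exercise the update along a trajectory of a toy linear system.
A = np.array([[0.95, 0.05], [0.0, 0.9]])
b = np.array([0.0, 0.1])
x = np.array([1.0, 0.5])
for k in range(300):
    u = -0.4 * x[1]                              # fixed behavior policy
    x_next = A @ x + b * u
    W = ad_critic_update(W, x, u, x_next, -0.4 * x_next[1])
    x = x_next
print("Q-critic weights:", W)
```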
In this paper, the decentralized control problem is solved based on a policy iteration algorithm for large-scale nonlinear systems with unknown mismatched interconnections. The unknown interconnection is approximated by a neural network using the local states of the isolated subsystem and substituted reference states of the coupled subsystems. Then, an adaptive estimation term is utilized to construct an improved local performance index function that reflects the substitution error. The closed-loop large-scale nonlinear system is thereby guaranteed to be uniformly ultimately bounded under the set of developed decentralized optimal control policies. Two simulation examples are given to verify the effectiveness of the presented scheme. The significant contribution of this scheme is that it removes the common assumptions that the interconnections satisfy the matching condition and are upper bounded, when designing decentralized optimal control for large-scale nonlinear systems.
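As background, the evaluate/improve loop that a policy iteration algorithm instantiates is easiest to see in its exact, model-based form for a single isolated linear subsystem, as sketched below; the paper replaces these exact solves with neural approximators and additionally handles the unknown interconnections, which this illustration omits. The matrices are made up.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Exact policy iteration for one isolated linear subsystem (illustrative).
A = np.array([[1.0, 0.2], [0.0, 1.0]])
B = np.array([[0.0], [0.2]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.array([[1.0, 1.5]])                  # initial stabilizing gain
for i in range(10):
    Acl = A - B @ K
    # Policy evaluation: P solves P = Acl' P Acl + Q + K' R K.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain for the evaluated cost.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("converged gain K:", K)
```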
ISBN: (Print) 9781538636640
In this paper we propose an approach for approximating optimal tracking via expectation-maximization (EM) evaluation. Starting from a discussion of applying reinforcement learning (RL) to a system with unknown internal dynamics, we present the challenge of using a classical Q-learning framework on a tracking task. We then explain the idea of redefining the cost function (i.e., the criterion) of Q-learning to accommodate the need for system dynamics knowledge in the tracking task. We explain the advantages of dividing the original trajectory-tracking task into two online machine learning subtasks: learning the quadratic regulator and learning the baseline command generator. Details are given on the integration of the Q-learning framework with the EM algorithm, as well as on convergence to the optimal control via iterative estimation of an optimal regulator and a baseline generator. Initial simulation results on a second-order system show that the Q-learning framework integrated with the EM algorithm approximates the optimal tracking solution.
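To make the regulator-learning subtask concrete, one standard way to learn a quadratic Q-function without a model is least-squares Q-learning over quadratic features of (x, u), shown below for a hypothetical linear system; the EM-based baseline command generator, the paper's other subtask, is not sketched here.

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # toy plant (illustrative)
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])
rng = np.random.default_rng(0)

def svec(z):
    """Quadratic features: z'Hz is linear in these terms."""
    return np.outer(z, z)[np.triu_indices(len(z))]

K = np.array([[0.5, 0.5]])                    # initial stabilizing gain
for it in range(8):
    Phi, y = [], []
    x = rng.standard_normal(2)
    for k in range(60):
        u = -K @ x + 0.1 * rng.standard_normal(1)   # exploration noise
        x1 = A @ x + (B @ u).ravel()
        u1 = -K @ x1                                # on-policy next action
        Phi.append(svec(np.append(x, u)) - svec(np.append(x1, u1)))
        y.append(x @ Q @ x + u @ R @ u)
        x = x1
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((3, 3)); H[np.triu_indices(3)] = h
    H = (H + H.T) / 2                          # recover symmetric kernel
    K = np.linalg.solve(H[2:, 2:], H[2:, :2])  # u = -Huu^{-1} Hux x
print("learned gain K:", K)
```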
The majority of current studies on autonomous vehicle control via deep reinforcement learning (DRL) utilize point-mass kinematic models, neglecting vehicle dynamics, which include acceleration delay and acceleration command dynamics. The acceleration delay, which results from sensing and actuation delays, leads to delayed execution of the control inputs. The acceleration command dynamics dictate that the actual vehicle acceleration does not reach the commanded acceleration instantaneously. In this work, we investigate the feasibility of applying DRL controllers trained using vehicle kinematic models to more realistic driving control with vehicle dynamics. We consider a particular longitudinal car-following control problem, Adaptive Cruise Control (ACC), solved via DRL using a point-mass kinematic model. When such a controller is applied to car following with vehicle dynamics, we observe significantly degraded car-following performance. Therefore, we redesign the DRL framework to accommodate the acceleration delay and acceleration command dynamics by adding the delayed control inputs and the actual vehicle acceleration, respectively, to the reinforcement learning environment state. The training results show that the redesigned DRL controller achieves near-optimal car-following performance with vehicle dynamics considered, when compared with dynamic programming solutions.
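The redesign described here, feeding the not-yet-executed commands and the actual acceleration back into the agent's observation, amounts to an environment wrapper along the following lines. The class name, interface, and delay length are hypothetical, not taken from the paper.

```python
import numpy as np
from collections import deque

class DelayAugmentedEnv:
    """Sketch: delays executed commands by `delay_steps` and appends the
    pending command queue to the observation so the agent can compensate."""

    def __init__(self, env, delay_steps=3):
        self.env = env
        self.delay_steps = delay_steps
        self.pending = deque([0.0] * delay_steps)

    def reset(self):
        self.pending = deque([0.0] * self.delay_steps)
        return self._augment(self.env.reset())

    def step(self, action):
        executed = self.pending.popleft()    # command issued delay_steps ago
        self.pending.append(float(action))   # queue the fresh command
        obs, reward, done, info = self.env.step(executed)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # obs is assumed to already include the actual vehicle acceleration;
        # the pending (delayed) commands are appended to the state.
        return np.concatenate([np.asarray(obs), np.array(self.pending)])
```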
In robot-assisted rehabilitation, assist-as-needed (AAN) controllers have been proposed to promote subjects' active participation, which is thought to lead to better training outcomes. Most of these AAN controllers require patient-specific manual tuning of the parameters defining the underlying force field, which typically results in a tedious and time-consuming process. In this paper, we propose a reinforcement-learning-based impedance controller that actively reshapes the stiffness of the force field according to the subject's performance, while providing assistance only when needed. This adaptability is made possible by correlating the subject's most recent performance to the ultimate control objective in real time. In addition, the proposed controller is built upon action-dependent heuristic dynamic programming (ADHDP) using the actor-critic structure, and therefore does not require prior knowledge of the system model. The controller is experimentally validated with healthy subjects through a simulated ankle mobilization training session using a powered ankle-foot orthosis.
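To illustrate the force-field reshaping idea only (a heuristic stand-in, not the paper's ADHDP actor-critic), here is a toy impedance law whose stiffness rises when recent tracking error is large and relaxes when the subject performs well; all gains, thresholds, and names are hypothetical.

```python
import numpy as np

def impedance_torque(theta, theta_dot, theta_ref, Kp, Kd=2.0):
    """Force-field torque of a simple impedance law (illustrative)."""
    return Kp * (theta_ref - theta) - Kd * theta_dot

def adapt_stiffness(Kp, recent_errors, target_err=0.05,
                    rate=0.5, Kp_min=0.0, Kp_max=50.0):
    """Assist-as-needed heuristic: more tracking error than the target
    band -> stiffen the field; less -> relax it and let the subject work."""
    perf = float(np.mean(np.abs(recent_errors)))
    Kp += rate * (perf - target_err)
    return float(np.clip(Kp, Kp_min, Kp_max))
```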