This paper investigates the optimal consensus control problem for discrete-time multi-agent systems with completely unknown dynamics by utilizing a data-driven reinforcement learning method. It is known that optimal consensus control for multi-agent systems relies on the solution of the coupled Hamilton-Jacobi-Bellman equation, which generally cannot be solved analytically. Even worse, most real-world systems are too complicated to model accurately. To overcome these deficiencies, a data-based adaptive dynamic programming method is presented that uses current and past system data rather than an accurate system model, avoiding the traditional identification scheme and the approximation residual errors it introduces. First, we establish a discounted performance index and formulate the optimal consensus problem via Bellman's optimality principle. Then, we introduce the policy iteration algorithm that motivates this paper. To implement the proposed online action-dependent heuristic dynamic programming method, two neural networks (NNs), 1) a critic NN and 2) an actor NN, are employed to approximate the iterative performance index functions and control policies, respectively, in real time. Finally, two simulation examples are provided to demonstrate the effectiveness of the proposed method.
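As a rough illustration of the data-based idea described above (not the paper's multi-agent formulation), the sketch below runs Q-function-based policy iteration on a single scalar linear agent: the critic is a least-squares fit of a quadratic Q-function from measured transitions, and the actor is the gain that minimizes that Q-function. The dynamics, cost weights, and discount factor are placeholder assumptions.

```python
import numpy as np

# Minimal model-free policy iteration (ADHDP-style) on a scalar linear agent:
# Q(x,u) is quadratic, so the "critic" is a least-squares fit of its three
# coefficients from measured data (x_k, u_k, r_k, x_{k+1}); no model is used
# in the regression itself.
a, b, gamma = 0.9, 1.0, 0.95          # true dynamics (unknown to the learner)
K = 0.0                               # initial stabilizing feedback u = -K x

def phi(x, u):                        # quadratic basis for Q(x, u)
    return np.array([x * x, x * u, u * u])

rng = np.random.default_rng(0)
for it in range(20):
    # collect data under the current policy plus exploration noise
    X, y = [], []
    x = 1.0
    for k in range(200):
        u = -K * x + 0.1 * rng.standard_normal()
        r = x * x + u * u             # measured stage cost
        x_next = a * x + b * u        # measured next state (model only simulates data)
        u_next = -K * x_next
        X.append(phi(x, u) - gamma * phi(x_next, u_next))
        y.append(r)
        x = x_next
    # policy evaluation: least-squares critic  Q(x,u) = w . phi(x,u)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    qxx, qxu, quu = w
    # policy improvement: minimize Q over u  ->  u = -(qxu / (2 quu)) x
    K = qxu / (2.0 * quu)

print("learned gain", K)
```

Because the plant parameters appear only in the simulated data collection, the same loop applies when they are unknown, which is the point of the data-driven scheme.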
ISBN (Print): 9783319925363
The proceedings contain 97 papers. The special focus in this conference is on Neural Networks. The topics include: development of a sensory-neural network for medical diagnosing; review of pseudoinverse learning algorithm for multilayer neural networks and applications; identification of vessel kinetics based on neural networks via concurrent learning; method to improve the performance of restricted Boltzmann machines; modeling hysteresis using non-smooth neural networks; the implementation of a pointer network model for the traveling salesman problem on a Xilinx PYNQ board; generalized affine scaling trajectory analysis for linearly constrained convex programming; drift compensation for E-nose using QPSO-based domain adaptation kernel ELM; convergence analysis of self-adaptive immune particle swarm optimization algorithm; a neurodynamic approach to multiobjective linear programming; an improved artificial fish swarm algorithm to solve the cutting stock problem; a hyper-heuristic algorithm for the low-carbon location routing problem; pulse neuron supervised learning rules for adapting the dynamics of synaptic connections; an artificial neural network for solving quadratic zero-one programming problems; a new parameter identification method for type-1 TS fuzzy neural networks; performance enhancement of deep reinforcement learning networks using feature extraction; online GRNN-based ensembles for regression on evolving data streams; a broad neural network structure for class incremental learning; WeiboCluster: an event-oriented Sina Weibo dataset with estimating credit; robust neural networks learning: new approaches; neural network model of unconscious; data cleaning and classification in the presence of label noise with class-specific autoencoder; using the wide and deep flexible neural tree to forecast the exchange rate; recurrent neural network with dynamic memory.
Adaptive dynamic programming (ADP) and reinforcement learning are closely related when performing intelligent optimization. Both are regarded as promising methods built around the key components of evaluation and improvement, against the background of information technology such as artificial intelligence, big data, and deep learning. Although great progress has been achieved and surveyed for nonlinear optimal control problems, research on the robustness of ADP-based control strategies under uncertain environments has not been fully summarized. Hence, this survey reviews the recent main results of adaptive-critic-based robust control design for continuous-time nonlinear systems. The ADP-based nonlinear optimal regulation is reviewed, followed by robust stabilization of nonlinear systems with matched uncertainties, guaranteed cost control design of unmatched plants, and decentralized stabilization of interconnected systems. Additionally, further comprehensive discussions are presented, including event-based robust control design, improvement of the critic learning rule, nonlinear H-infinity control design, and several notes on future perspectives. By applying the ADP-based optimal and robust control methods to a practical power system and an overhead crane plant, two typical examples are provided to verify the effectiveness of the theoretical results. Overall, this survey is beneficial for promoting the development of adaptive critic control methods with robustness guarantees and the construction of higher-level intelligent systems.
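For reference, the nominal problem that adaptive-critic designs of this kind start from can be written compactly. The display below is a standard statement of the continuous-time HJB condition and the resulting optimal control law under a quadratic-in-control cost; the notation is assumed here rather than quoted from the survey:

$$0=\min_{u}\Big[x^{\top}Qx+u^{\top}Ru+\nabla V^{*}(x)^{\top}\big(f(x)+g(x)u\big)\Big], \qquad u^{*}(x)=-\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V^{*}(x).$$

A critic network approximates $V^{*}$ and its weights are tuned to drive the residual of the first equation (the Hamiltonian) toward zero; robust stabilization under matched uncertainty is then typically obtained by solving such a nominal optimal control problem with a cost term that over-bounds the uncertainty.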
ISBN (Print): 9781538627266
This paper is concerned with a novel generalized policy iteration (GPI) algorithm with approximation errors. Approximation errors are explicitly considered in the GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function. Finally, numerical examples and comparisons are presented.
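A minimal sketch of the generalized policy iteration pattern discussed here, with a bounded error injected into each value update to mimic inexact evaluation, is given below; the MDP, error bound, and sweep counts are illustrative assumptions, not the paper's setting.

```python
import numpy as np

# Minimal generalized policy iteration (GPI) on a small random MDP, with an
# artificial bounded approximation error injected into each value backup.
rng = np.random.default_rng(1)
nS, nA, gamma, eps = 5, 3, 0.9, 1e-3
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] is a distribution over s'
R = rng.random((nS, nA))                        # stage cost to be minimized

V = np.zeros(nS)
policy = np.zeros(nS, dtype=int)
for sweep in range(50):
    # partial policy evaluation: only a few backups, each with bounded error
    for _ in range(3):
        V = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V for s in range(nS)])
        V += eps * rng.uniform(-1.0, 1.0, nS)   # bounded approximation error
    # policy improvement: greedy (cost-minimizing) policy w.r.t. the current V
    Q = R + gamma * np.einsum('san,n->sa', P, V)
    policy = Q.argmin(axis=1)

print("value estimate", np.round(V, 3))
```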
In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems by using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of our proposed PI scheme is to utilize the iterative ADP algorithm to obtain the iterative control policies, which not only ensure that the system achieves stability but also minimize the performance index function for each player. This paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and handle DT nonzero-sum games with multiple players. First, we design three actor-critic algorithms, an offline one and two online ones, for the PI scheme. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example is presented to demonstrate the effectiveness of our proposed approach.
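As an illustrative reduction of the multiplayer setting (not the paper's neural-network implementation), the sketch below runs policy iteration on a scalar two-player linear-quadratic nonzero-sum game: each evaluation step solves a discounted Lyapunov equation per player, and each improvement step lets the players best-respond in turn. All plant and cost numbers are placeholders.

```python
import numpy as np

# Two-player nonzero-sum LQ game on the scalar plant x_{k+1} = a x + b1 u1 + b2 u2;
# player i applies u_i = -K_i x and minimizes sum_k gamma^k (q_i x^2 + r_i u_i^2).
a, b1, b2, gamma = 0.8, 1.0, 0.5, 0.95
q1, r1, q2, r2 = 1.0, 1.0, 0.5, 2.0
K1, K2 = 0.0, 0.0                       # initial stabilizing feedback gains

for it in range(100):
    a_cl = a - b1 * K1 - b2 * K2        # closed-loop dynamics under both policies
    # policy evaluation: scalar discounted Lyapunov equations for each player
    P1 = (q1 + r1 * K1 ** 2) / (1.0 - gamma * a_cl ** 2)
    P2 = (q2 + r2 * K2 ** 2) / (1.0 - gamma * a_cl ** 2)
    # policy improvement: each player best-responds in turn to the other's gain
    K1 = gamma * P1 * b1 * (a - b2 * K2) / (r1 + gamma * P1 * b1 ** 2)
    K2 = gamma * P2 * b2 * (a - b1 * K1) / (r2 + gamma * P2 * b2 ** 2)

print("Nash feedback gains", K1, K2)
```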
This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL is designed to perform policy evaluation and policy improvement within the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. A gradient descent algorithm updates the critic and action weights simultaneously. The convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of the Nash equilibrium are proved. A simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.
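For context, a common form of the off-policy IRL policy-evaluation equation for player $i$ (written here with assumed notation, not quoted from the paper) is

$$V_i\big(x(t+T)\big)-V_i\big(x(t)\big)=\int_{t}^{t+T}\Big[-r_i\big(x,u_1,\dots,u_N\big)+\nabla V_i(x)^{\top}\sum_{j=1}^{N}g_j(x)\big(u_j^{b}-u_j\big)\Big]\,d\tau,$$

where the data are generated by behavior inputs $u_j^{b}$ applied to $\dot{x}=f(x)+\sum_j g_j(x)u_j^{b}$, while the target policies $u_j$ are the ones being evaluated. Because the drift term $f$ never appears explicitly, the equation can be solved from measured trajectories without knowing the system dynamics.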
ISBN (Print): 9781509061822
Human-level control through deep learning and deep reinforcement learning has revealed unique and powerful potential through the very complex game of Go. AlphaGo, developed by Google DeepMind, beat the top Go player earlier this year. The scientific and technological advancement behind the success of AlphaGo has attracted researchers from multiple areas, including machine learning, artificial intelligence, computational intelligence, and so on. Adaptive dynamic programming (ADP) methods share the same fundamental principle with reinforcement learning and show strong performance for continuous-time and continuous-state systems. Deep learning techniques can also be integrated into ADP designs. In this paper, we discuss the key techniques and components in deep reinforcement learning and then present its successful applications to computer games and maze navigation. Future opportunities for deep-learning-enabled ADP are discussed at the end.
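One of the components usually highlighted in this line of work is the temporal-difference target computed from a separate, slowly updated target network. The sketch below shows that update with linear stand-in "networks" and synthetic transitions so it runs without a deep-learning framework; it is a schematic of the mechanism, not DeepMind's implementation.

```python
import numpy as np

# Sketch of the core DQN-style update: a TD target built from a frozen target
# network that is periodically synchronized with the online network.
rng = np.random.default_rng(0)
n_features, n_actions, gamma, lr = 8, 4, 0.99, 0.01
W = rng.standard_normal((n_actions, n_features)) * 0.1   # online Q-network (linear stand-in)
W_target = W.copy()                                      # target Q-network

def q_values(weights, features):
    return weights @ features

for step in range(1000):
    s = rng.standard_normal(n_features)                  # synthetic transition (placeholder data)
    a = rng.integers(n_actions)
    r = rng.standard_normal()
    s_next = rng.standard_normal(n_features)
    # TD target uses the frozen target network (the key stabilizing trick)
    y = r + gamma * q_values(W_target, s_next).max()
    td_error = y - q_values(W, s)[a]
    W[a] += lr * td_error * s                            # gradient step on the taken action only
    if step % 100 == 0:
        W_target = W.copy()                              # periodic target-network sync
```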
This paper examines a reinforcement learning strategy for controlling a two degree-of-freedom (2-DOF) helicopter. The pitch and yaw angles are regulated to their corresponding reference angles by applying appropriate actuator commands (input voltages) to the main and tail rotors of a 2-DOF helicopter using the proposed reinforcement learning [herein called the approximate dynamic programming (ADP)] strategy. Furthermore, the proposed strategy is able to configure the 2-DOF helicopter to track time-varying reference angles. The proposed ADP technique is capable of dealing with the coupling effects between the rigid-body structure and the propeller dynamics associated with the 2-DOF helicopter model considered in this work. A set of computer simulations is conducted to evaluate the performance of the proposed algorithm. The performance of the proposed algorithm is also compared to that of a conventional linear-quadratic regulator (LQR).
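The LQR baseline mentioned above can be sketched by iterating the discrete-time Riccati recursion on a linearized model; the matrices below are illustrative pitch/yaw placeholders, not the actual 2-DOF helicopter parameters.

```python
import numpy as np

# Discrete-time LQR gain by value iteration on the Riccati recursion.
# A, B are placeholder pitch/yaw linearizations, NOT the real helicopter model.
A = np.array([[1.0, 0.01, 0.0, 0.0],
              [0.0, 0.98, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.01],
              [0.0, 0.0, 0.0, 0.97]])      # states: pitch, pitch rate, yaw, yaw rate
B = np.array([[0.0, 0.0],
              [0.02, 0.005],
              [0.0, 0.0],
              [0.004, 0.03]])              # inputs: main and tail rotor voltages
Q, R = np.eye(4), 0.1 * np.eye(2)

P = Q.copy()
for _ in range(500):                        # iterate the Riccati difference equation
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
print("LQR gain:\n", np.round(K, 3))
```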
Feature representation is critical not only for pattern recognition tasks but also for reinforcement learning (RL) methods to solve learning control problems under uncertainties. In this paper, a manifold-based RL approach using the principle of locally linear reconstruction (LLR) is proposed for Markov decision processes with large or continuous state spaces. In the proposed approach, an LLR-based feature learning scheme is developed for value function approximation in RL, where a set of smooth feature vectors is generated by preserving the local approximation properties of neighboring points in the original state space. By using the proposed feature learning scheme, an LLR-based approximate policy iteration (API) algorithm is designed for learning control problems with large or continuous state spaces. The relationship between the value approximation error of a new data point and the estimated values of its nearest neighbors is analyzed. In order to compare different feature representation and learning approaches for RL, a comprehensive simulation and experimental study was conducted on three benchmark learning control problems. It is illustrated that under a wide range of parameter settings, the LLR-based API algorithm can obtain better learning control performance than the previous API methods with different feature representation schemes.
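A minimal sketch of the locally linear reconstruction step is shown below: a query state is expressed as an affine combination of its k nearest anchor states, and the same weights interpolate the value function. The neighborhood size, regularization, and stand-in value samples are assumptions for illustration.

```python
import numpy as np

# Locally linear reconstruction (LLR): solve for affine weights over the k
# nearest anchors, then reuse those weights to interpolate the value function.
def llr_weights(x, anchors, k=4, reg=1e-6):
    d = np.linalg.norm(anchors - x, axis=1)
    idx = np.argsort(d)[:k]                     # k nearest neighbors
    Z = anchors[idx] - x                        # neighbors centered at the query
    G = Z @ Z.T + reg * np.eye(k)               # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    return idx, w / w.sum()                     # reconstruction weights summing to one

rng = np.random.default_rng(0)
anchors = rng.uniform(-1, 1, size=(50, 2))      # sampled anchor states
V_anchor = np.sin(anchors[:, 0]) + anchors[:, 1] ** 2   # stand-in value samples

x_query = np.array([0.2, -0.3])
idx, w = llr_weights(x_query, anchors)
V_hat = w @ V_anchor[idx]                       # LLR value estimate at the query
print("interpolated value", V_hat, "reference", np.sin(0.2) + 0.09)
```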
ISBN (Print): 9783319590721; 9783319590714
In this paper, a novel framework of reinforcement learning for continuous-time dynamical systems is presented based on the Hamiltonian functional and the extreme learning machine. The idea of solution search in optimization is introduced to find the optimal control policy for the optimal control problem. The optimal control search consists of three steps: evaluation, comparison, and improvement of an arbitrary admissible policy. The Hamiltonian functional plays an important role in this framework, under which only one critic is required in the adaptive critic structure. The critic network is implemented by the extreme learning machine. Finally, a simulation study is conducted to verify the effectiveness of the presented algorithm.
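The extreme-learning-machine critic can be sketched in a few lines: the hidden-layer weights are drawn at random and frozen, and only the output weights are fit by least squares to sampled value targets. The target function and layer sizes below are illustrative assumptions.

```python
import numpy as np

# Extreme learning machine as a critic: random fixed hidden layer, output
# weights fit in closed form by least squares to sampled value targets.
rng = np.random.default_rng(0)
n_hidden, n_samples = 30, 200
X = rng.uniform(-2, 2, size=(n_samples, 2))          # sampled states
y = X[:, 0] ** 2 + 0.5 * X[:, 1] ** 2                # stand-in value targets

W_in = rng.standard_normal((2, n_hidden))            # random, frozen input weights
b = rng.standard_normal(n_hidden)                    # random, frozen biases
H = np.tanh(X @ W_in + b)                            # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)         # output weights by least squares

x_test = np.array([[1.0, -1.0]])
print("critic output", np.tanh(x_test @ W_in + b) @ beta)
```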