
Refine Search Results

Document Type

  • 299 conference papers
  • 8 journal articles

Collection Scope

  • 307 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 180 papers in Engineering
    • 158 papers in Computer Science and Technology...
    • 56 papers in Electrical Engineering
    • 48 papers in Software Engineering
    • 47 papers in Control Science and Engineering
    • 13 papers in Information and Communication Engineering
    • 10 papers in Mechanical Engineering
    • 6 papers in Instrument Science and Technology
    • 4 papers in Mechanics (degree awardable in Engineering or Sci...
    • 4 papers in Bioengineering
    • 3 papers in Power Engineering and Engineering Therm...
    • 2 papers in Transportation Engineering
    • 2 papers in Nuclear Science and Technology
    • 2 papers in Biomedical Engineering (degree awardable...
    • 1 paper in Architecture
    • 1 paper in Chemical Engineering and Technology
    • 1 paper in Aerospace Science and Tech...
    • 1 paper in Food Science and Engineering (degree...
  • 40 papers in Science
    • 35 papers in Mathematics
    • 9 papers in Systems Science
    • 8 papers in Statistics (degree awardable in Science...
    • 4 papers in Physics
    • 4 papers in Biology
    • 1 paper in Chemistry
    • 1 paper in Astronomy
    • 1 paper in Atmospheric Science
    • 1 paper in Geophysics
    • 1 paper in Geology
  • 18 papers in Management
    • 17 papers in Management Science and Engineering (degree...
    • 7 papers in Business Administration
  • 4 papers in Economics
    • 4 papers in Applied Economics
  • 1 paper in Medicine

Topics

  • 115 papers on dynamic programm...
  • 76 papers on reinforcement le...
  • 67 papers on learning
  • 47 papers on optimal control
  • 30 papers on neural networks
  • 27 papers on control systems
  • 21 papers on approximate dyna...
  • 21 papers on approximation al...
  • 20 papers on function approxi...
  • 20 papers on equations
  • 17 papers on convergence
  • 16 papers on adaptive dynamic...
  • 16 papers on state-space meth...
  • 16 papers on heuristic algori...
  • 14 papers on mathematical mod...
  • 13 papers on stochastic proce...
  • 12 papers on learning (artifi...
  • 12 papers on adaptive control
  • 12 papers on cost function
  • 11 papers on algorithm design...

Institutions

  • 5 papers from arizona state un...
  • 4 papers from department of el...
  • 4 papers from school of inform...
  • 4 papers from department of in...
  • 4 papers from univ sci & techn...
  • 4 papers from chinese acad sci...
  • 4 papers from department of el...
  • 3 papers from princeton univ d...
  • 3 papers from northeastern uni...
  • 3 papers from national science...
  • 3 papers from robotics institu...
  • 3 papers from univ illinois de...
  • 3 papers from univ utrecht dep...
  • 2 papers from univ groningen i...
  • 2 papers from sharif univ tech...
  • 2 papers from univ texas autom...
  • 2 papers from pengcheng labora...
  • 2 papers from guangxi univ sch...
  • 2 papers from chinese acad sci...
  • 2 papers from cemagref lisc au...

Authors

  • 14 papers by liu derong
  • 9 papers by wei qinglai
  • 8 papers by si jennie
  • 7 papers by xu xin
  • 5 papers by derong liu
  • 4 papers by lewis frank l.
  • 4 papers by martin riedmille...
  • 4 papers by huaguang zhang
  • 4 papers by jennie si
  • 4 papers by marco a. wiering
  • 4 papers by xin xu
  • 4 papers by zhang huaguang
  • 4 papers by dongbin zhao
  • 4 papers by lei yang
  • 4 papers by powell warren b.
  • 4 papers by riedmiller marti...
  • 3 papers by hado van hasselt
  • 3 papers by van hasselt hado
  • 3 papers by jagannathan s.
  • 3 papers by munos remi

Language

  • 305 papers in English
  • 1 paper in other languages
  • 1 paper in Chinese
Search condition: "Any field = IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning"
307 records; showing 91-100
Approximate Dynamic Programming Solutions of Multi-Agent Graphical Games Using Actor-Critic Network Structures
International Joint Conference on Neural Networks (IJCNN)
Authors: Abouheaf, Mohammed I.; Lewis, Frank L. (Univ Texas Arlington Res Inst, Ft Worth, TX 76118, USA)
This paper studies a new class of multi-agent discrete-time dynamical graphical games, where interactions between agents are restricted by a communication graph structure. The paper brings together discrete Hamiltonia...
Development of Reinforcement Learning Algorithm for 2-DOF Helicopter Model
27th IEEE International Symposium on Industrial Electronics, ISIE 2018
Authors: Fandel, Andrew; Birge, Anthony; Miah, Suruz (Department of Electrical and Computer Engineering, Bradley University, Peoria, IL, United States)
This paper examines a reinforcement learning strategy for controlling a two-degree-of-freedom (2-DOF) helicopter. The pitch and yaw angles are regulated to their corresponding reference angles by applying appropriate ...
Approximate Dynamic Programming of Continuous Annealing Process
IEEE International Conference on Automation and Logistics
Authors: Zhang, Yingwei; Guo, Chao; Chen, Xue; Teng, Yongdong (Northeastern Univ, Minist Educ Key Lab Integrated Automat Proc Ind, Shenyang 110004, Liaoning, Peoples R China)
The approximate dynamic programming method is a combination of neural networks, reinforcement learning, and the idea of dynamic programming. It is an online control method that is based on actual data rather than a p...
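As a rough sketch of how those three ingredients can fit together (the scalar toy plant, quadratic running cost, step sizes, and a single quadratic basis standing in for the critic network are all illustrative assumptions, not the annealing process studied in this paper), an online ADP-style loop might look like the following:

```python
import numpy as np

# Hypothetical sketch: a quadratic critic V(x) ~= w * x^2 is fit from observed
# transitions with temporal-difference updates (the reinforcement-learning part),
# and the control is chosen greedily through a one-step lookahead with the
# current critic (the dynamic-programming part).

def plant(x, u):
    return 0.8 * x + u          # assumed discrete-time dynamics

def stage_cost(x, u):
    return x**2 + 0.1 * u**2    # assumed quadratic running cost

gamma, alpha, w = 0.95, 0.05, 0.0          # discount, critic step size, critic weight
candidates = np.linspace(-2.0, 2.0, 41)    # discretized set of candidate controls

x = 2.0
for k in range(500):
    # one-step lookahead with the current critic
    u = min(candidates, key=lambda a: stage_cost(x, a) + gamma * w * plant(x, a)**2)
    x_next = plant(x, u)
    # temporal-difference update of the critic from the observed transition
    td_target = stage_cost(x, u) + gamma * w * x_next**2
    w += alpha * (td_target - w * x**2) * x**2
    x = x_next

print(f"critic weight w = {w:.3f}, final state x = {x:.4f}")
```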
Near Optimal Output Feedback Control of Nonlinear Discrete-Time Systems Based on Reinforcement Neural Network Learning
IEEE/CAA Journal of Automatica Sinica, 2014, Vol. 1, No. 4, pp. 372-384
Authors: Qiming Zhao; Hao Xu; Sarangapani Jagannathan (DENSO International America Inc.; College of Science and Engineering, Texas A&M University; Department of Electrical & Computer Engineering, Missouri University of Science and Technology)
In this paper, the output-feedback-based finite-horizon near optimal regulation of nonlinear affine discrete-time systems with unknown system dynamics is considered by using neural networks (NNs) to approximate Hamilton-...
Discrete-Time Generalized Policy Iteration ADP Algorithm With Approximation Errors
IEEE Symposium Series on Computational Intelligence (IEEE SSCI)
Authors: Wei, Qinglai; Li, Benkai; Song, Ruizhuo (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing, Peoples R China)
This paper is concerned with a novel generalized policy iteration (GPI) algorithm with approximation errors. Approximation errors are explicitly considered in the GPI algorithm. The properties of the stable GPI algorithm ...
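As a rough, hypothetical illustration of the general GPI scheme (not the authors' algorithm or error analysis), the following sketch performs only a few policy-evaluation sweeps per iteration on a small randomly generated MDP, so the improvement step always works from an inexact value estimate:

```python
import numpy as np

# Minimal GPI sketch on a random finite MDP (an illustrative stand-in, not the
# paper's setting): truncated policy evaluation followed by greedy improvement.

rng = np.random.default_rng(0)
n_states, n_actions, gamma, eval_sweeps = 5, 3, 0.9, 2

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # expected rewards

policy = np.zeros(n_states, dtype=int)
V = np.zeros(n_states)

for it in range(50):
    # partial policy evaluation: only `eval_sweeps` Bellman sweeps (inexact)
    for _ in range(eval_sweeps):
        P_pi = P[np.arange(n_states), policy]              # (S, S) under the policy
        V = R[np.arange(n_states), policy] + gamma * P_pi @ V
    # policy improvement: greedy with respect to the approximate value function
    Q = R + gamma * P @ V                                  # (S, A) action values
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("greedy policy:", policy, "approximate values:", np.round(V, 3))
```

With eval_sweeps set to 1 this reduces to value iteration, and with a very large value it approaches exact policy iteration; GPI covers the spectrum in between.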
Reinforcement Learning Control of a Real Mobile Robot Using Approximate Policy Iteration
6th International Symposium on Neural Networks
Authors: Zhang, Pengchen; Xu, Xin; Liu, Chunming; Yuan, Qiping (Natl Univ Def Technol, Inst Automat, Changsha 410073, Hunan, Peoples R China)
Machine learning for mobile robots has attracted much research interest in recent years. However, there are still many challenges in applying learning techniques to real mobile robots, e.g., generalization in contin...
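A hedged sketch of one common approximate policy iteration scheme (an LSPI/LSTDQ-style least-squares fit with linear features on a 1-D toy task; the task, features, and parameters below are assumptions, not the robot experiment reported in this paper):

```python
import numpy as np

# A linear Q-function over per-action polynomial features is fit by least squares
# from a fixed batch of transitions, and the greedy policy is re-fit repeatedly.

rng = np.random.default_rng(1)
actions = np.array([-0.1, 0.0, 0.1])
gamma, n_feat = 0.95, 3                     # features per action: [1, s, s^2]

def phi(s, a_idx):
    f = np.zeros(n_feat * len(actions))
    f[a_idx * n_feat:(a_idx + 1) * n_feat] = [1.0, s, s * s]
    return f

def greedy(s, w):
    return int(np.argmax([phi(s, i) @ w for i in range(len(actions))]))

# collect a batch of random-exploration transitions from the toy dynamics
batch, s = [], 0.8
for _ in range(2000):
    a_idx = rng.integers(len(actions))
    s_next = float(np.clip(s + actions[a_idx] + rng.normal(0, 0.02), -1.0, 1.0))
    batch.append((s, a_idx, -s_next**2, s_next))   # reward penalizes distance from 0
    s = s_next if rng.random() > 0.05 else rng.uniform(-1.0, 1.0)

w = np.zeros(n_feat * len(actions))
for _ in range(10):                                 # approximate policy iteration loop
    A = 1e-3 * np.eye(len(w))                       # small ridge term for stability
    b = np.zeros(len(w))
    for s0, a_idx, r, s1 in batch:                  # LSTDQ fit under the greedy policy
        f, f_next = phi(s0, a_idx), phi(s1, greedy(s1, w))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    w = np.linalg.solve(A, b)

print("greedy action at s=0.5:", actions[greedy(0.5, w)])
```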
Optimal Control Applied to Wheeled Mobile Vehicles
IEEE International Symposium on Intelligent Signal Processing
Authors: Gomez, M.; Martinez, T.; Sanchez, S.; Meziat, D. (Univ Alcala, Escuela Politecn Super, Dept Automat, Alcala De Henares, Spain; Univ Alicante, Escuela Politecn Super, Ingn Sistemas Teoria Señal, Dept Fis, Alicante, Spain)
The goal of the work described in this paper is to develop a particular optimal control technique based on a Cell Mapping technique in combination with the Q-learning reinforcement learning method to control wheeled ...
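To illustrate the general idea of pairing a cell-mapping discretization with Q-learning (the 1-D toy plant, grid resolution, and learning parameters below are illustrative assumptions, not the authors' wheeled-vehicle setup), a minimal sketch could look like this:

```python
import numpy as np

# A continuous state is mapped to a cell index, and ordinary tabular Q-learning
# is run over the resulting finite cell space.

rng = np.random.default_rng(2)
edges = np.linspace(-1.0, 1.0, 21)            # cell boundaries over the state space
actions = np.array([-0.1, 0.0, 0.1])
alpha, gamma, epsilon = 0.2, 0.95, 0.1

def cell(s):                                  # map a continuous state to its cell index
    return int(np.digitize(s, edges))

Q = np.zeros((len(edges) + 1, len(actions)))

s = 0.9
for step in range(20000):
    c = cell(s)
    a = rng.integers(len(actions)) if rng.random() < epsilon else int(Q[c].argmax())
    s_next = float(np.clip(s + actions[a] + rng.normal(0, 0.01), -1.0, 1.0))
    r = -abs(s_next)                          # reward: stay close to the origin
    Q[c, a] += alpha * (r + gamma * Q[cell(s_next)].max() - Q[c, a])   # Q-learning backup
    s = s_next if step % 200 else rng.uniform(-1.0, 1.0)               # occasional reset

print("greedy action in the cell containing s=0.5:", actions[int(Q[cell(0.5)].argmax())])
```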
Stable Iterative Optimal Control for Discrete-Time Nonlinear Systems Using Numerical Controller
IEEE International Conference on Vehicular Electronics and Safety (ICVES)
Authors: Wei, Qinglai; Liu, Derong (Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China)
This paper is concerned with a new iterative adaptive dynamic programming (ADP) algorithm to solve optimal control problems for infinite-horizon discrete-time nonlinear systems using a numerical controller. The conver...
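An illustrative value-iteration-style sketch of iterative ADP on a state grid (the plant, cost, grid, and interpolation choices below are assumptions; this is not the paper's algorithm or its convergence analysis):

```python
import numpy as np

# The value function is held on a state grid (a simple numerical controller),
# and each iteration applies the cost-minimizing Bellman backup with linear
# interpolation between grid nodes.

x_grid = np.linspace(-2.0, 2.0, 81)
u_grid = np.linspace(-1.0, 1.0, 41)

def f(x, u):                                  # assumed discrete-time nonlinear plant
    return 0.9 * x + 0.5 * np.tanh(u)

def U(x, u):                                  # assumed utility (stage cost)
    return x**2 + u**2

V = np.zeros_like(x_grid)
for i in range(200):
    X, Uc = np.meshgrid(x_grid, u_grid, indexing="ij")         # all (x, u) pairs
    X_next = np.clip(f(X, Uc), x_grid[0], x_grid[-1])
    V_next = np.interp(X_next, x_grid, V)                      # interpolated V_i(f(x, u))
    V_new = (U(X, Uc) + V_next).min(axis=1)                    # Bellman backup, min over u
    if np.max(np.abs(V_new - V)) < 1e-8:                       # stop once the iteration settles
        break
    V = V_new

def controller(x):                            # greedy control from the converged grid values
    x_next = np.clip(f(x, u_grid), x_grid[0], x_grid[-1])
    return u_grid[np.argmin(U(x, u_grid) + np.interp(x_next, x_grid, V))]

print("V(1.0) ~=", float(np.interp(1.0, x_grid, V)), " u*(1.0) ~=", float(controller(1.0)))
```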
A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes
IEEE International Symposium on Intelligent Control
Authors: Dankert, James; Yang, Lei; Si, Jennie (Arizona State Univ, Dept Elect Engn, Tempe, AZ 85287, USA)
This paper presents an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework to address both analytical characteristics and algorithmic features. Several important i...
High-order local dynamic programming
Authors: Tassa, Yuval; Todorov, Emanuel (Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, Israel; Applied Mathematics and Computer Science and Engineering, University of Washington, Seattle, United States)
We describe a new local dynamic programming algorithm for solving stochastic continuous optimal control problems. We use cubature integration to both propagate the state distribution and perform the Bellman backup. Th...
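A minimal sketch of one ingredient mentioned here, a cubature-based Bellman backup at a single state (the linear-Gaussian toy dynamics and the quadratic value estimate are assumptions; the paper's local dynamic programming algorithm is considerably more involved):

```python
import numpy as np

# Approximate the expectation inside a stochastic Bellman backup with the
# third-degree spherical-radial cubature rule: 2n equally weighted sigma points
# for an n-dimensional Gaussian.

def cubature_points(mean, cov):
    n = mean.shape[0]
    L = np.linalg.cholesky(cov)                   # matrix square root of the covariance
    offsets = np.sqrt(n) * np.hstack([L, -L])     # columns +sqrt(n)*L_i and -sqrt(n)*L_i
    return mean[:, None] + offsets                # shape (n, 2n); weights are all 1/(2n)

def expected_value(V, mean, cov):
    pts = cubature_points(mean, cov)
    return np.mean([V(pts[:, i]) for i in range(pts.shape[1])])

# toy stochastic Bellman backup at one state for one candidate control:
# Q(x, u) = cost(x, u) + E[ V(A x + B u + w) ],  w ~ N(0, W)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
W = 0.01 * np.eye(2)
V = lambda x: x @ np.diag([2.0, 1.0]) @ x         # assumed quadratic value estimate
cost = lambda x, u: x @ x + 0.1 * float(u) ** 2

x, u = np.array([1.0, 0.5]), -0.8
mean_next = A @ x + (B * u).ravel()
q_value = cost(x, u) + expected_value(V, mean_next, W)
print("cubature-approximated Q(x, u) =", round(q_value, 4))
```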