This article focuses on the implementation of an approximate dynamic programming algorithm in the discrete tracking control system of the three-degree-of-freedom Scorbot-ER 4pc robotic manipulator. The controlled system belongs to the group of articulated robots, which use rotary joints to access their workspace. The main part of the control system is a dual heuristic dynamic programming algorithm that consists of two structures designed in the form of neural networks: an actor and a critic. The actor generates the suboptimal control law, while the critic approximates the derivative of the value function from Bellman's equation with respect to the state. The remaining elements of the control system are the PD controller, the supervisory term and an additional control signal. The structure of the supervisory term derives from a stability analysis performed using the Lyapunov stability theorem. The control system works online, the neural networks' weight-adaptation procedure is performed in every iteration step, and no preliminary learning of the neural networks is required. The performance of the control system was verified by a series of computer simulations and experiments performed using the Scorbot-ER 4pc robotic manipulator.
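The actor-critic interaction described in this abstract can be sketched on a toy scalar plant. Everything below (the linear model x' = a·x + b·u, the quadratic cost, the linear critic and actor, and all gains) is an illustrative assumption, not the paper's actual networks or design:

```python
import numpy as np

# Toy dual-heuristic-programming (DHP) loop: the critic lam(x) = w*x
# approximates dV/dx, the actor is a linear gain u = -k*x, and both are
# adapted online at every step, with no preliminary learning phase.
a, b, gamma, alpha = 0.9, 0.5, 0.95, 0.01
w, k = 0.0, 0.1                      # critic weight and actor gain

def dhp_update(x, w, k):
    u = -k * x                       # actor: current (suboptimal) control law
    x_next = a * x + b * u           # model of the plant
    dxnext_dx = a - b * k            # state Jacobian, chained through the actor
    # Critic target: total derivative of r + gamma*V(x_next) w.r.t. x,
    # with stage cost r = x^2 + u^2 (so dr/dx = 2x + 2u*du/dx = 2x - 2ku).
    target = 2 * x - 2 * k * u + gamma * (w * x_next) * dxnext_dx
    w = w - alpha * (w * x - target) * x           # gradient step on the critic
    # Actor: descend dQ/du = 2u + gamma*lam(x_next)*b, pushed back onto k.
    k = k + alpha * (2 * u + gamma * (w * x_next) * b) * x
    return w, k

for x in np.random.default_rng(0).uniform(-1.0, 1.0, 5000):
    w, k = dhp_update(x, w, k)
```

For this plant the coupled updates settle near w ≈ 4.1 and k ≈ 0.59, the fixed point where the critic satisfies the differentiated Bellman equation and the actor's stationarity condition dQ/du = 0 holds.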
ISBN (Print): 9783642386572
The article presents a new approach to the problem of discrete neural control of an underactuated system, using a reinforcement learning method for the on-line adaptation of a neural network. The controlled system is of the ball-and-beam type, a nonlinear dynamical object with fewer control signals than degrees of freedom. The main part of the neural control system is the actor-critic structure, which belongs to the family of neural dynamic programming algorithms and is realised in the form of a dual heuristic dynamic programming structure. The control system moreover includes a PD controller and a supervisory term, derived from the Lyapunov stability theorem, that ensures stability. The proposed neural control system works on-line and does not require preliminary learning. Computer simulations have been conducted to illustrate the performance of the control system.
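The composite control signal this abstract describes (actor output plus PD feedback plus a Lyapunov-motivated supervisory term) can be sketched as follows; the gains, the tanh actor, the sliding variable, and the activation threshold are all hypothetical placeholders, not the authors' exact design:

```python
import numpy as np

# Composite control: u = u_PD + u_actor + u_supervisory.  The supervisory
# term is a bounded correction that only activates when the filtered
# tracking error leaves a design region, which is what the Lyapunov-based
# analysis uses to guarantee boundedness of the closed loop.
kp, kd, rho, phi = 8.0, 2.0, 5.0, 0.1

def composite_control(e, e_dot, actor_w, s_threshold=1.0):
    s = e_dot + 2.0 * e                    # filtered tracking error (sliding variable)
    u_pd = -kp * e - kd * e_dot            # PD feedback
    u_actor = float(np.tanh(actor_w @ np.array([e, e_dot])))  # neural actor output
    # Supervisory term: acts only when |s| exceeds the threshold.
    u_sup = -rho * float(np.tanh(s / phi)) if abs(s) > s_threshold else 0.0
    return u_pd + u_actor + u_sup

u = composite_control(0.3, -0.1, np.array([0.5, 0.2]))   # supervisory term inactive
```

With a small tracking error the supervisory term stays silent and the actor plus PD terms do the work; once |s| crosses the threshold the bounded correction kicks in.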
ISBN (Print): 9781467314909
We describe an adaptive dynamic programming algorithm, VGL(lambda), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends dual heuristic dynamic programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(lambda). We provide on-line and batch-mode implementations of the algorithm, and summarise the theoretical relationships and motivations for using this method over its precursor algorithms, dual heuristic dynamic programming and TD(lambda). Experiments on control problems using a neural network and a greedy policy are provided.
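The lambda-weighted value-gradient target this abstract refers to can be sketched with a backward recursion over a trajectory. The scalar dynamics, the cost r = x² + u², the linear critic, and the open-loop treatment of the action in the Jacobian are all simplifying assumptions for illustration:

```python
# VGL(lambda) target sketch: the target value-gradient G'_t blends the
# critic's one-step model-propagated estimate with the fully unrolled
# gradient, weighted by lam in [0, 1] -- analogous to the lambda-return
# of TD(lambda).  lam = 0 reduces to the one-step DHP target; lam = 1
# unrolls the gradient along the whole trajectory.
a, b, gamma, lam = 0.9, 0.5, 0.95, 0.7

def vgl_lambda_targets(xs, critic):
    """Backward recursion: G'_t = dr/dx + gamma * (dx'/dx) *
    ((1 - lam) * critic(x_{t+1}) + lam * G'_{t+1})."""
    T = len(xs) - 1
    G = critic(xs[T])                # bootstrap from the critic at the end
    targets = [0.0] * T
    for t in reversed(range(T)):
        dr_dx = 2.0 * xs[t]          # partial derivative of the stage cost
        # Model Jacobian dx'/dx = a (action held fixed for simplicity).
        G = dr_dx + gamma * a * ((1 - lam) * critic(xs[t + 1]) + lam * G)
        targets[t] = G
    return targets

# Roll out a short trajectory under a fixed policy u = -0.3*x.
xs = [1.0]
for _ in range(5):
    xs.append(a * xs[-1] + b * (-0.3 * xs[-1]))

targets = vgl_lambda_targets(xs, critic=lambda x: 4.0 * x)
```

Each target can then serve as the regression label for the critic's gradient output at the corresponding state, in either an on-line or a batch-mode update.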
ISBN (Print): 9781467314909
This paper demonstrates the principal motivations for dual heuristic dynamic programming (DHP) learning methods in adaptive dynamic programming and reinforcement learning over continuous state spaces: automatic local exploration, improved learning speed, and the ability to work without stochastic exploration in deterministic environments. In a simple experiment, the learning speed of DHP is shown to be around 1700 times faster than that of TD(0). DHP solves the problem without any exploration, whereas TD(0) cannot solve it without explicit exploration. DHP requires knowledge of, and differentiability of, the environment's model functions. This paper aims to illustrate the advantages of DHP when these two requirements are satisfied.
ISBN (Print): 9781467314886
This paper demonstrates the principal motivations for dual heuristic dynamic programming (DHP) learning methods in adaptive dynamic programming and reinforcement learning over continuous state spaces: automatic local exploration, improved learning speed, and the ability to work without stochastic exploration in deterministic environments. In a simple experiment, the learning speed of DHP is shown to be around 1700 times faster than that of TD(0). DHP solves the problem without any exploration, whereas TD(0) cannot solve it without explicit exploration. DHP requires knowledge of, and differentiability of, the environment's model functions. This paper aims to illustrate the advantages of DHP when these two requirements are satisfied.
ISBN (Print): 9781467314886
We describe an adaptive dynamic programming algorithm, VGL(λ), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends dual heuristic dynamic programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch-mode implementations of the algorithm, and summarise the theoretical relationships and motivations for using this method over its precursor algorithms, dual heuristic dynamic programming and TD(λ). Experiments on control problems using a neural network and a greedy policy are provided.
Transformation-invariant automatic target recognition (ATR) has been an active research area due to its widespread applications in defense, robotics, medical imaging and geographic scene analysis. The primary goal of this paper is to obtain an on-line ATR system for targets in the presence of image transformations such as rotation, translation, scale, occlusion and resolution changes. We investigate biologically inspired adaptive critic design (ACD) neural network (NN) models for on-line learning of such transformations, and exploit reinforcement learning (RL) in the ACD framework to obtain transformation-invariant ATR. Two ACD designs are employed: heuristic dynamic programming (HDP) and dual heuristic dynamic programming (DHP). We obtain extensive statistical evaluations of the proposed on-line ATR networks using both simulated image transformations and a real benchmark facial image database, UMIST, with pose variations. Our simulations show promising results for learning transformations in simulated images and authenticating out-of-plane rotated face images. Comparing the two on-line ATR designs, HDP outperforms DHP in learning capability and robustness and is more tolerant to noise; the computational time involved in HDP is also less than that of DHP. On the other hand, DHP achieves a 100% success rate more frequently than HDP for individual targets, and the residual critic error in DHP is generally smaller than that of HDP. Mathematical analyses of both RL-based on-line ATR designs are provided, giving a sufficient condition for asymptotic convergence in a statistical-average sense.
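The core distinction between the two ACD critics compared in this abstract can be sketched on a toy scalar system: the HDP critic approximates the value V(x) itself and is trained on the scalar TD error, while the DHP critic approximates dV/dx and is trained on the gradient of the Bellman equation, which requires the model Jacobian. The linear critics, the fixed-policy dynamics, and all constants are illustrative assumptions:

```python
import numpy as np

# HDP critic: V(x) ~ w_hdp * x^2, trained on the scalar TD error.
# DHP critic: dV/dx ~ w_dhp * x, trained on the differentiated Bellman
# equation, propagated through the model Jacobian dx'/dx = a.
a, gamma, alpha = 0.8, 0.9, 0.05
w_hdp, w_dhp = 0.0, 0.0

rng = np.random.default_rng(1)
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    x_next, r = a * x, x * x       # fixed policy folded into the dynamics
    # HDP update: scalar TD error on the value itself.
    td = r + gamma * w_hdp * x_next**2 - w_hdp * x**2
    w_hdp += alpha * td * x**2
    # DHP update: error on the value gradient (needs the Jacobian a).
    grad_target = 2 * x + gamma * (w_dhp * x_next) * a
    w_dhp += alpha * (grad_target - w_dhp * x) * x
```

Both critics converge to the same underlying value function here (w_hdp → 1/(1 − γa²) and w_dhp → 2/(1 − γa²), i.e. dV/dx is exactly the derivative of V), but the DHP update receives a vector-valued (here scalar) gradient signal per state instead of a single TD scalar, which is the structural difference behind the learning-speed and residual-error trade-offs reported in the paper.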