This article focuses on the implementation of an approximate dynamic programming algorithm in the discrete tracking control system of the three-degree-of-freedom Scorbot-ER 4pc robotic manipulator. The controlled system belongs to the group of articulated robots, which use rotary joints to access their workspace. The main part of the control system is a dual heuristic dynamic programming algorithm that consists of two structures designed in the form of neural networks: an actor and a critic. The actor generates the suboptimal control law, while the critic approximates the derivative of the value function from Bellman's equation with respect to the state. The remaining elements of the control system are the PD controller, the supervisory term and an additional control signal. The structure of the supervisory term derives from a stability analysis performed using the Lyapunov stability theorem. The control system works online, the neural networks' weight-adaptation procedure is performed in every iteration step, and no preliminary learning of the neural networks is required. The performance of the control system was verified by a series of computer simulations and experiments performed using the Scorbot-ER 4pc robotic manipulator.
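The actor-critic interaction described in this abstract can be sketched on a toy scalar plant. Everything below (the linear model x' = a·x + b·u, the quadratic cost, the linear critic and actor, and all gains) is an illustrative assumption, not the paper's actual networks or design:

```python
import numpy as np

# Toy dual-heuristic-programming (DHP) loop: the critic lam(x) = w*x
# approximates dV/dx, the actor is a linear gain u = -k*x, and both are
# adapted online at every step, with no preliminary learning phase.
a, b, gamma, alpha = 0.9, 0.5, 0.95, 0.01
w, k = 0.0, 0.1                      # critic weight and actor gain

def dhp_update(x, w, k):
    u = -k * x                       # actor: current (suboptimal) control law
    x_next = a * x + b * u           # model of the plant
    dxnext_dx = a - b * k            # state Jacobian, chained through the actor
    # Critic target: total derivative of r + gamma*V(x_next) w.r.t. x,
    # with stage cost r = x^2 + u^2 (so dr/dx = 2x + 2u*du/dx = 2x - 2ku).
    target = 2 * x - 2 * k * u + gamma * (w * x_next) * dxnext_dx
    w = w - alpha * (w * x - target) * x           # gradient step on the critic
    # Actor: descend dQ/du = 2u + gamma*lam(x_next)*b, pushed back onto k.
    k = k + alpha * (2 * u + gamma * (w * x_next) * b) * x
    return w, k

for x in np.random.default_rng(0).uniform(-1.0, 1.0, 5000):
    w, k = dhp_update(x, w, k)
```

For this plant the coupled updates settle near w ≈ 4.1 and k ≈ 0.59, the fixed point where the critic satisfies the differentiated Bellman equation and the actor's stationarity condition dQ/du = 0 holds.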
ISBN (Print): 9783642386572
The article presents a new approach to the problem of discrete neural control of an underactuated system, using a reinforcement learning method for the on-line adaptation of a neural network. The controlled system is of the ball-and-beam type, a nonlinear dynamical object with fewer control signals than degrees of freedom. The main part of the neural control system is the actor-critic structure, which belongs to the family of neural dynamic programming algorithms and is realised in the form of a dual heuristic dynamic programming structure. The control system moreover includes a PD controller and a supervisory term, derived from the Lyapunov stability theorem, that ensures stability. The proposed neural control system works on-line and does not require preliminary learning. Computer simulations have been conducted to illustrate the performance of the control system.
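The composite control signal this abstract describes (actor output plus PD feedback plus a Lyapunov-motivated supervisory term) can be sketched as follows; the gains, the tanh actor, the sliding variable, and the activation threshold are all hypothetical placeholders, not the authors' exact design:

```python
import numpy as np

# Composite control: u = u_PD + u_actor + u_supervisory.  The supervisory
# term is a bounded correction that only activates when the filtered
# tracking error leaves a design region, which is what the Lyapunov-based
# analysis uses to guarantee boundedness of the closed loop.
kp, kd, rho, phi = 8.0, 2.0, 5.0, 0.1

def composite_control(e, e_dot, actor_w, s_threshold=1.0):
    s = e_dot + 2.0 * e                    # filtered tracking error (sliding variable)
    u_pd = -kp * e - kd * e_dot            # PD feedback
    u_actor = float(np.tanh(actor_w @ np.array([e, e_dot])))  # neural actor output
    # Supervisory term: acts only when |s| exceeds the threshold.
    u_sup = -rho * float(np.tanh(s / phi)) if abs(s) > s_threshold else 0.0
    return u_pd + u_actor + u_sup

u = composite_control(0.3, -0.1, np.array([0.5, 0.2]))   # supervisory term inactive
```

With a small tracking error the supervisory term stays silent and the actor plus PD terms do the work; once |s| crosses the threshold the bounded correction kicks in.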
ISBN (Print): 9781467314909
We describe an adaptive dynamic programming algorithm, VGL(lambda), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends dual heuristic dynamic programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(lambda). We provide on-line and batch-mode implementations of the algorithm, and summarise the theoretical relationships and motivations for using this method over its precursor algorithms, dual heuristic dynamic programming and TD(lambda). Experiments on control problems using a neural network and a greedy policy are provided.
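The lambda-weighted value-gradient target this abstract refers to can be sketched with a backward recursion over a trajectory. The scalar dynamics, the cost r = x² + u², the linear critic, and the open-loop treatment of the action in the Jacobian are all simplifying assumptions for illustration:

```python
# VGL(lambda) target sketch: the target value-gradient G'_t blends the
# critic's one-step model-propagated estimate with the fully unrolled
# gradient, weighted by lam in [0, 1] -- analogous to the lambda-return
# of TD(lambda).  lam = 0 reduces to the one-step DHP target; lam = 1
# unrolls the gradient along the whole trajectory.
a, b, gamma, lam = 0.9, 0.5, 0.95, 0.7

def vgl_lambda_targets(xs, critic):
    """Backward recursion: G'_t = dr/dx + gamma * (dx'/dx) *
    ((1 - lam) * critic(x_{t+1}) + lam * G'_{t+1})."""
    T = len(xs) - 1
    G = critic(xs[T])                # bootstrap from the critic at the end
    targets = [0.0] * T
    for t in reversed(range(T)):
        dr_dx = 2.0 * xs[t]          # partial derivative of the stage cost
        # Model Jacobian dx'/dx = a (action held fixed for simplicity).
        G = dr_dx + gamma * a * ((1 - lam) * critic(xs[t + 1]) + lam * G)
        targets[t] = G
    return targets

# Roll out a short trajectory under a fixed policy u = -0.3*x.
xs = [1.0]
for _ in range(5):
    xs.append(a * xs[-1] + b * (-0.3 * xs[-1]))

targets = vgl_lambda_targets(xs, critic=lambda x: 4.0 * x)
```

Each target can then serve as the regression label for the critic's gradient output at the corresponding state, in either an on-line or a batch-mode update.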
ISBN (Print): 9781467314909
This paper demonstrates the principal motivations for dual heuristic dynamic programming (DHP) learning methods in adaptive dynamic programming and reinforcement learning over continuous state spaces: automatic local exploration, improved learning speed, and the ability to work without stochastic exploration in deterministic environments. In a simple experiment, the learning speed of DHP is shown to be around 1700 times faster than that of TD(0). DHP solves the problem without any exploration, whereas TD(0) cannot solve it without explicit exploration. DHP requires knowledge of, and differentiability of, the environment's model functions. This paper aims to illustrate the advantages of DHP when these two requirements are satisfied.
ISBN (Print): 9781467314886
This paper demonstrates the principal motivations for dual heuristic dynamic programming (DHP) learning methods in adaptive dynamic programming and reinforcement learning over continuous state spaces: automatic local exploration, improved learning speed, and the ability to work without stochastic exploration in deterministic environments. In a simple experiment, the learning speed of DHP is shown to be around 1700 times faster than that of TD(0). DHP solves the problem without any exploration, whereas TD(0) cannot solve it without explicit exploration. DHP requires knowledge of, and differentiability of, the environment's model functions. This paper aims to illustrate the advantages of DHP when these two requirements are satisfied.
ISBN (Print): 9781467314886
We describe an adaptive dynamic programming algorithm, VGL(λ), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends dual heuristic dynamic programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch-mode implementations of the algorithm, and summarise the theoretical relationships and motivations for using this method over its precursor algorithms, dual heuristic dynamic programming and TD(λ). Experiments on control problems using a neural network and a greedy policy are provided.
Transformation-invariant automatic target recognition (ATR) has been an active research area due to its widespread applications in defense, robotics, medical imaging and geographic scene analysis. The primary goal of this paper is to obtain an on-line ATR system for targets in the presence of image transformations such as rotation, translation, scale, occlusion and resolution changes. We investigate biologically inspired adaptive critic design (ACD) neural network (NN) models for on-line learning of such transformations, and exploit reinforcement learning (RL) in the ACD framework to obtain transformation-invariant ATR. Two ACD designs are employed: heuristic dynamic programming (HDP) and dual heuristic dynamic programming (DHP). We obtain extensive statistical evaluations of the proposed on-line ATR networks using both simulated image transformations and a real benchmark facial image database, UMIST, with pose variations. Our simulations show promising results for learning transformations in simulated images and authenticating out-of-plane rotated face images. Comparing the two on-line ATR designs, HDP outperforms DHP in learning capability and robustness and is more tolerant to noise; the computational time involved in HDP is also less than that of DHP. On the other hand, DHP achieves a 100% success rate more frequently than HDP for individual targets, and the residual critic error in DHP is generally smaller than that of HDP. Mathematical analyses of both RL-based on-line ATR designs are provided, giving a sufficient condition for asymptotic convergence in a statistical-average sense.
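The core distinction between the two ACD critics compared in this abstract can be sketched on a toy scalar system: the HDP critic approximates the value V(x) itself and is trained on the scalar TD error, while the DHP critic approximates dV/dx and is trained on the gradient of the Bellman equation, which requires the model Jacobian. The linear critics, the fixed-policy dynamics, and all constants are illustrative assumptions:

```python
import numpy as np

# HDP critic: V(x) ~ w_hdp * x^2, trained on the scalar TD error.
# DHP critic: dV/dx ~ w_dhp * x, trained on the differentiated Bellman
# equation, propagated through the model Jacobian dx'/dx = a.
a, gamma, alpha = 0.8, 0.9, 0.05
w_hdp, w_dhp = 0.0, 0.0

rng = np.random.default_rng(1)
for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    x_next, r = a * x, x * x       # fixed policy folded into the dynamics
    # HDP update: scalar TD error on the value itself.
    td = r + gamma * w_hdp * x_next**2 - w_hdp * x**2
    w_hdp += alpha * td * x**2
    # DHP update: error on the value gradient (needs the Jacobian a).
    grad_target = 2 * x + gamma * (w_dhp * x_next) * a
    w_dhp += alpha * (grad_target - w_dhp * x) * x
```

Both critics converge to the same underlying value function here (w_hdp → 1/(1 − γa²) and w_dhp → 2/(1 − γa²), i.e. dV/dx is exactly the derivative of V), but the DHP update receives a vector-valued (here scalar) gradient signal per state instead of a single TD scalar, which is the structural difference behind the learning-speed and residual-error trade-offs reported in the paper.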