Dynamic programming methods are capable of solving reinforcement learning problems, in which an agent must improve its behavior through trial-and-error interactions with a dynamic environment. However, these computational algorithms suffer from the curse of dimensionality (Bellman, 1957): the number of computational operations increases exponentially with the cardinality of the state space. In practice, this usually results in very long training times, and applications in continuous domains are far from trivial. To ease this problem, we propose the use of vector quantization to adaptively partition the state space based on the most recent estimate of the action-value function. In particular, this state-space partitioning is performed incrementally to reflect the experience accumulated by the agent as it explores the underlying environment.
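As an illustration only, the idea of combining an incrementally updated codebook with tabular value learning can be sketched as follows. This is not the authors' algorithm: the abstract does not specify the update rules, so the sketch assumes a plain nearest-neighbour codebook that moves toward visited states, plus a standard Q-learning step over the resulting cells. It also partitions by state distance alone, whereas the paper partitions based on the action-value estimate. The names `nearest`, `vq_update`, and `q_update` and all parameter values are hypothetical.

```python
import math

def nearest(codebook, state):
    """Return the index of the codebook vector closest to `state`."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], state))

def vq_update(codebook, state, lr=0.1):
    """Move the winning codebook vector a fraction `lr` toward `state` (assumed rule)."""
    i = nearest(codebook, state)
    codebook[i] = [c + lr * (s - c) for c, s in zip(codebook[i], state)]
    return i

def q_update(q, cell, action, reward, next_cell, alpha=0.5, gamma=0.9):
    """One standard tabular Q-learning step over quantized cells."""
    target = reward + gamma * max(q[next_cell])
    q[cell][action] += alpha * (target - q[cell][action])

# Two cells in a 2-D continuous state space, two actions per cell.
codebook = [[0.0, 0.0], [1.0, 1.0]]
q = [[0.0, 0.0] for _ in codebook]

cell = vq_update(codebook, [0.2, 0.0])      # visited state refines its cell
next_cell = nearest(codebook, [0.9, 1.0])   # successor state is only looked up
q_update(q, cell, action=1, reward=1.0, next_cell=next_cell)
```

The point of the sketch is that the table `q` has one row per codebook vector rather than one per raw state, so its size is set by the codebook rather than by the resolution of the continuous domain.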
This paper presents a fuzzy logic controller (FLC) for the implementation of some behaviours of Sony legged robots. Adaptive heuristic critic (AHC) reinforcement learning is employed to refine the FLC. The actor part of the AHC is a conventional FLC in which the parameters of the input membership functions are learned from an immediate internal reinforcement signal. This internal reinforcement signal comes from a prediction of the evaluation value of a policy and the external reinforcement signal. The evaluation value of a policy is learned by temporal difference (TD) learning in the critic part, which is also represented by an FLC. A genetic algorithm (GA) is employed for learning the internal reinforcement of the actor part because it is more efficient in searching than other trial-and-error search approaches.
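A minimal sketch of how an internal reinforcement signal can be formed from the critic's prediction and the external reinforcement is given below. It assumes the common TD(0) form of the signal and replaces the fuzzy critic described in the abstract with a plain lookup table, so it illustrates the signal flow only. The function names, the step size `beta`, and the discount `gamma` are hypothetical.

```python
def internal_reinforcement(v, s, s_next, r, gamma=0.95):
    """TD error: external reward plus discounted next prediction minus current one."""
    return r + gamma * v[s_next] - v[s]

def critic_update(v, s, s_next, r, beta=0.1, gamma=0.95):
    """Move the critic's prediction for state `s` along the TD error and return it."""
    r_hat = internal_reinforcement(v, s, s_next, r, gamma)
    v[s] += beta * r_hat
    return r_hat

# Table-based critic over three discrete states (stand-in for the fuzzy critic).
v = [0.0, 0.0, 0.0]
r_hat = critic_update(v, s=0, s_next=1, r=-1.0)
```

In an actor-critic arrangement, the returned `r_hat` is what the actor would use as its learning signal in place of the sparse external reward.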
This paper addresses the problem of training trajectories by means of continuous recurrent neural networks whose feedforward parts are multilayer perceptrons. Such networks can approximate a general nonlinear dynamic system with arbitrary accuracy. The learning process is transformed into an optimal control framework where the weights are the controls to be determined. A training algorithm based upon a variational formulation of Pontryagin's maximum principle is proposed for such networks. Computer examples demonstrating the efficiency of the given approach are also presented.
A longstanding goal in chemical physics has been the control of atoms and molecules using coherent light fields. This paper provides a brief overview of the field and discusses experiments that use a programmable pulse shaper to control the quantum state of electronic wavepackets in Rydberg atoms and electronic and nuclear dynamics in molecular liquids. The shape of Rydberg wavepackets was controlled by using tailored ultrafast pulses to excite a beam of caesium atoms. The quantum state of these atoms was measured using holographic techniques borrowed from optics. The experiments with molecular liquids involved the construction of an automated learning machine. A genetic algorithm directed the choice of shaped pulses, which interacted with the molecular system inside a learning control loop. Analysis of the successful pulse shapes found by the genetic algorithm yields insight into the systems being controlled.
This paper attempts to establish a theory for a general auto-associative memory model. We start by defining a new concept, called the supporting function, to replace the concept of the energy function. As is known, the energy function relies on the assumption of symmetric interconnection weights, which is used in the conventional Hopfield auto-associative memory but is not evidenced in any biological memories. We then formulate the information retrieval process as a dynamic system by making use of the supporting function, and derive the attraction (asymptotic stability) condition and the condition for convergence of an arbitrary state to a desired state. The latter is a key condition for an associative memory to be capable of learning from variant samples. Finally, we develop an algorithm to learn the asymptotic stability condition and an algorithm to train the system to recover desired states from their variant samples. The latter, called the sample learning algorithm, is the first of its kind to be discovered for associative memories. Both the recalling and learning processes converge in finite time, a must-have feature for associative memories by analogy with normal human memory. The effectiveness of the recalling and learning algorithms is demonstrated experimentally.
A learning algorithm for radial basis function support vector machines (RBF-SVMs) that can be easily implemented in digital VLSI is proposed. It is shown that, as opposed to traditional artificial neural networks, learning in SVMs is very robust with respect to quantisation effects deriving from the finite precision of computations.
The paper deals with the problem of fault tolerance in a multilayer perceptron network. Although such a network already possesses a reasonable fault tolerance capability, it may be insufficient in particularly critical applications. Studies carried out by the authors have shown that the traditional backpropagation learning algorithm may entail the presence of a certain number of weights with a much higher absolute value than the others. Further studies have shown that faults in these weights are the main cause of deterioration in the performance of the neural network. In other words, the main cause of incorrect network functioning on the occurrence of a fault is the non-uniform distribution of the absolute values of the weights in each layer. The paper proposes a learning algorithm which updates the weights, distributing their absolute values as uniformly as possible in each layer. Tests performed on benchmark test sets have shown the considerable increase in fault tolerance obtainable with the proposed approach as compared with the traditional backpropagation algorithm and with some of the most efficient fault tolerance approaches to be found in the literature. (C) 1999 Elsevier Science Ltd. All rights reserved.
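The link between non-uniform weight magnitudes and fault sensitivity can be illustrated with a toy calculation. The sketch below is not the paper's learning algorithm. It assumes a single linear neuron and a stuck-at-zero fault model, and compares two weight vectors that produce the same fault-free output. The helper names `magnitude_ratio` and `worst_fault_deviation` and the example values are hypothetical.

```python
def magnitude_ratio(weights):
    """Largest absolute weight divided by the mean absolute weight (1.0 = uniform)."""
    mags = [abs(w) for w in weights]
    return max(mags) / (sum(mags) / len(mags))

def worst_fault_deviation(weights, inputs):
    """Largest change in a linear neuron's pre-activation when one weight is stuck at zero."""
    return max(abs(w * x) for w, x in zip(weights, inputs))

x = [1.0, 1.0, 1.0, 1.0]
skewed = [0.1, 0.1, 0.1, 3.7]    # one dominant weight
uniform = [1.0, 1.0, 1.0, 1.0]   # same fault-free output on x, evenly spread

skewed_ratio = magnitude_ratio(skewed)
uniform_ratio = magnitude_ratio(uniform)
skewed_damage = worst_fault_deviation(skewed, x)
uniform_damage = worst_fault_deviation(uniform, x)
```

Under these assumptions, losing the dominant weight in `skewed` removes most of the neuron's response, while any single fault in `uniform` removes only a quarter of it, which is the intuition behind spreading absolute values evenly within a layer.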
In this paper, the effects of basic parameters in reinforcement learning control, such as eligibility, constrained weights of the action and critic networks, system nonlinearities, gradient information, state-space partitioning, and variance of exploration, are studied in detail. The aim is to increase the feasibility of practical application and implementation, improve learning efficiency, and enhance performance. Also, a novel adaptive grid algorithm is proposed to overcome the difficulty of partitioning the input space to achieve better performance. Reinforcement learning is applied to the control of nonlinear one- and two-link robots. This problem dictates that learning is performed on-line, based on a binary or real-valued reinforcement signal from a critic network, without knowledge of the system model or nonlinearity. (C) 2002 Elsevier Science Ltd. All rights reserved.
This paper proposes and studies an algorithm for task-level control based on a radial basis function network approximation of the dependence of the optimal task input vector on the parameters of the task. A learning update scheme is proposed for on-line compensation for the inaccuracy of the model used in the controller design. The update approximates the Jacobian of the task input-output mapping using an off-line design model. Deadzone convergence of this learning scheme in the presence of modeling errors is proved, and constructive estimates of the convergence robustness parameters are obtained. An application of the proposed algorithm to feedforward vibration compensation for flexible spacecraft slewing complements the theoretical analysis. Simulations demonstrate practically acceptable performance of the algorithms in this difficult problem. (C) 2001 Elsevier Science Ltd. All rights reserved.