ISBN (print): 9783642210891
Using the neural-network-based iterative adaptive dynamic programming (ADP) algorithm, an optimal control scheme for a class of unknown discrete-time nonlinear systems with a discount factor in the cost function is proposed in this paper. The optimal controller is designed with a convergence analysis in terms of the cost function and the control law. In order to implement the algorithm via the globalized dual heuristic programming (GDHP) technique, a neural network is first constructed to identify the unknown nonlinear system, and two further neural networks are then used to approximate the cost function and the control law, respectively. An example is provided to verify the effectiveness of the presented approach.
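A minimal sketch of the iterative ADP recursion this abstract builds on, V_{i+1}(x) = min_u [U(x, u) + gamma * V_i(F(x, u))]. The paper approximates V and the control law with neural networks and identifies F with a third network; here a 1-D grid, the dynamics F, and the utility U are all illustrative assumptions.

```python
import numpy as np

def F(x, u):                      # hypothetical dynamics (the paper identifies the real system)
    return 0.9 * np.sin(x) + u

def U(x, u):                      # quadratic utility (assumed)
    return x**2 + u**2

gamma = 0.95                            # discount factor in the cost function
xs    = np.linspace(-2.0, 2.0, 201)     # state grid
us    = np.linspace(-1.0, 1.0, 41)      # admissible controls
V     = np.zeros_like(xs)               # V_0 = 0, as in the iterative ADP scheme

for i in range(200):                    # outer ADP iterations
    V_next = np.empty_like(V)
    for j, x in enumerate(xs):
        q = U(x, us) + gamma * np.interp(F(x, us), xs, V)   # Q-values over all controls
        V_next[j] = q.min()
    if np.max(np.abs(V_next - V)) < 1e-6:
        break
    V = V_next

print("converged after", i, "iterations")
```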
ISBN (print): 9783642211102
Dual heuristic programming (DHP) is a class of approximate dynamic programming methods using neural networks. Although there have been some successful applications of DHP, its performance and convergence are greatly influenced by the design of the step sizes in both the critic module and the actor module. In this paper, a Delta-Bar-Delta learning rule is proposed for the DHP algorithm, which lets the two modules adjust their learning rates individually and adaptively. Finally, the feasibility and effectiveness of the proposed method are illustrated on the learning control task of an inverted pendulum.
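A sketch of the Delta-Bar-Delta rule (Jacobs, 1988) that the abstract applies to the DHP critic and actor: every weight keeps its own learning rate, which grows additively when successive gradients agree in sign with a smoothed gradient trace and shrinks multiplicatively when they disagree. The hyper-parameter values and toy objective below are assumptions.

```python
import numpy as np

kappa, phi, decay = 0.01, 0.7, 0.1

def delta_bar_delta_step(w, grad, lr, bar):
    """One per-weight adaptive update; w, grad, lr, bar are arrays of equal shape."""
    agree = grad * bar > 0                      # gradient agrees with the smoothed trace
    lr = np.where(agree, lr + kappa, lr)        # additive increase
    lr = np.where(grad * bar < 0, lr * (1.0 - decay), lr)  # multiplicative decrease
    bar = (1.0 - phi) * grad + phi * bar        # exponentially smoothed gradient trace
    w = w - lr * grad                           # ordinary gradient step with per-weight rates
    return w, lr, bar

# usage on a toy quadratic objective 0.5 * ||w||^2 (gradient = w)
w   = np.array([1.0, -2.0])
lr  = np.full_like(w, 0.05)
bar = np.zeros_like(w)
for _ in range(100):
    w, lr, bar = delta_bar_delta_step(w, w.copy(), lr, bar)
print(w, lr)
```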
ISBN (print): 9781424478354
The Xpilot-AI video game platform allows the creation of artificially intelligent and autonomous control agents. At the same time, the Xpilot environment is highly complex, with very many state variables and action choices. Basic reinforcementlearning (RL) techniques are somewhat limited in their application when dealing with such large state-and action-spaces, since the repetition of exposure that is key to their value updates can proceed very slowly. To solve this problem, state-abstractions are often generated, allowing learning to move more quickly, but often requiring the programmer to hand-craft state representations, reward functions, and action choices in an ad hoc manner. We apply an automated technique for generating useful abstractions for learning, adaptive Kanerva coding. This method employs a small sub-set of the original states as a proxy for the full environment, updating values over the abstract representative prototype states in a manner analogous to Q-learning. Over time, the set of prototypes is adjusted to provide more effective coverage and abstraction, again automatically. Our results show that this technique allows a simple learning agent to double its survival time when navigating the Xpilot environment, using only a small fraction of the full state-space as a stand-in and greatly increasing the potential for more rapid learning.
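A sketch of value updates over Kanerva prototypes in the spirit of the adaptive Kanerva coding the abstract describes: prototype states act as a proxy for the full state space, a state activates every prototype within a Hamming radius, and Q-values are sums of per-prototype weights. The sizes, radius, and the adaptation heuristic (replacing rarely visited prototypes) are assumptions, not Xpilot specifics.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_protos, n_actions = 32, 64, 4
protos = rng.integers(0, 2, (n_protos, n_bits))   # prototype bit-strings
theta  = np.zeros((n_protos, n_actions))           # per-prototype action values
visits = np.zeros(n_protos)
alpha, gamma, radius = 0.1, 0.95, 10

def active(s):
    """Indices of prototypes within the Hamming radius of state s."""
    return np.where((protos != s).sum(axis=1) <= radius)[0]

def q_values(s):
    idx = active(s)
    return theta[idx].sum(axis=0), idx

def update(s, a, r, s_next):
    q, idx = q_values(s)
    q_next, _ = q_values(s_next)
    td = r + gamma * q_next.max() - q[a]             # Q-learning style TD error
    theta[idx, a] += alpha * td / max(len(idx), 1)   # spread the correction over active prototypes
    visits[idx] += 1

def adapt(state_pool):
    """Replace the least-visited prototype with a recently seen state (adaptation step)."""
    worst = visits.argmin()
    protos[worst] = state_pool[rng.integers(len(state_pool))]
    theta[worst]  = 0.0
    visits[worst] = 0.0
```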
Wireless sensor networks are composed of small nodes with limited battery life and computational ability. Energy reduction in these networks is an important issue for extending network lifetime. Dynamic power management i...
In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI (KLSPI) algorithm. However, it remains difficult for API algorithms to obtain near-optimal policies for Markov decision processes (MDPs) with large or continuous state spaces. To address this problem, this paper presents a hierarchical API (HAPI) method with binary-tree state space decomposition for RL in a class of absorbing MDPs that can be formulated as time-optimal learning control tasks. In the proposed method, after samples are collected adaptively in the state space of the original MDP, a learning-based decomposition strategy over the sample sets is used to carry out the binary-tree state space decomposition. API algorithms are then run on the sample subsets to approximate local optimal policies of the sub-MDPs. Because the original MDP is decomposed into a binary tree of absorbing sub-MDPs constructed during the learning process, local near-optimal policies are approximated by API algorithms with reduced complexity and higher precision. Furthermore, owing to the improved quality of the local policies, the combined global policy performs better than the near-optimal policy obtained by a single API algorithm on the original MDP. Three learning control problems, including path-tracking control of a real mobile robot, were studied to evaluate the performance of the HAPI method. With the same settings for basis function selection and sample collection, the proposed HAPI obtained better near-optimal policies than previous API methods such as LSPI and KLSPI.
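A skeleton of the binary-tree decomposition described here: samples from the original absorbing MDP are split recursively, an API solver (e.g., LSPI) is run on each leaf's sample subset, and the resulting tree of local policies acts as the combined global policy. The split criterion learn_split, the lspi() solver, and state_of() are placeholders passed in by the caller, not the paper's actual strategy.

```python
class Node:
    def __init__(self, samples, depth):
        self.samples, self.depth = samples, depth
        self.policy = None                  # local near-optimal policy for this sub-MDP
        self.left = self.right = None
        self.split = None                   # learned decomposition rule at this node

def build_hapi_tree(samples, max_depth, min_samples, learn_split, lspi, state_of):
    root, stack = Node(samples, 0), []
    stack.append(root)
    while stack:
        n = stack.pop()
        if n.depth >= max_depth or len(n.samples) < 2 * min_samples:
            n.policy = lspi(n.samples)      # approximate the local optimal policy
            continue
        n.split = learn_split(n.samples)    # learning-based decomposition strategy
        left  = [s for s in n.samples if n.split(state_of(s))]
        right = [s for s in n.samples if not n.split(state_of(s))]
        if min(len(left), len(right)) < min_samples:
            n.policy = lspi(n.samples)      # refuse degenerate splits
            continue
        n.left, n.right = Node(left, n.depth + 1), Node(right, n.depth + 1)
        stack += [n.left, n.right]
    return root

def act(node, state):
    """Route a state down the decomposition tree and apply the local policy."""
    while node.policy is None:
        node = node.left if node.split(state) else node.right
    return node.policy(state)
```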
A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. Even with this simplification, we face the exploration/exploitation dilemma: an inaccurate approximation may lead to poor decisions, making it necessary to sometimes explore actions that appear to be suboptimal. We propose a Bayesian strategy for active learning with basis functions, based on the knowledge gradient concept from the optimal learning literature. The new method performs well in numerical experiments conducted on an energy storage problem.
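A sketch of a knowledge-gradient style measurement choice over a linear value model, in the spirit of this abstract: beliefs about the basis-function weights are Gaussian, and the score of measuring a candidate state is the expected improvement of the best posterior value estimate. The exact knowledge gradient formula for correlated normal beliefs is replaced here by a Monte Carlo estimate, and the basis functions and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_eps = 0.5                                   # observation noise std (assumed)

def posterior(mu, Sigma, phi, y):
    """Bayesian linear-regression update of the weight belief after observing y at features phi."""
    s2   = phi @ Sigma @ phi + sigma_eps**2
    gain = Sigma @ phi / s2
    return mu + gain * (y - phi @ mu), Sigma - np.outer(gain, phi @ Sigma)

def kg_score(mu, Sigma, Phi, j, n_mc=500):
    """Monte Carlo knowledge gradient of measuring candidate j (rows of Phi are features)."""
    phi  = Phi[j]
    best = (Phi @ mu).max()
    s    = np.sqrt(phi @ Sigma @ phi + sigma_eps**2)
    ys   = rng.normal(phi @ mu, s, n_mc)          # simulated observations at candidate j
    vals = [(Phi @ posterior(mu, Sigma, phi, y)[0]).max() for y in ys]
    return np.mean(vals) - best

# usage: pick the state whose measurement is expected to improve the value estimate most
Phi   = rng.normal(size=(20, 4))                  # 20 candidate states, 4 basis functions
mu    = np.zeros(4)
Sigma = np.eye(4)
scores = [kg_score(mu, Sigma, Phi, j, n_mc=200) for j in range(len(Phi))]
print("measure state", int(np.argmax(scores)))
```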
In this paper we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is a balanced weight-updating strategy that considers both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and the critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach on the triple-link inverted pendulum, a popular benchmark in the community, to demonstrate the online learning and control strategy. Experimental results and a comparative study under different noise conditions demonstrate the effectiveness of this approach.
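A simplified sketch of a recursive Levenberg-Marquardt (LM) weight update of the kind the abstract integrates into the ADP actor and critic networks: a running Gauss-Newton Hessian is accumulated sample by sample, the step is damped by mu, and mu is adapted depending on whether the error decreased. The model here is linear in its weights purely to keep the sketch short; the forgetting factor and the mu schedule are assumptions, not the paper's algorithm.

```python
import numpy as np

class RecursiveLM:
    def __init__(self, n, mu=1.0, lam=0.99):
        self.w   = np.zeros(n)          # weights being trained
        self.H   = np.eye(n) * 1e-3     # running Gauss-Newton Hessian approximation
        self.mu  = mu                   # LM damping factor
        self.lam = lam                  # forgetting factor on old curvature

    def step(self, g, target, output):
        """g: gradient of the output w.r.t. the weights; one damped Gauss-Newton step."""
        e = target - output
        self.H = self.lam * self.H + np.outer(g, g)
        dw = np.linalg.solve(self.H + self.mu * np.eye(len(g)), g * e)
        new_e = target - (output + g @ dw)      # predicted error after the step
        if abs(new_e) < abs(e):                 # accept and trust the quadratic model more
            self.w += dw
            self.mu = max(self.mu * 0.5, 1e-6)
        else:                                   # reject the step and increase damping
            self.mu = min(self.mu * 2.0, 1e6)

# usage: fit y = w_true . x from noisy samples
rng, w_true = np.random.default_rng(2), np.array([1.5, -0.5, 2.0])
lm = RecursiveLM(3)
for _ in range(500):
    x = rng.normal(size=3)
    lm.step(g=x, target=w_true @ x + 0.01 * rng.normal(), output=lm.w @ x)
print(np.round(lm.w, 2))
```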
Reinforcement learning offers a very general framework for learning controllers, but its effectiveness is closely tied to the controller parameterization used. Especially when learning feedback controllers for weakly stable systems, ineffective parameterizations can result in unstable controllers and poor performance, both in terms of learning convergence and in the cost of the resulting policy. In this paper we explore four linear controller parameterizations in the context of REINFORCE, applying them to the control of a reaching task with a linearized flexible manipulator. We find that some natural but naive parameterizations perform very poorly, while the Youla parameterization (a popular parameterization from the controls literature) offers a number of robustness and performance advantages.
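A sketch of REINFORCE with the plain state-feedback parameterization u = -k x + noise, the simplest kind of linear parameterization the abstract compares (not the Youla parameterization). The plant is a weakly stable scalar system chosen only for illustration; the gains, noise level, step size, and update clipping are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.99, 0.1                 # weakly stable open-loop pole at 0.99
sigma, alpha, gamma = 0.2, 1e-3, 0.98

k = 0.0                          # feedback gain being learned
for episode in range(3000):
    x, grads, rewards = 1.0, [], []
    for t in range(60):
        mean = -k * x
        u = mean + sigma * rng.normal()
        grads.append((u - mean) / sigma**2 * (-x))   # d log pi(u|x) / dk for a Gaussian policy
        rewards.append(-(x * x + 0.1 * u * u))       # negative quadratic cost as reward
        x = a * x + b * u
    # discounted returns, mean baseline, then the likelihood-ratio gradient
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    adv = np.array(returns[::-1])
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)    # normalized advantages tame the variance
    step = alpha * float(np.dot(grads, adv))
    k += float(np.clip(step, -0.05, 0.05))           # bounded update keeps the sketch stable
print("learned gain:", round(k, 2))
```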
An intelligent optimal control scheme for unknown nonlinear discrete-time systems with a discount factor in the cost function is proposed in this paper. An iterative adaptive dynamic programming (ADP) algorithm based on the globalized dual heuristic programming (GDHP) technique is developed to obtain the optimal controller, together with a convergence analysis. Three neural networks are used as parametric structures to facilitate the implementation of the iterative algorithm; at each iteration they approximate the cost function, the optimal control law, and the unknown nonlinear system, respectively. Two simulation examples are provided to verify the effectiveness of the presented optimal control approach.
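A sketch of the GDHP-style critic update behind this abstract: the critic is trained on both the cost-to-go J(x) and its gradient dJ/dx, with a factor beta blending the two errors. To keep the sketch free of automatic differentiation, the critic is linear in hand-picked quadratic features with an analytic Jacobian; the features, utility, step sizes, and toy system are illustrative assumptions.

```python
import numpy as np

gamma, beta, alpha = 0.95, 0.5, 0.05

def phi(x):                                    # quadratic features of a 2-D state
    return np.array([x[0]**2, x[1]**2, x[0] * x[1]])

def dphi(x):                                   # Jacobian of phi, shape (3, 2)
    return np.array([[2 * x[0], 0.0], [0.0, 2 * x[1]], [x[1], x[0]]])

def J(w, x):   return w @ phi(x)               # critic value output
def lam(w, x): return dphi(x).T @ w            # critic costate (gradient) output

def gdhp_critic_step(w, x, u, x_next, dUdx, dxnext_dx):
    """One GDHP update: blend the HDP (value) and DHP (derivative) error gradients."""
    U = x @ x + u * u                          # quadratic utility (assumed)
    J_target   = U + gamma * J(w, x_next)
    lam_target = dUdx + gamma * dxnext_dx.T @ lam(w, x_next)
    e_J   = J(w, x) - J_target
    e_lam = lam(w, x) - lam_target
    grad = beta * e_J * phi(x) + (1.0 - beta) * dphi(x) @ e_lam
    return w - alpha * grad

# usage with a toy affine system x' = A x + B u (Jacobian of x' w.r.t. x is simply A)
A, B = np.array([[0.9, 0.1], [0.0, 0.8]]), np.array([0.0, 0.1])
w, x, u = np.zeros(3), np.array([1.0, -0.5]), 0.2
w = gdhp_critic_step(w, x, u, A @ x + B * u, dUdx=2 * x, dxnext_dx=A)
print(w)
```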
This paper proposes a supervised adaptive dynamic programming (SADP) algorithm for the full range adaptive cruise control (ACC) system. The full range ACC system covers both the ACC situation on highways and the stop-and-go (SG) situation on urban streets, and it can autonomously drive the host vehicle at the desired speed and distance to the preceding vehicle in both situations. A traditional adaptive dynamic programming (ADP) algorithm is suited to this problem, but it suffers from low learning efficiency. We propose the concept of an inducing range to construct the supervisor and thereby formulate the SADP algorithm, which greatly improves learning efficiency. Several driving scenarios are designed, and the trained controller is compared with traditional ones in simulation; the results show that the trained SADP controller performs very well in all scenarios, providing an effective approach to the full range ACC problem.
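A sketch of what an inducing-range supervisory signal for ACC could look like, in the spirit of the supervisor this abstract adds on top of ADP: when the gap error and relative speed fall inside a target band, the step is marked as successful, giving the learner a dense teaching signal instead of a sparse terminal one. The state definition, the staged shrinking of the band, and the reward shape are all assumptions for illustration, not the paper's design.

```python
import numpy as np

def desired_gap(v_host, headway=1.5, standstill=2.0):
    """Constant time-headway spacing policy used as the tracking target (assumed)."""
    return standstill + headway * v_host

def supervisor(gap_error, rel_speed, stage):
    """Inducing range: the acceptable band shrinks as training stages progress (stage in [0, 1])."""
    band_gap   = np.interp(stage, [0, 1], [5.0, 0.5])    # metres
    band_speed = np.interp(stage, [0, 1], [3.0, 0.3])    # m/s
    return abs(gap_error) <= band_gap and abs(rel_speed) <= band_speed

def reinforcement(gap_error, rel_speed, accel, stage):
    """Binary supervised signal plus a small comfort penalty on acceleration."""
    if supervisor(gap_error, rel_speed, stage):
        return 0.0 - 0.01 * accel**2           # success: only penalise harsh control
    return -1.0                                # outside the inducing range: failure signal

# usage: host at 20 m/s, preceding vehicle 35 m ahead at 18 m/s, early training stage
gap_error = 35.0 - desired_gap(20.0)
print(reinforcement(gap_error, rel_speed=-2.0, accel=0.5, stage=0.2))
```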