Search Results - Inner Mongolia University Library

Authors: Tassa, Yuval; Todorov, Emanuel. Affiliations: Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem, Israel; Applied Mathematics and Computer Science and Engineering, University of Washington, Seattle, United States

ISBN: (print) 9781424498888

We describe a new local dynamic programming algorithm for solving stochastic continuous optimal control problems. We use cubature integration to both propagate the state distribution and perform the Bellman backup. The algorithm can approximate the local policy and cost-to-go with arbitrary function bases. We compare the classic quadratic cost-to-go/linear-feedback controller to a cubic cost-to-go/quadratic policy controller on a 10-dimensional simulated swimming robot, and find that the higher order approximation yields a more general policy with a larger basin of attraction. © 2011 IEEE.

Keywords: dynamic programming

Online adaptive learning of optimal control solutions using integral reinforcement learning

Authors: Vamvoudakis, Kyriakos G.; Vrabie, Draguna; Lewis, Frank L. Affiliation: Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX 76118, United States

ISBN: (print) 9781424498888

In this paper we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the Hamilton-Jacobi-Bellman equation, and it does not require explicit knowledge of the system's drift dynamics. The adaptive algorithm is based on policy iteration, and it is implemented on an actor/critic structure. Both actor and critic neural networks are adapted simultaneously, and a persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result. © 2011 IEEE.

Keywords: adaptive algorithms
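
As a rough illustration of why an integral-reinforcement formulation can avoid the drift dynamics, the infinite-horizon value can be split over a finite interval. The notation below (state x, control u, running cost r, value function V, interval length T) is assumed for this sketch and is not taken from the abstract; the paper's own formulation may differ.

```latex
% Assumed notation: x = state, u = control, r(x,u) = running cost,
% V = value function under the current policy, T > 0 = reinforcement interval.
\begin{align}
V\bigl(x(t)\bigr) &= \int_{t}^{\infty} r\bigl(x(\tau), u(\tau)\bigr)\, d\tau \\
                  &= \int_{t}^{t+T} r\bigl(x(\tau), u(\tau)\bigr)\, d\tau
                     + V\bigl(x(t+T)\bigr)
\end{align}
```

The second line involves only the cost accumulated along the measured trajectory and the value at the two endpoints, so under these assumptions it can be evaluated from data without an explicit model of the drift term.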

A Neural Architecture to Address Reinforcement Learning Problems

International Joint Conference on Neural Networks (IJCNN)

Authors: de Arruda, Rodrigo L. S.; Von Zuben, Fernando J. Affiliation: Univ Campinas (UNICAMP), Sch Elect & Comp Engn (FEEC), Dept Comp Engn & Ind Automat (DCA), Lab Bioinformat & Bioinspired Comp (LBiC), Campinas, SP, Brazil

ISBN: (print) 9781424496365

In this paper, the reinforcement learning problem is formulated equivalently to a Markov Decision Process. We address the solution of this problem using a novel adaptive dynamic programming algorithm which is based on a Multi-layer Perceptron Neural Network composed of a parameterized function approximator called Wire-Fitting. Extending this established model, this work makes use of concepts of eligibility to conceive faster learning algorithms. The advantage of the proposed approach is founded on the capability to handle continuous environments and to learn a better policy while following another. Simulation results involving the automatic control of an inverted pendulum are presented to indicate the effectiveness of the proposed algorithm.

Keywords: Approximation methods; dynamic programming; Equations; Heuristic algorithms; Markov decision process; Markov processes; Mathematical model; Monte Carlo methods; adaptive dynamic programming; automatic control; inverted pendulum; learning (artificial intelligence); learning algorithm; multilayer perceptron neural network; multilayer perceptrons; neural architecture; neural net architecture; parameterized function approximator; reinforcement learning; wire-fitting

Optimal Control for a Class of Unknown Nonlinear Systems via the Iterative GDHP Algorithm

8th International Symposium on Neural Networks

Authors: Wang, Ding; Liu, Derong. Affiliation: Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

ISBN: (print) 9783642210891

Using the neural-network-based iterative adaptive dynamic programming (ADP) algorithm, an optimal control scheme for a class of unknown discrete-time nonlinear systems with discount factor in the cost function is proposed in this paper. The optimal controller is designed with convergence analysis in terms of cost function and control law. In order to implement the algorithm via the globalized dual heuristic programming (GDHP) technique, a neural network is constructed first to identify the unknown nonlinear system, and then two other neural networks are used to approximate the cost function and the control law, respectively. An example is provided to verify the effectiveness of the present approach.

Keywords: adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; intelligent control; neural networks; optimal control; reinforcement learning

Adaptive Dual Heuristic Programming Based on Delta-Bar-Delta Learning Rule

8th International Symposium on Neural Networks

Authors: Wu, Jun; Xu, Xin; Lian, Chuanqiang; Huang, Yan. Affiliation: Natl Univ Def Technol, Coll Mechatron & Automat, Inst Automat, Changsha 410073, Hunan, Peoples R China

ISBN: (print) 9783642211102

Dual heuristic programming (DHP) is a class of approximate dynamic programming methods using neural networks. Although there have been some successful applications of DHP, its performance and convergence are greatly influenced by the design of the step sizes in the critic module as well as the actor module. In this paper, a Delta-Bar-Delta learning rule is proposed for the DHP algorithm, which helps the two modules adjust their learning rates individually and adaptively. Finally, the feasibility and effectiveness of the proposed method are illustrated in the learning control task of an inverted pendulum.

Keywords: reinforcement learning; adaptive critic design; dual heuristic programming; Delta-Bar-Delta; neural networks
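
To make the step-size idea in this abstract concrete, here is a minimal sketch of the generic Delta-Bar-Delta rule applied to a toy quadratic, not to the DHP actor and critic networks of the paper. Each weight keeps its own learning rate, which grows additively when the current gradient agrees in sign with a running average of past gradients and shrinks multiplicatively when it disagrees. The function names, the constants `kappa`, `phi`, `theta`, and the test objective are all assumptions chosen for illustration.

```python
def delta_bar_delta_step(w, grad, rate, bar, kappa=0.01, phi=0.5, theta=0.7):
    """One Delta-Bar-Delta update with a separate step size per weight.

    w, grad, rate, bar are equal-length lists: weights, current gradients,
    per-weight learning rates, and running averages of past gradients.
    kappa, phi, theta are illustrative constants, not values from the paper.
    """
    new_w, new_rate, new_bar = [], [], []
    for wi, gi, ri, bi in zip(w, grad, rate, bar):
        agreement = gi * bi
        if agreement > 0:        # same sign as the averaged past gradient
            ri = ri + kappa      # additive increase
        elif agreement < 0:      # sign flip suggests overshoot
            ri = ri * (1.0 - phi)  # multiplicative decrease
        new_w.append(wi - ri * gi)
        new_rate.append(ri)
        new_bar.append((1.0 - theta) * gi + theta * bi)
    return new_w, new_rate, new_bar


def minimize_quadratic(steps=200):
    """Toy problem (an assumption): f(w) = 0.5 * (1*w0^2 + 10*w1^2)."""
    curvature = [1.0, 10.0]
    w = [1.0, 1.0]
    rate = [0.01, 0.01]
    bar = [0.0, 0.0]
    for _ in range(steps):
        grad = [c * wi for c, wi in zip(curvature, w)]
        w, rate, bar = delta_bar_delta_step(w, grad, rate, bar)
    return w, rate
```

Calling `minimize_quadratic()` returns the final weights and the per-weight learning rates, which end up different for the two coordinates because their curvatures differ.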

Reinforcement Learning with Adaptive Kanerva Coding for Xpilot Game AI

IEEE Congress on Evolutionary Computation (CEC)

Authors: Allen, Martin; Fritzsche, Phil. Affiliations: Univ Wisconsin, Dept Comp Sci, La Crosse, WI 54601, USA; Coll New London, Comp Sci Dept, Connecticut, New London, CT, USA

ISBN: (print) 9781424478354

The Xpilot-AI video game platform allows the creation of artificially intelligent and autonomous control agents. At the same time, the Xpilot environment is highly complex, with very many state variables and action choices. Basic reinforcement learning (RL) techniques are somewhat limited in their application when dealing with such large state- and action-spaces, since the repetition of exposure that is key to their value updates can proceed very slowly. To solve this problem, state-abstractions are often generated, allowing learning to move more quickly, but often requiring the programmer to hand-craft state representations, reward functions, and action choices in an ad hoc manner. We apply an automated technique for generating useful abstractions for learning, adaptive Kanerva coding. This method employs a small subset of the original states as a proxy for the full environment, updating values over the abstract representative prototype states in a manner analogous to Q-learning. Over time, the set of prototypes is adjusted to provide more effective coverage and abstraction, again automatically. Our results show that this technique allows a simple learning agent to double its survival time when navigating the Xpilot environment, using only a small fraction of the full state-space as a stand-in and greatly increasing the potential for more rapid learning.

Keywords: Autonomous agents; reinforcement learning; dynamic programming; real time systems
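
The prototype-based value update described in this abstract can be sketched as follows. This is a minimal illustration of plain Kanerva coding with a Q-learning-style update over binary state tuples, under assumed names and parameters (`radius`, `alpha`, `gamma`, and the division of the step by the number of active prototypes are all choices made here). The adaptive part of the method, in which the prototype set itself is adjusted over time, is not shown.

```python
def hamming(a, b):
    """Number of positions where two equal-length feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def active(state, prototypes, radius):
    """Indices of prototypes within the given Hamming radius of the state."""
    return [i for i, p in enumerate(prototypes) if hamming(state, p) <= radius]


def q_value(theta, idx, action):
    """Q(s, a) approximated as the sum of the active prototypes' parameters."""
    return sum(theta[i][action] for i in idx)


def kanerva_q_update(theta, prototypes, radius, s, a, r, s_next, n_actions,
                     alpha=0.1, gamma=0.9):
    """One Q-learning-style update over the prototypes activated by state s.

    theta[i][a] holds the parameter of prototype i for action a and is
    modified in place. Returns the temporal-difference error.
    """
    idx = active(s, prototypes, radius)
    idx_next = active(s_next, prototypes, radius)
    best_next = max(q_value(theta, idx_next, b) for b in range(n_actions))
    td_error = r + gamma * best_next - q_value(theta, idx, a)
    for i in idx:
        # Split the step across active prototypes so Q(s, a) moves by
        # alpha * td_error in total (a normalization chosen for this sketch).
        theta[i][a] += alpha * td_error / len(idx)
    return td_error
```

A state that activates no prototype leaves `theta` unchanged, which is one reason coverage of the prototype set matters in this kind of scheme.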

A new approach for power management in sensor node based on reinforcement learning

International Symposium on Computer Networks and Distributed Systems

Authors: Kianpisheh, Somayeh; Charkari, Nasrolah Moghadam. Affiliation: Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran

ISBN: (print) 9781424491544

Wireless sensor networks are composed of small nodes with limited battery life and computational ability. Energy reduction in these networks is an important issue to extend network lifetime. Dynamic power management is a technique to conserve energy. DPM [1] uses dynamic programming to manage power in sensor nodes. This approach is model based, and exploiting it in a multi-hop scenario is difficult. In this paper, we propose RLPM, which is based on reinforcement learning. It is model free and easily applicable in both single-hop and multi-hop scenarios. Experiments show that RLPM behaves similarly to DPM while it does not have the constraints of DPM. © 2011 IEEE.

Keywords: reinforcement learning

Hierarchical Approximate Policy Iteration with Binary-Tree State Space Decomposition

IEEE Transactions on Neural Networks, 2011, vol. 22, no. 12, pp. 1863-1877

Authors: Xu, Xin; Liu, Chunming; Yang, Simon X.; Hu, Dewen. Affiliations: Natl Univ Def Technol, Coll Mechatron & Automat, Changsha 410073, Hunan, Peoples R China; Univ Guelph, Sch Engn, Guelph, ON N1G 2W1, Canada

In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI algorithm. However, it remains difficult for API algorithms to obtain near-optimal policies for Markov decision processes (MDPs) with large or continuous state spaces. To address this problem, this paper presents a hierarchical API (HAPI) method with binary-tree state space decomposition for RL in a class of absorbing MDPs, which can be formulated as time-optimal learning control tasks. In the proposed method, after collecting samples adaptively in the state space of the original MDP, a learning-based decomposition strategy of sample sets was designed to implement the binary-tree state space decomposition process. Then, API algorithms were used on the sample subsets to approximate local optimal policies of sub-MDPs. The original MDP was decomposed into a binary-tree structure of absorbing sub-MDPs, constructed during the learning process; thus, local near-optimal policies were approximated by API algorithms with reduced complexity and higher precision. Furthermore, because of the improved quality of local policies, the combined global policy performed better than the near-optimal policy obtained by a single API algorithm in the original MDP. Three learning control problems, including path-tracking control of a real mobile robot, were studied to evaluate the performance of the HAPI method. With the same setting for basis function selection and sample collection, the proposed HAPI obtained better near-optimal policies than previous API methods such as LSPI and KLSPI.

Keywords: adaptive dynamic programming; approximate policy iteration; binary-tree; hierarchical reinforcement learning; Markov decision processes; time-optimal control

Bayesian active learning with basis functions

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Ilya O. Ryzhov; Warren B. Powell. Affiliation: Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA

A common technique for dealing with the curse of dimensionality in approximate dynamic programming is to use a parametric value function approximation, where the value of being in a state is assumed to be a linear combination of basis functions. Even with this simplification, we face the exploration/exploitation dilemma: an inaccurate approximation may lead to poor decisions, making it necessary to sometimes explore actions that appear to be suboptimal. We propose a Bayesian strategy for active learning with basis functions, based on the knowledge gradient concept from the optimal learning literature. The new method performs well in numerical experiments conducted on an energy storage problem.

Keywords: Tin; Function approximation; Bayesian methods; Covariance matrix; Mathematical model; dynamic programming
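
The parametric approximation mentioned in this abstract, a value that is a linear combination of basis functions, can be written out as below. The symbols (state s, basis functions \phi_f, weights \theta_f, number of basis functions F) are assumed notation for this sketch and do not come from the abstract.

```latex
% Assumed notation: s = state, \phi_f = f-th basis function,
% \theta_f = its weight, F = number of basis functions.
\begin{equation}
\bar{V}(s) \;=\; \sum_{f=1}^{F} \theta_f \, \phi_f(s) \;=\; \theta^{\top} \phi(s)
\end{equation}
```

Under this form, learning the value function reduces to estimating the F weights rather than one value per state, which is what makes the approach attractive for large state spaces.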

Adaptive dynamic programming with balanced weights seeking strategy

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jian Fu; Haibo He; Zhen Ni. Affiliations: School of Automation, Wuhan University of Technology, Wuhan, Hubei, China; Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA

In this paper we propose to integrate the recursive Levenberg-Marquardt method into the adaptive dynamic programming (ADP) design for improved learning and adaptive control performance. Our key motivation is to consider a balanced weight updating strategy with the consideration of both robustness and convergence during the online learning process. Specifically, a modified recursive Levenberg-Marquardt (LM) method is integrated into both the action network and critic network of the ADP design, and a detailed learning algorithm is proposed to implement this approach. We test the performance of our approach on the triple-link inverted pendulum, a popular benchmark in the community, to demonstrate the online learning and control strategy. Experimental results and a comparative study under different noise conditions demonstrate the effectiveness of this approach.

Keywords: Artificial neural networks; Equations; Jacobian matrices; Convergence; Algorithm design and analysis; Damping; Robustness
