A major issue in model-free reinforcement learning is how to efficiently exploit the data collected by an exploration strategy. This is especially important in the case of continuous, high-dimensional state spaces, since it is impossible to explore such spaces exhaustively. A simple but promising approach is to fix the number of state transitions which are sampled from the underlying Markov decision process. For several kernel-based learning algorithms there exist convergence proofs and notable empirical results if a fixed set of transition instances is used. In this article, we analyze how function approximators similar to the CMAC architecture can be combined with this idea. We show, both analytically and empirically, the potential power of the CMAC architecture combined with an offline version of Q-learning.
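As a rough illustration of the combination described above, the following sketch pairs a tile-coding (CMAC-style) approximator with offline Q-learning sweeps over a fixed batch of transitions. The state bounds, tile counts, learning rate, and transition format are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch: CMAC-style tile coding + offline Q-learning on a fixed batch.
import numpy as np

class CMAC:
    def __init__(self, n_tilings=8, tiles_per_dim=8,
                 low=(-1.0, -1.0), high=(1.0, 1.0), n_actions=3):
        self.n_tilings, self.tiles, self.n_actions = n_tilings, tiles_per_dim, n_actions
        self.low, self.high = np.array(low), np.array(high)
        # one weight table per tiling and per action (2-D state assumed)
        self.w = np.zeros((n_tilings, n_actions, tiles_per_dim, tiles_per_dim))
        # each tiling is shifted by a fraction of a tile width
        self.offsets = np.linspace(0.0, 1.0, n_tilings, endpoint=False)

    def _active_tiles(self, s):
        scaled = (np.asarray(s) - self.low) / (self.high - self.low) * (self.tiles - 1)
        for t, off in enumerate(self.offsets):
            idx = np.clip(np.floor(scaled + off).astype(int), 0, self.tiles - 1)
            yield t, tuple(idx)

    def q(self, s, a):
        return sum(self.w[t][a][idx] for t, idx in self._active_tiles(s))

    def update(self, s, a, target, alpha=0.1):
        err = target - self.q(s, a)
        for t, idx in self._active_tiles(s):
            self.w[t][a][idx] += alpha / self.n_tilings * err

def offline_q_learning(cmac, transitions, gamma=0.95, sweeps=50):
    """Repeatedly sweep a fixed set of (s, a, r, s_next, done) samples."""
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            best_next = 0.0 if done else max(cmac.q(s_next, b) for b in range(cmac.n_actions))
            cmac.update(s, a, r + gamma * best_next)
    return cmac
```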
We propose the use of kernel-based methods as the underlying function approximator in the least-squares based policy evaluation framework of LSPE(lambda) and LSTD(lambda). In particular, we present the 'kernelization' of model-free LSPE(lambda). The 'kernelization' is made computationally feasible by using the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this.
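To convey the general flavor of least-squares policy evaluation with kernel features on a reduced set of basis centers, here is a minimal sketch; it implements plain batch LSTD(lambda) with Gaussian kernel features rather than the paper's recursive LSPE(lambda) with automatic basis selection, and the centers, bandwidth, and regularization are assumptions.

```python
# Batch LSTD(lambda) with Gaussian kernel features on a fixed dictionary of centers.
import numpy as np

def kernel_features(s, centers, bandwidth=0.5):
    d = np.linalg.norm(centers - np.asarray(s), axis=1)
    return np.exp(-(d / bandwidth) ** 2)

def lstd_lambda(transitions, centers, gamma=0.95, lam=0.7, reg=1e-3):
    """Evaluate one fixed policy from (s, r, s_next, done) transitions."""
    k = len(centers)
    A = reg * np.eye(k)          # regularization keeps A invertible
    b = np.zeros(k)
    z = np.zeros(k)              # eligibility trace over feature vectors
    for s, r, s_next, done in transitions:
        phi = kernel_features(s, centers)
        phi_next = kernel_features(s_next, centers)
        z = gamma * lam * z + phi
        A += np.outer(z, phi - (0.0 if done else gamma) * phi_next)
        b += z * r
        if done:
            z = np.zeros(k)
    return np.linalg.solve(A, b)  # value-function weights: V(s) ~ w . phi(s)
```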
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. Practical benefits are that this framework fits in well with plant descriptions given by differential equations and that any standard integration routine with adaptive step size does an adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. Also, a fast critic update for concurrent actor-critic training is introduced to immediately apply the adjustments of critic parameters induced by actor updates, keeping the Bellman optimality correct to a first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until substantial error builds up in the Bellman optimality or temporal difference equation, at which point a traditional critic training phase needs to be performed, after which another interval of concurrent actor-critic training may resume.
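For orientation, a generic discounted continuous-time Bellman (Hamilton-Jacobi-Bellman) relation of the kind such a critic must keep approximately satisfied can be written as below; the paper's exact formulation, sign conventions, and discounting may differ.

$$\beta J(x) \;=\; \min_{u}\Big[\, r(x,u) \;+\; \nabla J(x)^{\top} f(x,u) \,\Big],$$

where $f$ denotes the plant dynamics given by differential equations, $J$ the critic's cost-to-go function, $r$ the instantaneous cost, and $\beta$ the discount rate.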
Learning automata are shown to be an excellent tool for creating learning multi-agent systems. Most algorithms used in current automata research expect the environment to end in an explicit end-stage. In this end-stage the rewards are given to the learning automata (i.e., Monte Carlo updating). This is, however, infeasible in sequential decision problems with an infinite horizon, where no such end-stage exists. In this paper we propose a new algorithm based on one-step returns that uses bootstrapping to find good equilibrium paths in multi-stage games.
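A hedged sketch of the general idea follows: a per-state learning automaton whose probability update is driven by a bootstrapped one-step return instead of a Monte Carlo end-of-episode reward. The particular update rule, feedback scaling, and value estimate are illustrative assumptions, not the authors' exact algorithm.

```python
# Per-state automaton updated from a bootstrapped one-step return.
import numpy as np

class BootstrappingAutomaton:
    def __init__(self, n_states, n_actions, alpha=0.05, gamma=0.95):
        self.p = np.full((n_states, n_actions), 1.0 / n_actions)  # action probabilities
        self.v = np.zeros(n_states)                               # state-value estimates
        self.alpha, self.gamma = alpha, gamma

    def act(self, s, rng):
        return rng.choice(len(self.p[s]), p=self.p[s])

    def update(self, s, a, r, s_next):
        # one-step bootstrapped return replaces the end-stage Monte Carlo reward
        g = r + self.gamma * self.v[s_next]
        self.v[s] += self.alpha * (g - self.v[s])
        beta = np.clip(g, 0.0, 1.0)              # feedback scaled into [0, 1]
        # linear reward-inaction style shift toward the chosen action
        self.p[s] *= (1.0 - self.alpha * beta)
        self.p[s, a] += self.alpha * beta
        self.p[s] /= self.p[s].sum()
```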
Viability theory considers the problem of maintaining a system under a set of viability constraints. The main tool for solving viability problems lies in the construction of the viability kernel, defined as the set of initial states from which there exists a trajectory that remains in the set of constraints indefinitely. The theory is very elegant and appears naturally in many applications. Unfortunately, the current numerical approaches suffer from low computational efficiency, which limits the potential range of applications of this domain. In this paper we show that the viability kernel is the zero-level set of a related dynamic programming problem, which opens promising research directions for numerical approximation of the viability kernel using tools from approximate dynamic programming. We illustrate the approach using k-nearest neighbors on a toy problem in two dimensions and on a complex dynamical model of an anaerobic digestion process in four dimensions.
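As one concrete reading of the zero-level-set claim, the sketch below runs a value-iteration-style recursion V(x) = max(g(x), min_u V(f(x, u))), with g(x) <= 0 inside the constraint set, and approximates V with k-nearest neighbors over sampled states. The recursion form, dynamics interface, and use of scikit-learn are illustrative assumptions, not the paper's exact construction.

```python
# Approximate DP whose zero-sublevel set estimates the viability kernel.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def viability_value_iteration(samples, controls, f, g, n_iters=50, k=5):
    """samples: 2-D array of sampled states; f(x, u): discrete-time dynamics;
    g(x): constraint function, <= 0 inside the constraint set."""
    values = np.array([g(x) for x in samples])
    for _ in range(n_iters):
        knn = KNeighborsRegressor(n_neighbors=k).fit(samples, values)
        new_values = np.empty_like(values)
        for i, x in enumerate(samples):
            successors = np.array([f(x, u) for u in controls])
            v_next = knn.predict(successors)            # V at reachable states
            new_values[i] = max(g(x), v_next.min())     # best control, worst of g and future
        values = new_values
    # approximate viability kernel: samples with value <= 0
    return values
```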
This paper presents the application of an approximate dynamic programming (ADP) algorithm to the problem of job releasing and sequencing of a benchmark reentrant manufacturing line (RML). The ADP approach is based on the SARSA(lambda) algorithm with linear approximation structures that are tuned through a gradient-descent approach. The optimization is performed according to a discounted cost criterion that seeks both the minimization of inventory costs and the maximization of throughput. Simulation experiments are performed using different approximation architectures to compare the performance of optimal strategies against policies obtained with ADP. Results from these experiments showed a statistical match in performance between the optimal policies and the approximated policies obtained through ADP. These results also suggest that the ADP algorithm presented in this paper may be a promising approach for larger RML systems.
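For readers unfamiliar with the underlying learner, a generic SARSA(lambda) routine with a linear approximation architecture tuned by gradient descent looks roughly as follows; the feature map, environment interface, and hyperparameters are assumptions, and in the paper's setting the reward would encode the negative inventory cost plus throughput.

```python
# Generic SARSA(lambda) with linear function approximation and eligibility traces.
import numpy as np

def sarsa_lambda(env, phi, n_features, n_actions, episodes=200,
                 alpha=0.01, gamma=0.98, lam=0.9, epsilon=0.1, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    w = np.zeros((n_actions, n_features))            # one weight vector per action

    def q(s, a):
        return w[a] @ phi(s)

    def policy(s):
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax([q(s, a) for a in range(n_actions)]))

    for _ in range(episodes):
        z = np.zeros_like(w)                         # eligibility traces
        s, a, done = env.reset(), None, False
        a = policy(s)
        while not done:
            s_next, r, done = env.step(a)            # assumed (state, reward, done) interface
            a_next = policy(s_next)
            delta = r + (0.0 if done else gamma * q(s_next, a_next)) - q(s, a)
            z *= gamma * lam
            z[a] += phi(s)
            w += alpha * delta * z                   # gradient-descent style update
            s, a = s_next, a_next
    return w
```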
It was shown recently that SVMs are particularly well suited to defining action policies that keep a dynamical system inside a given constraint set (in the framework of viability theory). However, the training set of the SVMs faces the curse of dimensionality, because it is based on a regular grid of the state space. In this paper, we propose an active learning approach aiming at dramatically decreasing the training set size, keeping it as close as possible to the final number of support vectors. We use a virtual multi-resolution grid, and some particularities of the problem, to choose very efficient examples to add to the training set. To illustrate the performance of the algorithm, we solve a six-dimensional problem, controlling a bike on a track, a problem usually solved using reinforcement learning techniques.
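A generic active-learning loop in the spirit of the approach above labels only the candidate grid points closest to the current SVM decision boundary, instead of labeling a full fine grid; the grid handling, labeling oracle, and scikit-learn usage below are illustrative assumptions rather than the authors' multi-resolution scheme.

```python
# Margin-based active learning of an SVM classifier over candidate grid points.
import numpy as np
from sklearn.svm import SVC

def active_svm(candidates, label, n_init=50, n_per_round=25, rounds=10, rng=None):
    """candidates: 2-D array of candidate states; label(x) -> 0/1 viability label.
    Assumes both classes appear in the initial random sample."""
    rng = rng if rng is not None else np.random.default_rng(0)
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]
    y = np.array([label(x) for x in X])
    remaining = np.delete(np.arange(len(candidates)), idx)
    svm = SVC(kernel="rbf", C=10.0).fit(X, y)
    for _ in range(rounds):
        # query the unlabeled candidates closest to the decision boundary
        margins = np.abs(svm.decision_function(candidates[remaining]))
        pick = remaining[np.argsort(margins)[:n_per_round]]
        X = np.vstack([X, candidates[pick]])
        y = np.concatenate([y, [label(x) for x in candidates[pick]]])
        remaining = np.setdiff1d(remaining, pick)
        svm = SVC(kernel="rbf", C=10.0).fit(X, y)
    return svm
```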
This paper presents an original technique for computing the optimal policy of a Markov decision problem with a continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm introduced by Watkins in 1989 for discrete Markov decision problems. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the Q-functions. We state, under mild assumptions, a convergence theorem for this algorithm. Finally, we illustrate our algorithm by solving two classical problems: the Mountain Car task and the Puddle World task.
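A rough sketch of the kernel-smoothed, locally updated Q-function idea (details assumed, not the authors' exact estimator): Q is stored at a set of reference states, and each observed transition updates nearby reference states in proportion to a kernel weight.

```python
# Kernel-smoothed Q-learning for continuous states and discrete actions.
import numpy as np

class KernelQ:
    def __init__(self, centers, n_actions, bandwidth=0.3, gamma=0.95):
        self.centers = np.asarray(centers)           # reference states
        self.q = np.zeros((len(centers), n_actions))
        self.h, self.gamma = bandwidth, gamma

    def _weights(self, s):
        d = np.linalg.norm(self.centers - np.asarray(s), axis=1)
        w = np.exp(-(d / self.h) ** 2)
        return w / w.sum()

    def value(self, s, a):
        return self._weights(s) @ self.q[:, a]

    def update(self, s, a, r, s_next, alpha):
        target = r + self.gamma * max(self.value(s_next, b)
                                      for b in range(self.q.shape[1]))
        # local update: each reference state moves toward the target in
        # proportion to its kernel weight at the observed state s
        self.q[:, a] += alpha * self._weights(s) * (target - self.q[:, a])
```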
Opposition-based learning (OBL) is a new scheme in machine intelligence. In this paper, an OBL version of Q-learning, which exploits opposite quantities to accelerate learning, is used for the management of single-reservoir operations. In this method, an agent takes an action, receives a reward, and updates its knowledge in terms of action-value functions. Furthermore, the transition function, which is the balance equation in the optimization model, determines the next state and updates the action-value function pertinent to the opposite action. Two types of opposite actions are defined. It is demonstrated that using OBL can significantly improve the efficiency of the operating policy within a limited number of iterations. It is also shown that this technique is more robust than Q-learning.
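The core mechanism can be sketched as a paired update: each observed Q-learning step is accompanied by an update for the opposite action, whose reward and next state are obtained from the known balance (transition) equation. The environment model and opposite-action mapping below are illustrative assumptions.

```python
# Opposition-based Q-learning update for a tabular Q-function.
import numpy as np

def obl_q_update(Q, s, a, r, s_next, opposite, model, alpha=0.1, gamma=0.95):
    """Q: array [n_states, n_actions]; opposite(s, a) -> opposite action;
    model(s, a) -> (reward, next_state) via the balance equation."""
    # standard Q-learning update for the taken action
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    # extra update for the opposite action, simulated through the balance equation
    a_op = opposite(s, a)
    r_op, s_op_next = model(s, a_op)
    Q[s, a_op] += alpha * (r_op + gamma * Q[s_op_next].max() - Q[s, a_op])
    return Q
```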
Cellular Simultaneous Recurrent Neural Networks (SRNs) show great promise in solving complex function approximation problems. In particular, approximate dynamic programming is an important application area where SRNs have significant potential advantages compared to other approximation methods. Learning in SRNs, however, has proved to be a notoriously difficult problem, which has prevented their broader use. This paper introduces an extended Kalman filter approach to train SRNs. Using the two-dimensional maze navigation problem as a testbed, we illustrate the operation of the method and demonstrate its benefits in generalization and testing performance.
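A compact sketch of a single extended-Kalman-filter weight-update step for a neural approximator follows (a generic EKF recursion, not the paper's exact cellular SRN setup); the network, its output Jacobian, and the noise covariances are assumptions.

```python
# One EKF step treating the network weights as the filter state.
import numpy as np

def ekf_train_step(w, P, x, target, forward, jacobian, R=1e-2, Q=1e-5):
    """
    w: flat weight vector; P: weight covariance matrix
    forward(w, x) -> scalar network output
    jacobian(w, x) -> d(output)/d(w), shape (len(w),)
    """
    y = forward(w, x)
    H = jacobian(w, x).reshape(1, -1)                  # measurement Jacobian
    S = H @ P @ H.T + R                                # innovation covariance (1x1)
    K = (P @ H.T) / S                                  # Kalman gain, shape (len(w), 1)
    w = w + (K * (target - y)).ravel()                 # weight update
    P = P - K @ H @ P + Q * np.eye(len(w))             # covariance update with process noise
    return w, P
```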