ISBN (Print): 9781424407064
A theoretical analysis of Model-Based Temporal Difference learning for Control is given, leading to a proof of convergence. This work differs from earlier work on the convergence of Temporal Difference learning by proving convergence to the optimal value function. This means that the algorithm does not merely find the values of the current policy; instead, the policy is updated in such a manner that the optimal policy is ultimately guaranteed to be reached.
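The abstract does not spell out the update rule; as a rough illustration only, a model-based TD-style update toward the Bellman optimality target under an estimated model might look like the sketch below (all names, shapes, and step sizes are assumptions, not the paper's algorithm).

```python
# Illustrative sketch only: a model-based TD-style update that moves V toward the
# Bellman optimality target computed from an estimated model (P_hat, R_hat).
import numpy as np

def model_based_td_update(V, P_hat, R_hat, s, alpha=0.1, gamma=0.95):
    """V: (n_states,) value estimates; P_hat: (n_actions, n_states, n_states)
    estimated transition probabilities; R_hat: (n_actions, n_states) expected rewards."""
    q_values = R_hat[:, s] + gamma * P_hat[:, s, :] @ V   # one backed-up value per action
    target = q_values.max()                               # Bellman optimality target
    V[s] += alpha * (target - V[s])                       # TD-style step toward the target
    return V
```

Because the target takes a maximum over actions under the learned model, the fixed point is the optimal value function rather than the value of the current policy, which is the distinction the abstract emphasizes.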
A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The constraints of the actuator are treated in the controller design as the saturation nonlinearity. The adaptive critic NN controller architecture based on state feedback includes two NNs: the critic NN is used to approximate the "strategic" utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing certain quadratic performance indexes. Using the Lyapunov approach and with novel weight updates, the uniform ultimate boundedness of the closed-loop tracking error and weight estimates is shown in the presence of NN approximation errors and bounded unknown disturbances. The proposed NN controller works in the presence of multiple nonlinearities, unlike other schemes that normally approximate one nonlinearity. Moreover, the adaptive critic NN controller does not require an explicit offline training phase, and the NN weights can be initialized to zero or at random. Simulation results justify the theoretical analysis.
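As a rough, hedged sketch of the general critic/actor structure described above (not the paper's novel weight-update laws), a discrete-time loop with a saturated control input and semi-gradient updates could look like this; the toy plant, polynomial basis, and gains are assumptions.

```python
# Minimal sketch, assuming a scalar toy plant and polynomial basis functions.
# The actuator constraint is modelled as a saturation on the control input; the
# critic estimates a quadratic cost-to-go and the actor descends that estimate.
import numpy as np

Wc = np.zeros(3)                # critic weights (approximate cost-to-go)
Wa = np.zeros(3)                # action-network weights
u_max = 1.0                     # actuator limit
alpha_c, alpha_a, gamma = 0.05, 0.02, 0.9

def phi(e):                     # assumed shared basis: simple polynomial features
    return np.array([e, e**2, 1.0])

x, x_des = 0.5, 0.0
for k in range(200):
    e = x - x_des                                   # tracking error
    u = float(np.clip(Wa @ phi(e), -u_max, u_max))  # saturation nonlinearity
    x_next = 0.8 * x + 0.3 * np.tanh(x) + u         # toy nonlinear plant (assumption)
    e_next = x_next - x_des
    r = e_next**2 + 0.1 * u**2                      # quadratic performance index
    # Critic: semi-gradient TD step on the cost-to-go estimate
    td = r + gamma * (Wc @ phi(e_next)) - Wc @ phi(e)
    Wc += alpha_c * td * phi(e)
    # Actor: gradient of the one-step cost-to-go w.r.t. u, chained back to Wa
    dV_de = Wc @ np.array([1.0, 2.0 * e_next, 0.0]) # d(critic)/d(e_next)
    dJ_du = 0.2 * u + gamma * dV_de                 # dx_next/du = 1 for this plant
    Wa -= alpha_a * dJ_du * phi(e)                  # du/dWa = phi(e) inside the limits
    x = x_next
```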
ISBN (Print): 9781424407064
We propose a provably optimal approximate dynamic programming algorithm for a class of multistage stochastic problems, taking into account that the probability distribution of the underlying stochastic process is not known and the state space is too large to be explored entirely. The algorithm and its proof of convergence rely on the fact that the optimal value functions of the problems within the problem class are concave and piecewise linear. The algorithm is a combination of Monte Carlo simulation, pure exploitation, stochastic approximation, and a projection operation. Several applications, in areas such as energy, control, inventory, and finance, fall under this framework.
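As a hedged illustration of the concave, piecewise-linear structure the proof relies on, one can store the value function as marginal-value slopes over an integer resource level, update a sampled slope by stochastic approximation, and then restore concavity; the repair step below is a simple stand-in for the paper's projection operation, and all names are assumptions.

```python
# Sketch under stated assumptions: a value function stored as slopes v(s+1)-v(s);
# concavity holds iff the slopes are non-increasing in s.
import numpy as np

slopes = np.zeros(10)

def update_slope(slopes, s, observed_marginal_value, step=0.1):
    slopes = slopes.copy()
    slopes[s] += step * (observed_marginal_value - slopes[s])   # stochastic approximation
    # Restore non-increasing order so the implied value function stays concave
    for i in range(s, len(slopes) - 1):          # push violations to the right
        if slopes[i + 1] > slopes[i]:
            slopes[i + 1] = slopes[i]
    for i in range(s, 0, -1):                    # and to the left
        if slopes[i - 1] < slopes[i]:
            slopes[i - 1] = slopes[i]
    return slopes
```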
ISBN (Print): 9781424407064
This paper describes two novel on-policy reinforcement learning algorithms, named QV(lambda)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value function using TD(lambda) methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning-automaton-like update rule to update the actor. We describe several possible advantages of these methods compared to other value-function-based reinforcement learning algorithms such as Q-learning, Sarsa, and conventional Actor-Critic methods. Experiments are performed on (1) small, (2) large, (3) partially observable, and (4) dynamic maze problems with tabular and neural network value-function representations, and on the mountain car problem. The overall results show that the two novel algorithms can outperform previously known reinforcement learning algorithms.
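The core QV-learning update can be illustrated with a short tabular sketch (lambda = 0 for brevity; the state/action counts and step sizes are illustrative assumptions).

```python
# Minimal tabular sketch of the QV-learning idea: V is learned by TD, and the
# Q-values bootstrap on V rather than on max_a Q.
import numpy as np

n_states, n_actions = 20, 4
V = np.zeros(n_states)
Q = np.zeros((n_states, n_actions))
alpha, beta, gamma = 0.1, 0.1, 0.95

def qv_update(s, a, r, s_next):
    td_error = r + gamma * V[s_next] - V[s]
    Q[s, a] += alpha * (r + gamma * V[s_next] - Q[s, a])   # Q target uses the learned V
    V[s] += beta * td_error                                 # TD(0) update of V
```

Bootstrapping the Q-values on a separately learned V, rather than on max over Q, is the distinguishing feature the abstract describes.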
ISBN (Print): 9781424407064
In this paper, a fuzzy coordination method based on the Interaction Prediction Principle (IPP) and reinforcement learning is presented for the optimal control of robot manipulators with three degrees of freedom. For this purpose, the robot manipulator is treated as a two-level large-scale system: in the first level, the robot manipulator is decomposed into several subsystems, and in the second level, a fuzzy interaction prediction system is introduced to coordinate the overall system, with a critic vector used to evaluate its performance. Simulation results for the proposed approach to optimal control of robot manipulators show its effectiveness and superiority in comparison with centralized optimization methods.
ISBN (Print): 9781424407064
We consider the problem of on-line value function estimation in reinforcement learning, concentrating on the function approximator to use. To try to break the curse of dimensionality, we focus on nonparametric function approximators. We propose to incorporate kernels into temporal difference algorithms by using regression via the LASSO. We introduce the equi-gradient descent algorithm (EGD), a direct adaptation of the algorithm recently introduced in the LARS family for solving the LASSO. We advocate the EGD as a judicious algorithm for these tasks, present it in detail along with experimental results, and emphasize its qualities for reinforcement learning.
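A hedged batch approximation of the idea, with scikit-learn's LARS-based LASSO solver standing in for the paper's equi-gradient descent, might look as follows; the kernel centres, kernel choice, and regularization weight are assumptions.

```python
# Sketch, not the paper's EGD implementation: regress one-step TD targets on
# kernel features of the visited states with a LARS-based LASSO solver.
import numpy as np
from sklearn.linear_model import LassoLars
from sklearn.metrics.pairwise import rbf_kernel

def lasso_td_fit(states, rewards, next_states, centers, w_prev=None, gamma=0.95, lam=1e-3):
    """states, next_states: (n, d) arrays; centers: (m, d) kernel centres (assumed given)."""
    K = rbf_kernel(states, centers)                 # kernel features of visited states
    K_next = rbf_kernel(next_states, centers)
    v_next = K_next @ w_prev if w_prev is not None else np.zeros(len(states))
    targets = rewards + gamma * v_next              # one-step TD targets
    model = LassoLars(alpha=lam, fit_intercept=False)
    model.fit(K, targets)                           # sparse weights over kernel features
    return model.coef_
```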
ISBN (Print): 9781424407064
A major issue in model-free reinforcement learning is how to efficiently exploit the data collected by an exploration strategy. This is especially important in the case of continuous, high-dimensional state spaces, since it is impossible to explore such spaces exhaustively. A simple but promising approach is to fix the number of state transitions that are sampled from the underlying Markov decision process. For several kernel-based learning algorithms there exist convergence proofs and notable empirical results when a fixed set of transition instances is used. In this article, we analyze how function approximators similar to the CMAC architecture can be combined with this idea. We show both analytically and empirically the potential power of the CMAC architecture combined with an offline version of Q-learning.
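A minimal sketch of such a combination, assuming a one-dimensional state scaled to [0, 1) and an illustrative tiling layout, is given below; it is not the paper's exact scheme.

```python
# Sketch: a tiny CMAC (tile coding) Q-function swept offline over a fixed set of
# sampled transitions, in the spirit of fitted/offline Q-learning.
import numpy as np

n_tilings, n_tiles, n_actions = 8, 10, 2
weights = np.zeros((n_tilings, n_tiles, n_actions))

def active_tiles(x):                       # x assumed scaled to [0, 1)
    return [(t, int(x * n_tiles + t / n_tilings) % n_tiles) for t in range(n_tilings)]

def q_value(x, a):
    return sum(weights[t, idx, a] for t, idx in active_tiles(x))

def fitted_q_sweep(transitions, alpha=0.1, gamma=0.95):
    """transitions: fixed list of (x, a, r, x_next) tuples sampled beforehand."""
    for x, a, r, x_next in transitions:
        target = r + gamma * max(q_value(x_next, b) for b in range(n_actions))
        td = target - q_value(x, a)
        for t, idx in active_tiles(x):
            weights[t, idx, a] += alpha / n_tilings * td
```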
ISBN (Print): 9781424407064
We propose the use of kernel-based methods as the underlying function approximator in the least-squares-based policy evaluation framework of LSPE(lambda) and LSTD(lambda). In particular, we present the 'kernelization' of model-free LSPE(lambda). The 'kernelization' is made computationally feasible by using the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this.
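For illustration, a batch kernelized LSTD(0) solve with a "subset of regressors" style dictionary is sketched below; the recursive LSPE implementation and the automatic basis selection described in the abstract are not reproduced, and the dictionary, kernel, and ridge term are assumptions.

```python
# Sketch of kernelised LSTD(0): the kernel is evaluated only against a small
# dictionary of basis states, giving a reduced feature matrix Phi.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_lstd(states, rewards, next_states, dictionary, gamma=0.95, ridge=1e-5):
    Phi = rbf_kernel(states, dictionary)          # (n, m) features of visited states
    Phi_next = rbf_kernel(next_states, dictionary)
    A = Phi.T @ (Phi - gamma * Phi_next) + ridge * np.eye(Phi.shape[1])
    b = Phi.T @ rewards
    w = np.linalg.solve(A, b)                     # value estimate: V(s) = k(s, dict) @ w
    return w
```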
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections are made to the discrete case, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. Practical benefits are that this framework fits well with plant descriptions given by differential equations and that any standard integration routine with adaptive step size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. A fast critic update for concurrent actor-critic training is also introduced: adjustments of the critic parameters induced by actor updates are applied immediately, so that the Bellman optimality remains correct to a first-order approximation after actor changes. Critic and actor updates may therefore be performed at the same time until substantial error builds up in the Bellman optimality (temporal difference) equation, at which point a traditional critic training phase is performed, after which another interval of concurrent actor-critic training may resume.
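The second-order actor step can be pictured as a damped Newton update of the actor parameters against a scalar objective supplied by the critic; the sketch below is generic, and the objective, its derivatives, and the damping term are assumptions rather than the paper's derivation.

```python
# Hedged sketch of a second-order (Newton) actor step against a critic-supplied
# scalar objective J(theta); grad_J and hess_J are assumed callables.
import numpy as np

def newton_actor_step(theta, grad_J, hess_J, damping=1e-3):
    """theta: actor parameters; grad_J/hess_J return dJ/dtheta and d2J/dtheta2."""
    g = grad_J(theta)
    H = hess_J(theta) + damping * np.eye(len(theta))   # damped for invertibility
    return theta - np.linalg.solve(H, g)               # Newton step for fast convergence
```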
ISBN (Print): 9781424407064
Learning automata are shown to be an excellent tool for creating learning multi-agent systems. Most algorithms used in current automata research expect the environment to end in an explicit end-stage, in which the rewards are given to the learning automata (i.e., Monte Carlo updating). This is, however, infeasible in sequential decision problems with an infinite horizon, where no such end-stage exists. In this paper we propose a new algorithm based on one-step returns that uses bootstrapping to find good equilibrium paths in multi-stage games.
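The bootstrapping idea can be sketched by feeding a learning automaton a one-step return instead of an end-stage reward; the linear reward-inaction scheme and step sizes below are assumptions, not necessarily the automaton used in the paper.

```python
# Sketch: the automaton's probability update is driven by a bootstrapped
# one-step return r + gamma*V(s') rather than an explicit end-stage reward.
import numpy as np

def automaton_update(p, a, r, v_next, v_s, lr=0.05, gamma=0.95):
    """p: action-probability vector of the automaton in state s; a: chosen action."""
    beta = r + gamma * v_next - v_s           # bootstrapped one-step evaluation
    if beta > 0:                              # reward-inaction: reinforce only on improvement
        p = p.copy()
        p[a] += lr * (1.0 - p[a])
        others = np.arange(len(p)) != a
        p[others] *= (1.0 - lr)
        p /= p.sum()                          # guard against floating-point drift
    return p
```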