ISBN (print): 9781424407064
Autonomous drive of a wheeled mobile robot (WMR) requires implementing velocity and path tracking control subject to complex dynamical constraints. Conventionally, this control design is obtained by analysis and synthesis of the WMR system. This paper presents a dual heuristic programming (DHP) adaptive critic design of the motion control system that enables the WMR to achieve the control objective simply by learning through trials. The design consists of an adaptive critic velocity neuro-control loop and a posture neuro-control loop. The neural weights in the velocity neuro-controller (VNC) are corrected with the DHP adaptive critic method. The designer simply expresses the control objective with a utility function, and the VNC learns by sequential optimization to satisfy that objective. The posture neuro-controller (PNC) approximates the inverse velocity model of the WMR so as to map planned positions to desired velocities. Supervised drive of the WMR at varying velocities supplies training samples for the PNC and VNC to set up the neural weights. In autonomous drive, the learning mechanism keeps improving the PNC and VNC. The design is evaluated on an experimental WMR. The excellent results confirm that the DHP adaptive critic motion control design enables the WMR to develop its control ability autonomously.
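To make the adaptive critic loop concrete, the following is a minimal sketch of a DHP-style critic/actor update, written for a linear-quadratic surrogate system rather than the WMR dynamics or the neuro-controllers of the paper; the matrices, learning rates, and reset rule are assumptions made purely for illustration.

```python
import numpy as np

# Minimal DHP adaptive-critic sketch on a linear-quadratic surrogate system
# (the paper's WMR dynamics and neuro-controllers are replaced by linear maps
# purely for illustration; A, B, Q, R, and the learning rates are assumptions).
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed state-transition matrix
B = np.array([[0.0], [0.1]])             # assumed input matrix
Q = np.eye(2) * 1.0                      # state cost in the utility U
R = np.eye(1) * 0.1                      # control cost in the utility U
gamma = 0.95                             # discount factor

Wc = np.zeros((2, 2))   # critic: lambda(x) = Wc @ x  (estimate of dJ/dx)
Wa = np.zeros((1, 2))   # actor:  u(x)      = Wa @ x
lr_c, lr_a = 0.01, 0.005

x = np.array([1.0, 0.0])
for t in range(5000):
    u = Wa @ x
    x_next = A @ x + B @ u

    # DHP critic target: derivative of the Bellman equation w.r.t. the state,
    # propagated through both the direct path and the actor path.
    dU_dx = 2 * Q @ x + Wa.T @ (2 * R @ u)       # dU/dx + (du/dx)^T dU/du
    dxnext_dx = A + B @ Wa                       # total dx_{t+1}/dx_t
    lam_next = Wc @ x_next
    lam_target = dU_dx + gamma * dxnext_dx.T @ lam_next

    # Gradient-descent corrections of the critic and actor weights.
    lam = Wc @ x
    Wc -= lr_c * np.outer(lam - lam_target, x)
    dJ_du = 2 * R @ u + gamma * B.T @ lam_next   # derivative of cost-to-go w.r.t. u
    Wa -= lr_a * np.outer(dJ_du, x)

    # Restart the rollout if the surrogate state drifts too far.
    x = x_next if np.linalg.norm(x_next) < 10 else np.array([1.0, 0.0])
```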
In this correspondence, adaptive critic approximate dynamic programming designs are derived to solve the discrete-time zero-sum game in which the state and action spaces are continuous. This results in a forward-in-time reinforcement learning algorithm that converges to the Nash equilibrium of the corresponding zero-sum game. The results in this correspondence can be thought of as a way to solve the Riccati equation of the well-known discrete-time H-infinity optimal control problem forward in time. Two schemes are presented, namely: 1) heuristic dynamic programming and 2) dual heuristic dynamic programming, to solve for the value function and the costate of the game, respectively. An H-infinity autopilot design for an F-16 aircraft is presented to illustrate the results.
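As an illustration of what solving the Riccati equation forward in time means, here is a sketch of the value-function iteration written out for the linear-quadratic zero-sum game, where the value is x'Px; the small system below is an arbitrary stand-in, not the F-16 model, and the attenuation level gamma is an assumption.

```python
import numpy as np

# Forward-in-time value iteration for the discrete-time zero-sum game, written
# out for the linear-quadratic case V(x) = x' P x.  The matrices are a small
# illustrative example (not the F-16 autopilot); gamma is the assumed
# H-infinity attenuation level, chosen well above the achievable level.
A = np.array([[0.8, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.0]])     # control input channel
E = np.array([[0.0], [1.0]])     # disturbance input channel
Q, R = np.eye(2), np.eye(1)
gamma = 4.0

P = np.zeros((2, 2))             # value-function kernel, initialized at zero
for k in range(500):
    # Stack the control and disturbance players into one block quadratic form.
    Ruu = R + B.T @ P @ B
    Rww = E.T @ P @ E - gamma**2 * np.eye(1)
    Ruw = B.T @ P @ E
    M = np.block([[Ruu, Ruw], [Ruw.T, Rww]])
    N = np.vstack([B.T @ P @ A, E.T @ P @ A])
    P_next = Q + A.T @ P @ A - N.T @ np.linalg.solve(M, N)
    if np.max(np.abs(P_next - P)) < 1e-10:
        break
    P = P_next

print("Converged value kernel P:\n", P)
```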
ISBN (print): 9781424407064
We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well-developed techniques for representing, approximating and estimating probability distributions, without running the risks associated with divergent value function estimation. A second advantage is that some distinct algorithms for the average-reward and discounted-reward cases in the primal become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees a globally normalized state visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration and value iteration. Moreover, we derive dual formulations of temporal difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value-function-based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.
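For readers unfamiliar with the dual view, the following sketch solves the standard dual linear program of a small discounted MDP, optimizing over state-action visit distributions instead of values; the toy MDP, start distribution, and use of scipy's linprog are assumptions, and the paper's modified dual and temporal-difference variants are not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

# Dual linear program of a discounted MDP: the decision variables are
# state-action visit distributions d(s, a) rather than values.  The tiny
# random MDP, the start distribution mu, and gamma are assumptions.
rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition kernel
r = rng.random((nS, nA))                        # rewards r[s, a]
mu = np.full(nS, 1.0 / nS)                      # start-state distribution

# Flow constraints:
# sum_a d(s',a) = (1-gamma) mu(s') + gamma sum_{s,a} P(s'|s,a) d(s,a)
A_eq = np.zeros((nS, nS * nA))
for s_next in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[s_next, s * nA + a] = (s == s_next) - gamma * P[s, a, s_next]
b_eq = (1 - gamma) * mu

# Maximize expected reward under the visit distribution (linprog minimizes).
res = linprog(c=-r.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
d = res.x.reshape(nS, nA)                       # normalized visit distribution
policy = d / d.sum(axis=1, keepdims=True)       # recover a stationary policy
print("optimal visit distribution:\n", d, "\npolicy:\n", policy)
```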
ISBN (print): 9781424407064
We are interested in finding the most effective combination of off-line and on-line/real-time training in approximate dynamic programming. We introduce our approach of combining proven off-line methods of training for robustness with a group of on-line methods. Training for robustness is carried out on reasonably accurate models with the multi-stream Kalman filter method [1], whereas on-line adaptation is performed either with the help of a critic or by methods resembling reinforcement learning. We also illustrate the importance of using recurrent neural networks for both the controller/actor and the critic.
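As a rough sketch of Kalman-filter-based weight training, the snippet below runs a single-stream extended Kalman filter over the weights of a tiny feedforward network; the multi-stream variant cited as [1], the recurrent networks, and the on-line critic are not shown, and the network size, noise covariances, and finite-difference Jacobian are assumptions made for illustration.

```python
import numpy as np

# Single-stream EKF weight update for a small feedforward network, as a
# simplified stand-in for the multi-stream Kalman filter training method.
rng = np.random.default_rng(1)

def forward(w, x, n_h=8):
    """Tiny 1-input, 1-output network with one tanh hidden layer."""
    W1 = w[:n_h].reshape(n_h, 1); b1 = w[n_h:2 * n_h]
    W2 = w[2 * n_h:3 * n_h];      b2 = w[3 * n_h]
    return W2 @ np.tanh(W1 @ x + b1) + b2

n_w = 3 * 8 + 1
w = rng.normal(0, 0.1, n_w)          # network weights (the EKF state)
Pcov = np.eye(n_w) * 10.0            # weight covariance
Rmeas, Qproc = 0.1, 1e-6             # measurement and process noise (assumed)

for step in range(2000):
    x = rng.uniform(-np.pi, np.pi, size=1)
    d = np.sin(x)                    # supervised target
    y = forward(w, x)

    # Jacobian of the output w.r.t. the weights by finite differences.
    H = np.array([(forward(w + 1e-6 * np.eye(n_w)[i], x) - y) / 1e-6
                  for i in range(n_w)]).reshape(1, n_w)

    # Standard EKF update of the weight estimate and its covariance.
    S = H @ Pcov @ H.T + Rmeas
    K = Pcov @ H.T / S
    w = w + (K * (d - y)).ravel()
    Pcov = Pcov - K @ H @ Pcov + Qproc * np.eye(n_w)
```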
ISBN (print): 9781424407064
We describe an approach towards reducing the curse of dimensionality for deterministic dynamic programming with continuous actions by randomly sampling actions while computing a steady-state value function and policy. This approach results in globally optimized actions, without searching over a discretized multidimensional grid. We present results on finding time-invariant control laws for two-, four-, and six-dimensional deterministic swing-up problems with up to 480 million discretized states.
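A minimal sketch of the idea, assuming a coarse grid and a 1-D pendulum swing-up in place of the paper's higher-dimensional problems: each value-iteration sweep evaluates a fresh batch of randomly sampled continuous actions rather than a fixed action grid.

```python
import numpy as np

# Value iteration with randomly sampled continuous actions on a coarse grid
# for a 1-D pendulum swing-up (theta = 0 is the upright target).  The
# dynamics, grid resolution, torque limit, and sample count are assumptions.
n_th, n_w, dt = 51, 51, 0.05
ths = np.linspace(-np.pi, np.pi, n_th)
ws = np.linspace(-8.0, 8.0, n_w)
TH, W = np.meshgrid(ths, ws, indexing="ij")
V = np.zeros((n_th, n_w))                     # cost-to-go on the grid
rng = np.random.default_rng(0)

def step(th, w, u):
    """Deterministic pendulum dynamics (unit mass/length, mild damping)."""
    w_new = np.clip(w + dt * (np.sin(th) - 0.1 * w + u), -8.0, 8.0)
    th_new = (th + dt * w_new + np.pi) % (2 * np.pi) - np.pi
    return th_new, w_new

def to_index(th, w):
    """Nearest-neighbor lookup of grid indices."""
    i = np.clip(np.round((th + np.pi) / (2 * np.pi) * (n_th - 1)), 0, n_th - 1)
    j = np.clip(np.round((w + 8.0) / 16.0 * (n_w - 1)), 0, n_w - 1)
    return i.astype(int), j.astype(int)

for sweep in range(300):
    best = np.full_like(V, np.inf)
    # Sample a fresh batch of continuous actions each sweep instead of
    # searching over a fixed multidimensional action grid.
    for u in rng.uniform(-2.0, 2.0, size=30):
        cost = TH**2 + 0.1 * W**2 + 0.01 * u**2
        i, j = to_index(*step(TH, W, u))
        best = np.minimum(best, cost * dt + 0.99 * V[i, j])
    V = best
```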
ISBN (print): 9781424407064
In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The contribution of this paper is a rigorous proof of convergence of the HDP iteration scheme for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided. The first is a linear system, where ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second considers a nonlinear control system.
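The linear example can be reproduced in a few lines: on V_k(x) = x'P_k x the greedy HDP iteration collapses to the Riccati difference recursion, which converges to the discrete-time ARE solution; the system matrices below are an arbitrary small example, not the ones used in the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Linear-quadratic special case of the greedy HDP iteration: value iteration
# on V_k(x) = x' P_k x is the Riccati difference recursion.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

P = np.zeros((2, 2))                          # V_0 = 0, as in greedy HDP
for k in range(2000):
    # Greedy policy for the current value, then value update under it.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
    if np.max(np.abs(P_next - P)) < 1e-12:
        break
    P = P_next

print("HDP iterate:\n", P)
print("ARE solution:\n", solve_discrete_are(A, B, Q, R))
```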
ISBN (print): 9781424407064
We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
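For concreteness, here is one standard way to compute a knowledge-gradient decision under independent normal beliefs with known measurement noise; the belief model, the closed-form KG factor, and the toy problem are assumptions, since the abstract does not spell out the computation.

```python
import numpy as np
from scipy.stats import norm

# Knowledge-gradient measurement decision for independent normal beliefs
# about a small set of alternatives (assumed belief model and noise).
def knowledge_gradient(mu, sigma2, noise_var):
    """Return the index of the alternative to measure next."""
    # Predictive reduction in uncertainty from one measurement of each alternative.
    sigma_tilde = np.sqrt(sigma2**2 / (sigma2 + noise_var))
    # Distance from each mean to the best of the others, in sigma_tilde units.
    best_other = np.array([np.max(np.delete(mu, i)) for i in range(len(mu))])
    zeta = -np.abs(mu - best_other) / sigma_tilde
    kg = sigma_tilde * (zeta * norm.cdf(zeta) + norm.pdf(zeta))
    return int(np.argmax(kg))

# Usage: sequentially measure, then update the normal beliefs by conjugacy.
rng = np.random.default_rng(0)
truth = np.array([1.0, 1.2, 0.8, 1.15])       # unknown true means (toy data)
mu, sigma2, noise_var = np.zeros(4), np.ones(4) * 4.0, 1.0
for n in range(30):
    i = knowledge_gradient(mu, sigma2, noise_var)
    y = truth[i] + rng.normal(0, np.sqrt(noise_var))
    mu[i] = (mu[i] / sigma2[i] + y / noise_var) / (1 / sigma2[i] + 1 / noise_var)
    sigma2[i] = 1.0 / (1 / sigma2[i] + 1 / noise_var)
print("believed best alternative:", int(np.argmax(mu)))
```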
ISBN (print): 9781424407064
In this paper, we suggest and analyze the use of approximate reinforcement learning techniques for a new category of challenging benchmark problems from the field of Operations Research. We demonstrate that interpreting and solving the task of job-shop scheduling as a multi-agent learning problem is beneficial for obtaining near-optimal solutions and can very well compete with alternative solution approaches. The evaluation of our algorithms focuses on numerous established Operations Research benchmark problems.
ISBN (print): 9781424407064
We consider the problem of learning in a factored-state Markov Decision Process that is structured to allow a compact representation. We show that the well-known algorithm factored Rmax performs near-optimally on all but a number of timesteps that is polynomial in the size of the compact representation, which is often exponentially smaller than the number of states. This is equivalent to the result obtained by Kearns and Koller for their DBN-E-3 algorithm, except that we have conducted the analysis in a more general setting. We also extend the results to a new algorithm, factored IE, that uses the Interval Estimation approach to exploration and can be expected to outperform factored Rmax on most domains.
ISBN (print): 9781424407064
Considerable research has been done on reinforcement learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named Continuous Actor Critic Learning Automaton (CACLA) that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An experimental comparison is made between this algorithm and other algorithms that can handle continuous action spaces. These experiments show that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.
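A minimal sketch of the CACLA idea on a toy 1-D task, assuming radial-basis features and hand-picked learning rates: the critic is trained by temporal differences, Gaussian noise around the actor output provides exploration, and the actor is moved toward the explored action only when the temporal-difference error is positive.

```python
import numpy as np

# CACLA sketch on a toy continuous-state, continuous-action task: drive a
# scalar state to zero.  Task, features, and learning rates are assumptions;
# only the core update rule follows the algorithm's idea.
rng = np.random.default_rng(0)
centers = np.linspace(-1, 1, 11)

def phi(s):
    """Radial-basis features of the scalar state."""
    return np.exp(-((s - centers) ** 2) / 0.05)

w_v = np.zeros(len(centers))      # critic weights:  V(s)  = w_v . phi(s)
w_a = np.zeros(len(centers))      # actor weights:   Ac(s) = w_a . phi(s)
alpha, beta, gamma, sigma = 0.1, 0.1, 0.95, 0.3

s = rng.uniform(-1, 1)
for t in range(20000):
    f = phi(s)
    a = np.clip(w_a @ f + rng.normal(0, sigma), -1, 1)   # Gaussian exploration
    s_next = np.clip(s + 0.2 * a, -1, 1)                 # simple dynamics
    r = -s_next**2                                        # reward: stay near zero

    delta = r + gamma * (w_v @ phi(s_next)) - w_v @ f     # TD error
    w_v += alpha * delta * f                              # critic update
    if delta > 0:
        # CACLA: move the actor toward the action actually taken,
        # independent of the magnitude of the TD error.
        w_a += beta * (a - w_a @ f) * f

    s = s_next if rng.random() > 0.02 else rng.uniform(-1, 1)  # occasional reset
```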