Search Results - Inner Mongolia University Library

Editorial Special Issue on Adaptive Dynamic Programming and Reinforcement Learning

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, vol. 50, no. 11, pp. 3944-3947

Authors: Liu, Derong; Lewis, Frank L.; Wei, Qinglai. Affiliations: School of Automation, Guangdong University of Technology, Guangzhou 510006, China; UTA Research Institute, University of Texas at Arlington, Fort Worth, TX 76118, United States; State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

The past decade has witnessed a surge in research activities related to adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [items 1)–5) in the Appendix] and survey papers [items 6)–10) in the Appendix] have been published on the subject. Both ADP and RL provide approximate solutions to dynamic programming problems. In a 1995 article, Barto et al. [item 11) in the Appendix] introduced the so-called “adaptive real-time dynamic programming,” which was specifically intended to apply ADP to real-time control. Later, in 2002, Murray et al. [item 12) in the Appendix] developed an ADP algorithm for optimal control of continuous-time affine nonlinear systems. On the other hand, the most famous algorithms in RL are the temporal difference algorithm [item 13) in the Appendix] and the Q-learning algorithm [items 14) and 15) in the Appendix].

Keywords: Special issues and sections; reinforcement learning; learning systems; Control systems; dynamic programming; Real-time systems; Optimal control

Individualization of pharmacological anemia management using reinforcement learning

International Joint Conference on Neural Networks

Authors: Gaweda, AE; Muezzinoglu, MK; Aronoff, GR; Jacobs, AA; Zurada, JM; Brier, ME. Affiliations: Univ Louisville, Dept Med, Louisville, KY 40292, USA; Univ Louisville, Dept Elect & Comp Engn, Louisville, KY 40292, USA; Dept Vet Affairs, Louisville, KY 40202, USA

Effective management of anemia due to renal failure poses many challenges to physicians. Individual response to treatment varies across patient populations and, due to the prolonged character of the therapy, changes over time. In this work, a reinforcement learning-based approach is proposed as an alternative method for individualization of drug administration in the treatment of renal anemia. Q-learning, an off-policy approximate dynamic programming method, is applied to determine the proper dosing strategy in real time. Simulations compare the proposed methodology with the currently used dosing protocol. Presented results illustrate the ability of the proposed method to achieve the therapeutic goal for individuals with different response characteristics and its potential to become an alternative to currently used techniques. (c) 2005 Elsevier Ltd. All rights reserved.

Keywords: reinforcement learning; drug dosing; anemia management
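The abstract above describes Q-learning as an off-policy method for choosing a dosing strategy. The following is a minimal tabular sketch of that update rule only, not the authors' method: the five response levels, the target level, the three dose actions, and the reward are all hypothetical toy assumptions, and no clinical model is implied.

```python
import random

LEVELS = 5            # hypothetical discretized response levels 0..4
TARGET = 2            # hypothetical target level
ACTIONS = (-1, 0, 1)  # toy actions: lower, hold, or raise the dose


def step(level, action):
    """Toy deterministic response: the action shifts the level, clipped to range."""
    nxt = min(LEVELS - 1, max(0, level + action))
    reward = -abs(nxt - TARGET)  # penalize distance from the target level
    return nxt, reward


def q_learning(episodes=2000, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(LEVELS)]
    for _ in range(episodes):
        s = rng.randrange(LEVELS)
        for _ in range(horizon):
            # Behaviour policy: uniformly random actions (pure exploration).
            a = rng.randrange(len(ACTIONS))
            s2, r = step(s, ACTIONS[a])
            # Off-policy target: bootstrap from the greedy value of the next state.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Return the action with the highest learned value for each level."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda a: row[a])] for row in q]


policy = greedy_policy(q_learning())
print(policy)
```

Because the update bootstraps from `max(q[s2])` rather than from the action actually taken next, the learned greedy policy differs from the random behaviour policy that generated the data. In this toy setting it should raise the dose below the target level, hold at the target, and lower it above.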

Accelerating Critic Learning in Approximate Dynamic Programming via Value Templates and Perceptual Learning

International Joint Conference on Neural Networks 2003

Authors: Shannon, Thaddeus T.; Santiago, Roberto A.; Lendaris, George G. Affiliations: NW Compl. Intelligence Laboratory, Systems Science Ph.D. Program, Portland State University, Portland, OR, United States

The concepts of value templates and perceptual learning are introduced as refinements to the reinforcement learning (RL) paradigm. We demonstrate a method for accelerating Dual Heuristic Programming (DHP) critic training using value templates and perceptual learning. Both faster and more stable learning are achieved by using the value template and utilizing its inherent constraints to regularize the perceptual learning task. The method is demonstrated by tuning a neurofuzzy control system for a highly nonlinear second-order plant proposed by Sanner and Slotine. We take advantage of the TSK model framework throughout to keep the controller, critic, and model components used in DHP highly interpretable.

Keywords: learning systems

Reinforcement Learning in Continuous Action Spaces

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Hado van Hasselt; Marco A. Wiering. Affiliations: Department of Information and Computing Sciences, University of Utrecht, Utrecht, Netherlands

A fair amount of research has been done on reinforcement learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms named continuous actor critic learning automaton (CACLA) that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An experimental comparison is made between this algorithm and other algorithms that can handle continuous action spaces. These experiments show that CACLA performs much better than the other algorithms, especially when it is combined with a Gaussian exploration method.

Keywords: learning automata; Computational modeling; dynamic programming; Intelligent systems; Telephony; Books; Physics computing
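The abstract above does not spell out the CACLA update, so the following is only a sketch of the rule as it is commonly described: the critic is trained with a temporal-difference error, and the actor is moved toward the explored action only when that error is positive. The one-step toy task, the linear actor, the quadratic critic features, and all step sizes are assumptions made for illustration.

```python
import random


def features(s):
    """Hypothetical critic features for a scalar state."""
    return (1.0, s, s * s)


def cacla(steps=20000, alpha=0.01, beta=0.05, sigma=0.3, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0        # actor: Ac(s) = w * s + b
    c = [0.0, 0.0, 0.0]    # critic: V(s) = dot(c, features(s))
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0)
        mean = w * s + b
        a = rng.gauss(mean, sigma)        # Gaussian exploration around the actor output
        r = -(a - 0.5 * s) ** 2           # toy reward: the best action is 0.5 * s
        phi = features(s)
        v = sum(ci * pi for ci, pi in zip(c, phi))
        # Each toy episode ends after one step, so the next-state value is zero.
        delta = r - v
        c = [ci + beta * delta * pi for ci, pi in zip(c, phi)]
        if delta > 0:
            # Move the actor toward the action that did better than expected.
            w += alpha * (a - mean) * s
            b += alpha * (a - mean)
    return w, b


w, b = cacla()
print(round(w, 2), round(b, 2))
```

The actor step is scaled by the difference between the explored action and the actor output, not by the size of the temporal-difference error. In this toy task the actor slope should drift toward 0.5 and the offset toward 0.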

Particle Swarn Optimized Adaptive Dynamic Programming

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Dongbin Zhao; Jianqiang Yi; Derong Liu. Affiliations: Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Department of Electrical and Computer Engineering, University of Illinois Chicago, Chicago, IL, USA

Particle swarm optimization is used to train the action network and critic network of the adaptive dynamic programming approach. The typical structures of adaptive dynamic programming and particle swarm optimization are adopted for comparison with other learning algorithms such as the gradient descent method. Besides simulation of the balancing of a cart-pole plant, a more complex plant, the pendulum robot (pendubot), is used to test the learning performance. Compared to traditional adaptive dynamic programming approaches, the proposed evolutionary learning strategy is shown to converge faster and to be more efficient. Furthermore, the structure becomes simpler because the plant model does not need to be identified beforehand.

Keywords: dynamic programming; Particle swarm optimization; Neural networks; Robots; Backpropagation; Adaptive systems; Evolutionary computation; learning; Cost function; Testing
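The abstract above uses particle swarm optimization in place of gradient descent to train network weights. A minimal sketch of a standard global-best particle swarm is given below. It minimizes a hypothetical two-parameter loss rather than an actual critic or action network, and the inertia and acceleration constants are commonly used values, not ones taken from the paper.

```python
import random


def pso(loss, dim, particles=30, iterations=200,
        w=0.729, c1=1.494, c2=1.494, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position so far
    best_val = [loss(p) for p in pos]
    g = min(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far
    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                # Inertia plus random pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_loss(weights):
    """Hypothetical stand-in for a network training error."""
    return sum((wi - t) ** 2 for wi, t in zip(weights, (0.3, -0.7)))


weights, value = pso(toy_loss, dim=2)
print(weights, value)
```

The loss is only ever evaluated, never differentiated, which is the property that lets this kind of search replace backpropagation.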

Hamiltonian-Driven Adaptive Dynamic Programming Based on Extreme Learning Machine

14th International Symposium on Neural Networks (ISNN)

Authors: Yang, Yongliang; Wunsch, Donald; Guo, Zhishan; Yin, Yixin. Affiliations: Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China; Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409, USA; Missouri Univ Sci & Technol, Dept Comp Sci, Rolla, MO 65409, USA

ISBN: (print) 9783319590721; 9783319590714

In this paper, a novel framework of reinforcement learning for continuous-time dynamical systems is presented based on the Hamiltonian functional and the extreme learning machine. The idea of solution search in optimization is introduced to find the optimal control policy in the optimal control problem. The optimal control search consists of three steps: evaluation, comparison, and improvement of an arbitrary admissible policy. The Hamiltonian functional plays an important role in the above framework, under which only one critic is required in the adaptive critic structure. The critic network is implemented by the extreme learning machine. Finally, a simulation study is conducted to verify the effectiveness of the presented algorithm.

Keywords: reinforcement learning; Adaptive dynamic programming; Extreme learning machine; Hamiltonian functional; Optimization

Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Asma Al-Tamimi; Frank Lewis. Affiliations: Automation & Robotics Research Institute, University of Texas, Arlington, Fort Worth, TX, USA

In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely heuristic dynamic programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The importance of this paper is that the proof of convergence of the HDP iteration scheme is provided using rigorous methods for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided in this paper. The first example is a linear system, where ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second example considers a nonlinear control system.

Keywords: dynamic programming; Optimal control; Function approximation; Riccati equations; Robotics and automation; Nonlinear equations; learning; Convergence; Linear systems; Neural networks
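The linear example mentioned in the abstract above can be illustrated with a scalar sketch. Assume a hypothetical system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2 and a value function of the form p*x^2. The greedy iteration then alternates a policy update and a value update on the single number p, and the exact quadratic form takes the place of the paper's two neural networks. The system and cost parameters are illustrative assumptions.

```python
import math


def value_iteration(a, b, q, r, iterations=200):
    """Greedy (HDP-style) iteration on V_j(x) = p_j * x**2, starting from p_0 = 0."""
    p = 0.0
    for _ in range(iterations):
        # Greedy action for the current value estimate: u = -gain * x.
        gain = a * b * p / (r + b * b * p)
        # Value update: stage cost under that action plus the value of the next state.
        p = q + r * gain ** 2 + (a - b * gain) ** 2 * p
    return p


def riccati_solution(a, b, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation."""
    lin = r - a * a * r - b * b * q
    return (-lin + math.sqrt(lin * lin + 4.0 * b * b * q * r)) / (2.0 * b * b)


p_iter = value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
p_are = riccati_solution(a=1.2, b=1.0, q=1.0, r=1.0)
print(p_iter, p_are)
```

Starting from zero, the iterate should approach the Riccati solution even though the open-loop system (a = 1.2) is unstable, which mirrors the convergence claim for the linear case.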

The Effect of Bootstrapping in Multi-Automata Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Maarten Peeters; Katja Verbeeck; Ann Nowe. Affiliations: Computational Modeling Laboratory, Vrije Universiteit Brussel, Brussels, Belgium

Learning automata are shown to be an excellent tool for creating learning multi-agent systems. Most algorithms used in current automata research expect the environment to end in an explicit end-stage. In this end-stage the rewards are given to the learning automata (i.e., Monte Carlo updating). This is, however, infeasible in sequential decision problems with an infinite horizon, where no such end-stage exists. In this paper we propose a new algorithm based on one-step returns that uses bootstrapping to find good equilibrium paths in multi-stage games.

Keywords: learning automata; Monte Carlo methods; Convergence; dynamic programming; Computational modeling; Multiagent systems; Infinite horizon; Equations
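The contrast drawn in the abstract above, between Monte Carlo updating at an end-stage and bootstrapped one-step returns, can be made concrete with two small helper functions. This is a generic sketch of the two update targets, not the authors' automata algorithm. The function names and the discount factor are assumptions.

```python
def monte_carlo_return(rewards, gamma):
    """Discounted sum of all rewards; needs the episode to have finished."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def one_step_target(reward, gamma, next_value):
    """Bootstrapped target: one observed reward plus an estimate of the rest."""
    return reward + gamma * next_value


print(monte_carlo_return([1.0, 0.0, 2.0], gamma=0.5))
print(one_step_target(1.0, gamma=0.5, next_value=4.0))
```

The first function cannot be evaluated until the reward sequence ends. The second needs only the next reward and a current value estimate, so it remains usable when no end-stage exists.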

Knowledge Transfer Using Local Features

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Martin Stolle; Christopher G. Atkeson. Affiliations: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA

We present a method for reducing the effort required to compute policies for tasks based on solutions to previously solved tasks. The key idea is to use a learned intermediate policy based on local features to create an initial policy for the new task. To further improve this initial policy, we developed a form of generalized policy iteration. We achieve a substantial reduction in the computation needed to find policies when previous experience is available.

Keywords: Knowledge transfer; learning; Automatic control; Navigation; dynamic programming; Robots; Artificial intelligence; Strips; Legged locomotion

Dual Representations for Dynamic Programming and Reinforcement Learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Tao Wang; Michael Bowling; Dale Schuurmans. Affiliations: Department of Computing Science, University of Alberta, Edmonton, Canada

We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well-developed techniques for representing, approximating, and estimating probability distributions, without running the risks associated with divergent value function estimation. A second advantage is that some distinct algorithms for the average reward and discounted reward cases in the primal become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees a globally normalized state visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration, and value iteration. Moreover, we derive dual formulations of temporal difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value-function-based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.

Keywords: dynamic programming; learning; Approximation algorithms; Probability distribution; Linear approximation; Decision making; Distributed computing; Heuristic algorithms; Linear programming; Yield estimation
