Search Results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Liu, Chunming; Xu, Xin; Hu, Haiyun; Dai, Bin (College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, China)

ISBN: (print) 9781424498888

Approximate policy iteration (API) has been shown to be a class of reinforcement learning methods with stability and sample efficiency. However, sample collection is still an open problem which is critical to the performance of API methods. In this paper, a novel adaptive sample collection strategy using active learning-based exploration is proposed to enhance the performance of kernel-based API. In this strategy, an online kernel-based least squares policy iteration (KLSPI) method is adopted to construct nonlinear features and approximate the Q-function simultaneously. Therefore, more representative samples can be obtained for value function approximation. Simulation results on typical learning control problems illustrate that by using the proposed strategy, the performance of KLSPI can be improved remarkably. © 2011 IEEE.

Keywords: reinforcement learning

A reinforcement learning approach for sequential mastery testing

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: El-Alfy, El-Sayed M. (College of Computer Sciences and Engineering, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia)

ISBN: (print) 9781424498888

This paper explores a novel application of reinforcement learning (RL) techniques to sequential mastery testing. In such systems, the goal is to classify each examined person, using the minimal number of test items, as master or non-master. Using RL, an intelligent agent autonomously learns from interactions to administer more informative and effective variable-length tests. Empirical results are also provided to evaluate the performance of the proposed approach as compared to two common approaches for variable-length testing (Bayesian decision and sequential probability ratio test) as well as to fixed-length testing. © 2011 IEEE.

Keywords: reinforcement learning

Feedback controller parameterizations for reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Roberts, John W.; Manchester, Ian R.; Tedrake, Russ (CSAIL, MIT, Cambridge, MA 02139, United States)

ISBN: (print) 9781424498888

Reinforcement learning offers a very general framework for learning controllers, but its effectiveness is closely tied to the controller parameterization used. Especially when learning feedback controllers for weakly stable systems, ineffective parameterizations can result in unstable controllers and poor performance both in terms of learning convergence and in the cost of the resulting policy. In this paper we explore four linear controller parameterizations in the context of REINFORCE, applying them to the control of a reaching task with a linearized flexible manipulator. We find that some natural but naive parameterizations perform very poorly, while the Youla Parameterization (a popular parameterization from the controls literature) offers a number of robustness and performance advantages. © 2011 IEEE.

Keywords: Parameterization

Using supervised training signals of observable state dynamics to speed-up and improve reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Elliott, Daniel L.; Anderson, Charles (Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523, USA)

ISBN: (print) 9781479945528

A common complaint about reinforcement learning (RL) is that it is too slow to learn a value function which gives good performance. This issue is exacerbated in continuous state spaces. This paper presents a straightforward approach to speeding up and even improving RL solutions by reusing features learned during a pre-training phase prior to Q-learning. During pre-training, the agent is taught to predict state change given a state/action pair. The effect of pre-training is examined using the model-free Q-learning approach but could readily be applied to a number of RL approaches including model-based RL. The analysis of the results provides ample evidence that the features learned during pre-training are the reason behind the improved RL performance.

Keywords: learning (artificial intelligence); neural nets; state-space methods; RL performance improvement; RL solution improvement; continuous state spaces; feature reuse; model-based RL approach; model-free Q-learning approach; observable state dynamics; pretraining phase; reinforcement learning; state change prediction; state-action pair; supervised training signals; value function learning; Artificial neural networks; Computational modeling; Data models; Heuristic algorithms; Supervised learning; Training; Neural network; Semi-supervised learning; Functional training; learning; State Change

A Comparison of Approximate Dynamic Programming Techniques on Benchmark Energy Storage Problems: Does Anything Work?

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jiang, Daniel R.; Pham, Thuy V.; Powell, Warren B.; Salas, Daniel F.; Scott, Warren R.

ISBN: (print) 9781479945528

As more renewable, yet volatile, forms of energy like solar and wind are being incorporated into the grid, the problem of finding optimal control policies for energy storage is becoming increasingly important. These sequential decision problems are often modeled as stochastic dynamic programs, but when the state space becomes large, traditional (exact) techniques such as backward induction, policy iteration, or value iteration quickly become computationally intractable. Approximate dynamic programming (ADP) thus becomes a natural solution technique for solving these problems to near-optimality using significantly fewer computational resources. In this paper, we compare the performance of the following: various approximation architectures with approximate policy iteration (API), approximate value iteration (AVI) with structured lookup table, and direct policy search on a benchmarked energy storage problem (i.e., the optimal solution is computable).

Keywords: dynamic programming; energy storage; power engineering computing; power system management; renewable energy sources; table lookup; ADP; API; AVI; approximate dynamic programming; approximate policy iteration; approximate value iteration; backward induction; dynamic programming techniques; energy storage control policy; lookup table; natural solution technique; solar energy; stochastic dynamic programs; wind energy; Approximation algorithms; Benchmark testing; Equations; Function approximation; Mathematical model; Renewable energy

Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control

IEEE Circuits and Systems Magazine, 2009, vol. 9, no. 3, pp. 32-50

Authors: Lewis, Frank L.; Vrabie, Draguna (Univ Texas Arlington, Automat & Robot Res Inst, Arlington, TX, USA; S China Univ Technol, Guangzhou, Guangdong, Peoples R China; Shanghai Jiao Tong Univ, Shanghai, Peoples R China)

Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. This action-based or reinforcement learning can capture notions of optimal behavior occurring in natural systems. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming. These give us insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior.

Keywords: learning; Programmable control; adaptive control; dynamic programming; Feedback control; Organisms; Optimal control; Control systems; Design engineering; Systems engineering and theory

An approximate dynamic programming based controller for an underactuated 6DoF quadrotor

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Stingu, Emanuel; Lewis, Frank L. (Automation and Robotics Research Institute, University of Texas at Arlington, Arlington, TX, United States)

ISBN: (print) 9781424498888

This paper discusses how the principles of adaptive dynamic programming (ADP) can be applied to the control of a quadrotor helicopter platform flying in an uncontrolled environment and subjected to various disturbances and model uncertainties. ADP is based on reinforcement learning using an actor-critic structure. Due to the complexity of the quadrotor system, the learning process has to use as much information as possible about the system and the environment. Various methods to improve the learning speed and efficiency are presented. Neural networks with local activation functions are used as function approximators because the state space cannot be explored efficiently due to its size and the limited time available. The complex dynamics are controlled by a single critic and by multiple actors, thus avoiding the curse of dimensionality. After a number of iterations, the overall actor-critic structure stores information (knowledge) about the system dynamics and the optimal controller that can accomplish the explicit or implicit goal specified in the cost function. © 2011 IEEE.

Keywords: reinforcement learning

Event-based Optimal Regulator Design for Nonlinear Networked Control Systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Sahoo, Avimanyu; Xu, Hao; Jagannathan, S. (Missouri Univ Sc & Tech, Dept Elect & Comp Engn, Rolla, MO 65409, USA; Texas A&M Univ, Coll Sci & Engn, Dept Elect Engn, Corpus Christi, TX, USA)

ISBN: (print) 9781479945528

This paper presents a novel stochastic event-based near-optimal control strategy to regulate a networked control system (NCS) represented as an uncertain nonlinear continuous-time system. An online stochastic actor-critic neural network (NN) based approach is utilized to achieve near-optimal regulation in the presence of network constraints, such as network-induced time-varying delays and random packet losses, under event-based transmission of the feedback signals. The nonlinear NCS, transformed to discrete time after incorporation of the delays and packet losses, is utilized for the actor-critic NN-based controller design. To relax the requirement for knowledge of the control coefficient matrix, an NN-based identifier is used. The event-sampled state vector is used as the NN input, and the NN weights are updated non-periodically at the occurrence of events. Further, an event-trigger condition is designed by using the Lyapunov technique to ensure ultimate boundedness of all the closed-loop signals and to save network resources and computation. Moreover, policy and value iterations are not utilized for the stochastic optimal regulator design. Finally, the analytical design is verified on a numerical example by carrying out Monte Carlo simulations.

Keywords: Event-triggered control; optimal control; adaptive dynamic programming; neural networks; networked control systems

Approximate reinforcement learning: An overview

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Buşoniu, Lucian; Ernst, Damien; De Schutter, Bart; Babuška, Robert (Delft Center for Systems and Control, Delft Univ. of Technology, Netherlands; Research Associate of the FRS-FNRS, Systems and Modeling Unit, University of Liège, Liège, Belgium)

ISBN: (print) 9781424498888

Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms. © 2011 IEEE.

Keywords: reinforcement learning

Grounding subgoals in information transitions

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Van Dijk, Sander G.; Polani, Daniel (Adaptive Systems Research Group, University of Hertfordshire, Hatfield, United Kingdom)

ISBN: (print) 9781424498888

In reinforcement learning problems, the construction of subgoals has been identified as an important step to speed up learning and to enable skill transfer. For this purpose, one typically extracts states from various saliency properties of an MDP transition graph, most notably bottleneck states. Here we introduce an alternative approach to this problem: assuming a family of MDPs with multiple goals but with a fixed transition graph, we introduce the relevant goal information as the amount of Shannon information that the agent needs to maintain about the current goal at a given state to select the appropriate action. We show that there are distinct transition states in the MDP at which new relevant goal information has to be considered for selecting the next action. We argue that these transition states can be interpreted as subgoals for the current task class, and we use these states to automatically create a hierarchical policy, according to the well-established Options model for hierarchical reinforcement learning. © 2011 IEEE.

Keywords: reinforcement learning
