Search results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Buşoniu, Lucian; Ernst, Damien; De Schutter, Bart; Babuška, Robert

Affiliations: Delft Center for Systems and Control, Delft Univ. of Technology, Netherlands; Research Associate of the FRS-FNRS, Systems and Modeling Unit, University of Liège, Liège, Belgium

ISBN: (print) 9781424498888

Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms. © 2011 IEEE.
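As a rough illustration of the first class named in the abstract, approximate value iteration, the sketch below repeatedly applies a Bellman backup and then refits a simple approximator. It is not taken from the surveyed paper: the 20-state chain, the reward of 1 for reaching the last state, the discount factor, and the pairwise state aggregation are all assumptions chosen only to keep the example small.

```python
GAMMA = 0.9            # assumed discount factor
N_STATES = 20          # hypothetical chain: states 0..19, state 19 is a terminal goal
GROUP = 2              # neighbouring states share one parameter (state aggregation)
GOAL = N_STATES - 1


def approx_value(theta, s):
    """Piecewise-constant value approximation; the terminal state has value 0."""
    return 0.0 if s == GOAL else theta[s // GROUP]


def backup(theta, s):
    """Bellman optimality backup at state s, using the approximate values."""
    best = float("-inf")
    for move in (-1, 1):                      # step left or right, clamped to the chain
        nxt = min(max(s + move, 0), GOAL)
        reward = 1.0 if nxt == GOAL else 0.0
        best = max(best, reward + GAMMA * approx_value(theta, nxt))
    return best


def approximate_value_iteration(sweeps=200):
    """Alternate a Bellman backup with a least-squares fit of one constant per group."""
    theta = [0.0] * (N_STATES // GROUP)
    for _ in range(sweeps):
        new_theta = []
        for g in range(len(theta)):
            members = [s for s in range(g * GROUP, (g + 1) * GROUP) if s != GOAL]
            targets = [backup(theta, s) for s in members]
            # The least-squares fit of a constant to the targets is their mean.
            new_theta.append(sum(targets) / len(targets))
        theta = new_theta
    return theta


theta = approximate_value_iteration()
print([round(v, 3) for v in theta])
```

The fitted values rise toward the goal, but they differ from the exact values because two states share each parameter; that gap is the approximation error that surveys of approximate RL typically analyze.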

Keywords: reinforcement learning

Dynamic lead time promising

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Reindorp, Matthew J.; Fu, Michael C.

Affiliations: Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Netherlands; Robert H. Smith School of Business, Institute for Systems Research, University of Maryland, United States

ISBN: (print) 9781424498888

We consider a make-to-order business that serves customers in multiple priority classes. Orders from customers in higher classes bring greater revenue, but they expect shorter lead times than customers in lower classes. In making lead time promises, the firm must recognize preexisting order commitments, uncertainty over future demand from each class, and the possibility of supply chain disruptions. We model this scenario as a Markov decision problem and use reinforcement learning to determine the firm's lead time policy. In order to achieve tractability on large problems, we utilize a sequential decision-making approach that effectively allows us to eliminate one dimension from the state space of the system. Initial numerical results from the sequential dynamic approach suggest that the resulting policies more closely approximate optimal policies than static optimization approaches. © 2011 IEEE.

Keywords: reinforcement learning

On learning with imperfect representations

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Kalyanakrishnan, Shivaram; Stone, Peter

Affiliations: Department of Computer Science, University of Texas at Austin, 1616 Guadalupe St, Austin, TX 78701, United States

ISBN: (print) 9781424498888

In this paper we present a perspective on the relationship between learning and representation in sequential decision making tasks. We undertake a brief survey of existing real-world applications, which demonstrates that the classical tabular representation seldom applies in practice. Specifically, several practical tasks suffer from state aliasing, and most demand some form of generalization and function approximation. Coping with these representational aspects thus becomes an important direction for furthering the advent of reinforcement learning in practice. The central thesis we present in this position paper is that in practice, learning methods specifically developed to work with imperfect representations are likely to perform better than those developed for perfect representations and then applied in imperfect-representation settings. We specify an evaluation criterion for learning methods in practice, and propose a framework for their synthesis. In particular, we highlight the degrees of representational bias prevalent in different learning methods. We reference a variety of relevant literature as a background for this introspective essay. © 2011 IEEE.

Keywords: reinforcement learning

Application of reinforcement learning-based algorithms in CO2 allowance and electricity markets

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Nanduri, Vishnuteja

Affiliations: Department of Industrial and Manufacturing Engineering, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, United States

ISBN: (print) 9781424498888

Climate change is one of the most important challenges faced by the world this century. In the U.S., the electric power industry is the largest emitter of CO2, contributing to the climate crisis. Federal emissions control bills in the form of cap-and-trade programs are currently idling in the U.S. Congress. In the meantime, ten states in the northeastern U.S. have adopted a regional cap-and-trade program to reduce CO2 levels and also to increase investments in cleaner technologies. Many of the states in which the cap-and-trade programs are active operate under a restructured market paradigm, where generators compete to supply power. This research presents a bi-level game-theoretic model to capture competition between generators in cap-and-trade markets and restructured electricity markets. The solution to the game-theoretic model is obtained using a reinforcement learning based algorithm. © 2011 IEEE.

Keywords: reinforcement learning

Higher order Q-learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Edwards, Ashley; Pottenger, William M.

Affiliations: Department of Computer Science, University of Georgia, Athens, GA 30606, United States; Department of Computer Science and DIMACS, Rutgers University, Piscataway, NJ 08854, United States

ISBN: (print) 9781424498888

Higher order learning is a statistical relational learning framework in which relationships between different instances of the same class are leveraged (Ganiz, Lytkin and Pottenger, 2009). Learning can be supervised or unsupervised. In contrast, reinforcement learning (Q-learning) is a technique for learning in an unknown state space. Action selection is often based on a greedy or epsilon-greedy approach. The problem with this approach is that there is often a large amount of initial exploration before convergence. In this article we introduce a novel approach to this problem that treats a state space as a collection of data from which latent information can be extrapolated. From this data, we classify actions as leading to a high reward or low reward, and formulate behaviors based on this information. We provide experimental evidence that this technique drastically reduces the amount of exploration required in the initial stages of learning. We evaluate our algorithm in a well-known reinforcement learning domain, grid-world. © 2011 IEEE.
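For readers unfamiliar with the baseline that the abstract contrasts against, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection in a grid world. It does not implement the authors' higher-order method. The 4x4 grid, the reward of 1 at the goal, and the values of `alpha`, `gamma`, and `epsilon` are assumptions chosen for illustration.

```python
import random

SIZE = 4                                   # hypothetical 4x4 grid world
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action):
    """Move within the grid; reward 1.0 on entering the goal, otherwise 0.0."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def greedy_action(q, state, rng):
    """Index of a highest-valued action, with ties broken at random."""
    best = max(q[state])
    return rng.choice([i for i, v in enumerate(q[state]) if v == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; explores with probability epsilon, else acts greedily."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))      # exploratory action
            else:
                a = greedy_action(q, state, rng)     # exploiting action
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_path_length(q, limit=100):
    """Number of steps the greedy policy takes from START to GOAL (capped at limit)."""
    rng = random.Random(0)
    state, steps = START, 0
    while state != GOAL and steps < limit:
        state, _, _ = step(state, ACTIONS[greedy_action(q, state, rng)])
        steps += 1
    return steps


q_table = train()
print(greedy_path_length(q_table))
```

While all Q-values are still zero, the agent wanders at random. This is the costly initial exploration phase that the abstract says the proposed approach aims to shorten.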

Keywords: reinforcement learning

Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Gabel, Thomas; Lutz, Christian; Riedmiller, Martin

Affiliations: Machine Learning Lab, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany

ISBN: (print) 9781424498888

Neural batch reinforcement learning (RL) algorithms have recently been shown to be a powerful tool for model-free reinforcement learning problems. In this paper, we present a novel learning benchmark from the realm of computer games and apply a variant of a neural batch RL algorithm in the scope of this benchmark. Defining the learning problem and appropriately adjusting all relevant parameters is often a tedious task for the researcher who implements and investigates some learning approach. In RL, the suitable choice of the function c of immediate costs is crucial, and, when utilizing multi-layer perceptron neural networks for the purpose of value function approximation, the definition of c must be well aligned with the specific characteristics of this type of function approximator. Determining this alignment is especially tricky when no a priori knowledge about the task and, hence, about optimal policies is available. To this end, we propose a simple but effective dynamic scaling heuristic that can be seamlessly integrated into contemporary neural batch RL algorithms. We evaluate the effectiveness of this heuristic in the context of the well-known pole swing-up benchmark as well as in the context of the novel gaming benchmark we are suggesting. © 2011 IEEE.

Keywords: reinforcement learning

Reinforcement learning algorithms for solving classification problems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Wiering, Marco A.; Van Hasselt, Hado; Pietersma, Auke-Dirk; Schomaker, Lambert

Affiliations: Dept. of Artificial Intelligence, University of Groningen, Netherlands; Multi-agent and Adaptive Computation, Centrum Wiskunde en Informatica, Netherlands

ISBN: (print) 9781424498888

We describe a new framework for applying reinforcement learning (RL) algorithms to solve classification tasks by letting an agent act on the inputs and learn value functions. This paper describes how classification problems can be modeled using classification Markov decision processes and introduces the Max-Min ACLA algorithm, an extension of the novel RL algorithm called actor-critic learning automaton (ACLA). Experiments are performed using 8 datasets from the UCI repository, where our RL method is combined with multi-layer perceptrons that serve as function approximators. The RL method is compared to conventional multi-layer perceptrons and support vector machines, and the results show that our method slightly outperforms the multi-layer perceptron and performs as well as the support vector machine. Finally, many possible extensions to our basic method are described, so that much future research can be done to make the proposed method even better. © 2011 IEEE.

Keywords: Support vector machines

Transformation Invariant On-Line Target Recognition

IEEE Transactions on Neural Networks, 2011, vol. 22, no. 6, pp. 906-918

Authors: Iftekharuddin, Khan M.

Affiliations: Univ Memphis, Dept Elect & Comp Engn, Intelligent Syst & Image Proc Lab, Memphis, TN 38152 USA

Transformation invariant automatic target recognition (ATR) has been an active research area due to its widespread applications in defense, robotics, medical imaging and geographic scene analysis. The primary goal of this paper is to obtain an on-line ATR system for targets in the presence of image transformations, such as rotation, translation, scale and occlusion, as well as resolution changes. We investigate biologically inspired adaptive critic design (ACD) neural network (NN) models for on-line learning of such transformations. We further exploit reinforcement learning (RL) in the ACD framework to obtain transformation invariant ATR. We exploit two ACD designs, heuristic dynamic programming (HDP) and dual heuristic dynamic programming (DHP), to obtain transformation invariant ATR. We obtain extensive statistical evaluations of the proposed on-line ATR networks using both simulated image transformations and a real benchmark facial image database, UMIST, with pose variations. Our simulations show promising results for learning transformations in simulated images and authenticating out-of-plane rotated face images. Comparing the two on-line ATR designs, HDP outperforms DHP in learning capability and robustness and is more tolerant to noise. The computational time involved in HDP is also less than that of DHP. On the other hand, DHP achieves a 100% success rate more frequently than HDP for individual targets, and the residual critic error in DHP is generally smaller than that of HDP. Mathematical analyses of both our RL-based on-line ATR designs are also obtained to provide a sufficient condition for asymptotic convergence in a statistical average sense.

Keywords: active on-line learning; automatic target recognition; dual heuristic dynamic programming; face authentication; heuristic dynamic programming; image transformation invariance; reinforcement learning

Safe reinforcement learning in high-risk tasks through policy improvement

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Garcia Polo, Francisco Javier; Fernandez Rebollo, Fernando

Affiliations: Computer Science Department, Universidad Carlos III de Madrid, Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain

ISBN: (print) 9781424498888

Reinforcement learning (RL) methods are widely used for dynamic control tasks. In many cases, these are high-risk tasks where the trial-and-error process may select actions whose execution from unsafe states can be catastrophic. In addition, many of these tasks have continuous state and action spaces, making the learning problem harder and unapproachable with conventional RL algorithms. So, when the agent begins to interact with a risky and large state-action space environment, an important question arises: how can we prevent the exploration of the state-action space from damaging the learning (or other) systems? In this paper, we define the concept of risk and address the problem of safe exploration in the context of RL. Our notion of safety is concerned with states that can lead to damage. Moreover, we introduce an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment. We report experimental results using the helicopter hovering task from the RL Competition. © 2011 IEEE.

Keywords: reinforcement learning

An adaptive-learning framework for semi-cooperative multi-agent coordination

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Authors: Boukhtouta, Abdeslem; Berger, Jean; Powell, Warren B.; George, Abraham

Affiliations: Defence Research and Development Canada-Valcartier, Quebec, QC G3J 1X5, Canada; Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, United States

ISBN: (print) 9781424498888

Complex problems involving multiple agents exhibit varying degrees of cooperation. The levels of cooperation might reflect differences in information as well as differences in goals. In this research, we develop a general mathematical model for distributed, semi-cooperative planning and suggest a solution strategy which involves decomposing the system into subproblems, each of which is specified at a certain period in time and controlled by an agent. The agents communicate marginal values of resources to each other, possibly with distortion. We design experiments to demonstrate the benefits of communication between the agents and show that, with communication, the solution quality approaches that of the ideal situation where the entire problem is controlled by a single agent. © 2011 IEEE.

Keywords: Multi agent systems
