Search Results - Inner Mongolia University Library

Reinforcement learning with automatic basis construction based on isometric feature mapping

INFORMATION SCIENCES, 2014, Vol. 286, pp. 209-227

Authors: Huang, Zhenhua; Xu, Xin; Zuo, Lei. Natl Univ Def Technol, Coll Mechatron & Automat, Changsha 410073, Hunan, Peoples R China

Value function approximation (VFA) has been a major research topic in reinforcement learning. Although various reinforcement learning algorithms with VFA have been proposed, the performance of most of them depends on the predefined structure of the basis functions. To address this problem, this paper presents a novel basis learning method for VFA based on isometric feature mapping (IFM). In the proposed method, basis functions for VFA are generated automatically by constructing the optimal embedding basis of the data in a d-dimensional Euclidean space, which best preserves the estimated intrinsic geometry of the manifold. Furthermore, the IFM-based basis learning method is integrated with approximate policy iteration (API) for learning control in Markov decision problems with large state spaces, yielding a new manifold reinforcement learning framework termed IFM-based API (IFM-API). Three learning control problems, including a real Googol single inverted pendulum control system, were studied to evaluate the performance of the proposed IFM-API algorithm. The simulation and experimental results show that, compared with other basis selection or learning methods, the IFM-based basis learning method can automatically compute an efficient set of basis functions with far fewer predefined parameters and at lower computational cost. The results also show that the proposed IFM-API algorithm can obtain better learning control policies than other API methods. (C) 2014 Elsevier Inc. All rights reserved.

Keywords: Reinforcement learning; Isometric feature mapping; value function approximation; Approximate policy iteration; Learning control
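The IFM-based basis construction described above builds on isometric feature mapping (Isomap). As a rough illustration only, the sketch below runs the classic Isomap steps (k-nearest-neighbour graph, graph shortest paths as geodesic estimates, classical MDS) on sampled states and returns each sample's embedding coordinates, which could serve as feature values for VFA. The function name `isomap_basis`, its parameters, and the power-iteration eigensolver are assumptions for illustration; the paper's exact construction and its integration with API are not reproduced.

```python
import math
import random


def isomap_basis(points, n_neighbors=5, dim=2, iters=1000, seed=0):
    """Hypothetical helper: classic Isomap embedding, one `dim`-vector per point.

    Steps: k-NN graph -> all-pairs graph shortest paths (geodesic estimates)
    -> classical MDS on the geodesic distance matrix.
    """
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]

    # 1) Symmetric k-nearest-neighbour graph weighted by Euclidean distance.
    inf = float("inf")
    geo = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        nearest = sorted((euclid[i][j], j) for j in range(n) if j != i)
        for d, j in nearest[:n_neighbors]:
            geo[i][j] = geo[j][i] = d

    # 2) Floyd-Warshall: shortest paths through the graph approximate geodesics.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if geo[i][m] + geo[m][j] < geo[i][j]:
                    geo[i][j] = geo[i][m] + geo[m][j]
    if any(math.isinf(d) for row in geo for d in row):
        raise ValueError("neighbourhood graph is disconnected; raise n_neighbors")

    # 3) Classical MDS: B = -1/2 * J D^2 J (double-centred squared distances).
    sq = [[d * d for d in row] for row in geo]
    mean = [sum(row) / n for row in sq]  # row means == column means (symmetric)
    grand = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]

    # Top eigenpairs by power iteration on B + shift*I, which is positive
    # semidefinite, so the largest *algebraic* eigenvalue of B is found.
    shift = max(sum(abs(x) for x in row) for row in b)  # Gershgorin bound
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12 * shift:
            break  # no further directions with positive variance
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass converges to the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For example, 20 points sampled on a half circle with `n_neighbors=2` and `dim=1` embed into a single coordinate that is ordered along the arc and spans roughly its length (about 3.1), so the embedding follows the curve's intrinsic geometry rather than straight-line distances.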

The Control of Invasive Species on Private Property with Neighbor-to-Neighbor Spillovers

ENVIRONMENTAL & RESOURCE ECONOMICS, 2014, Vol. 59, No. 2, pp. 231-255

Authors: Fenichel, Eli P.; Richards, Timothy J.; Shanafelt, David W. Yale Univ, Sch Forestry & Environm Studies, New Haven, CT 06511, USA; Arizona State Univ, Morrison Sch Agribusiness & Resource Management, Mesa, AZ 85212, USA; Arizona State Univ, Sch Life Sci, Tempe, AZ 85287, USA

Invasive pests cross property boundaries. Property managers may have private incentives to control invasive species even though they lack sufficient incentive to fully internalize the external costs of their role in spreading the invasion. Each property manager has a right to the future use of his own property, but his property may abut others' properties, enabling the spread of an invasive species. The incentives for a foresighted property manager to control invasive species have received little attention. We consider the efforts of a foresighted property manager who has rights to the future use of a property and can engage in repeated, discrete control activities. We find that higher rates of dispersal, associated with proximity to neighboring properties, reduce the private incentives for control. Controlling a species at one location provides incentives to control it at a neighboring location. Control efforts at neighboring locations are strategic complements and, coupled with spatial heterogeneity, lead to a weaker-link public good problem in which no property owner can fully appropriate the benefits of his own control activity. Future-use rights and private costs suggest that there is scope for a series of Coase-like exchanges to internalize much of the cost associated with species invasion. Pigouvian taxes on invasive species can potentially have qualitatively perverse behavioral effects: a tax with a strong income effect (e.g., when revenue is not effectively recycled) can reduce the value of property assets and diminish the incentive to manage insects on one's own property.

Keywords: Asian citrus psyllid; Bioeconomics; Citrus; Dynamic programming; Invasive species; Property rights; Repeat optimal stopping; Spatial externalities; value function approximation

Methods for approximating value functions for the Dominion card game

EVOLUTIONARY INTELLIGENCE, 2014, Vol. 6, No. 4, pp. 195-204

Author: Winder, Ransom K. Mitre Corp, 7525 Colshire Dr, McLean, VA 22102, USA

Artificial neural networks have been used successfully to approximate value functions for tasks involving decision making. In domains where decisions require a shift in judgment as the overall state changes, it is hypothesized here that methods using multiple artificial neural networks are likely to approximate the value function better than those that employ a single network. The card game Dominion was chosen as the domain in which to examine this hypothesis. This paper compares artificial neural networks produced by several machine learning methods that have been applied successfully to other games (such as the approach used in TD-Gammon) against a genetic algorithm that generates two neural networks for different phases of the game and also evolves the transition point between them. The results show a higher success ratio for the genetic algorithm applied to two neural networks. This suggests that future work examining more complex neural network configurations and richer evolutionary exploration could benefit Dominion as well as other domains that require shifts in strategy.

Keywords: Artificial neural networks; Genetic algorithms; value function approximation; Reinforcement learning; Games
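The two-network idea above can be sketched as a value function that dispatches on game progress, with a genetic algorithm that evolves both networks and the transition point together. This is a minimal sketch under loud assumptions: each "network" is a single tanh unit, the GA is mutation-only with truncation selection, and `Genome`, `value`, `mutate`, and `evolve` are hypothetical names. The abstract does not describe the paper's network sizes, operators, or fitness measure.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Genome:
    """Hypothetical encoding: two small networks plus a phase-switch point."""
    early: list[float]   # weights used before the transition point
    late: list[float]    # weights used after the transition point
    transition: float    # game-progress value in [0, 1] where control switches


def tiny_net(weights, features):
    # A single tanh unit standing in for a full neural network.
    return math.tanh(sum(w * x for w, x in zip(weights, features)))


def value(genome, features, progress):
    """Score a state with whichever network owns the current game phase."""
    weights = genome.early if progress < genome.transition else genome.late
    return tiny_net(weights, features)


def mutate(genome, rng, sigma=0.1):
    # Gaussian mutation of both networks and of the transition point.
    return Genome(
        early=[w + rng.gauss(0.0, sigma) for w in genome.early],
        late=[w + rng.gauss(0.0, sigma) for w in genome.late],
        transition=min(1.0, max(0.0, genome.transition + rng.gauss(0.0, sigma))),
    )


def evolve(fitness, n_features, pop_size=20, generations=50, seed=0):
    """Mutation-only GA with truncation selection (best half survives)."""
    rng = random.Random(seed)
    pop = [Genome([rng.uniform(-1, 1) for _ in range(n_features)],
                  [rng.uniform(-1, 1) for _ in range(n_features)],
                  rng.random())
           for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness), history
```

In a real setting `fitness` would be something like the win rate of the encoded player over a batch of simulated games (an assumption here). Because the best half always survives, the recorded best fitness never decreases from one generation to the next.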

Q-learning in Continuous State-Action Space with Redundant Dimensions by Using a Selective Desensitization Neural Network

Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS)

Authors: Kobayashi, Takaaki; Shibuya, Takeshi; Morita, Masahiko. Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki 3058573, Japan; Univ Tsukuba, Fac Engn Informat & Syst, Tsukuba, Ibaraki 3058573, Japan

ISBN: (print) 9781479959556

When reinforcement learning algorithms such as Q-learning are applied to real-world problems, the high dimensionality, redundant dimensions, and continuity of the state-action space must be taken into account. Continuity of the state-action space is often handled by value function approximation. However, conventional function approximators such as radial basis function networks (RBFNs) are unsuitable in these environments because they incur a high computational cost and the number of required experiences grows exponentially with the dimension of the state-action space. By contrast, a selective desensitization neural network (SDNN) is highly robust to redundant inputs and has a low computational cost. This paper proposes a novel function approximation method for Q-learning in continuous state-action space based on SDNN. The proposed method is evaluated in numerical experiments with redundant inputs. The results confirm that the method is robust to redundant state dimensions and has a lower computational cost than an RBFN. These properties are advantageous for real-world applications such as robotic systems.

Keywords: function approximation; learning (artificial intelligence); radial basis function networks; Q-learning; RBFN; SDNN; continuous state-action space; function approximation method; redundant dimensions; reinforcement learning algorithms; robotic systems; selective desensitization neural network; value function approximation; Computational efficiency; Educational institutions; Electronic mail; Joints; Learning (artificial intelligence); Neural networks; Robustness
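The abstract's point that RBFN requirements grow exponentially with dimension can be made concrete with a grid-based Gaussian RBF feature layer. The sketch below is an assumed, generic construction (evenly spaced centres on [low, high]^d, one shared width, and the hypothetical helper `rbf_grid_features`); it is not the SDNN method, whose internal mechanism the abstract does not describe.

```python
import itertools
import math


def rbf_grid_features(x, centers_per_dim=5, low=0.0, high=1.0, width=0.25):
    """Hypothetical helper: Gaussian RBF features on a regular grid over [low, high]^d.

    The grid has centers_per_dim ** len(x) centres, so the feature count
    (and the data needed to fit the weights) grows exponentially with d.
    """
    d = len(x)
    ticks = [low + (high - low) * i / (centers_per_dim - 1)
             for i in range(centers_per_dim)]
    feats = []
    for center in itertools.product(ticks, repeat=d):
        sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        feats.append(math.exp(-sq / (2 * width ** 2)))
    return feats
```

With 5 centres per dimension, a 2-D state plus a 2-D action already needs 5**4 = 625 features, and adding a single redundant input raises this to 5**5 = 3125, even though the extra input carries no useful information.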

The Operation Optimization Model of Pumped-Hydro Power Storage Station Based on Approximate Dynamic Programming

International Conference on Power System Technology (PowerCon)

Authors: Liang, Zhencheng; Li, Yu; Wei, Hua. Guangxi Univ, Sch Elect Engn, Nanning 530004, Peoples R China; Guangxi Key Lab Power Syst Optimizat & Energy Tec, Nanning, Peoples R China

ISBN: (print) 9781479950324

Assuming that a pumped-hydro power storage (PHPS) station can be optimized and adjusted over multiple days, this paper proposes a long-term operation optimization model for PHPS stations based on approximate dynamic programming (ADP). In this multistage decision model, a value function approximation (VFA) of the reservoir's stored energy is used across stages to retain the overall optimization characteristics; within each stage, the generated energy and generating periods, together with the electricity consumed for pumping and the pumping periods, serve as decision variables for daily operation optimization. An approximately optimal solution is obtained by iterating between the decision variables and the value function, which avoids the "curse of dimensionality" of conventional multistage decision models. Experiments show that the ADP-based model accurately describes the long-term operation modes of a PHPS station and that its solution method is better suited to this kind of large-scale optimization problem than dynamic programming (DP) and conventional mathematical programming methods.

Keywords: pumped-hydro power storage station; long-term operation optimization; value function approximation; approximate dynamic programming

Geodesic Gaussian kernels for value function approximation

AUTONOMOUS ROBOTS, 2008, Vol. 25, No. 3, pp. 287-304

Authors: Sugiyama, Masashi; Hachiya, Hirotaka; Towell, Christopher; Vijayakumar, Sethu. Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan; Univ Edinburgh, Sch Informat, Edinburgh EH9 3JZ, Midlothian, Scotland

Least-squares policy iteration performs value function approximation efficiently, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice of basis function. However, it cannot represent the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the non-linear manifold structure induced by Markov decision processes. The usefulness of the proposed method is successfully demonstrated in simulated robot arm control and Khepera robot navigation.

Keywords: reinforcement learning; value function approximation; Markov decision process; least-squares policy iteration; Gaussian kernel
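As an illustration of the geodesic idea, the sketch below assumes a 4-connected grid world with a wall and computes a Gaussian of the shortest-path (geodesic) distance to a centre state, phi(s) = exp(-SP(s, c)^2 / (2 sigma^2)), alongside an ordinary Gaussian of Euclidean distance. The grid, the wall layout, sigma, and the helper names are hypothetical, and breadth-first search stands in for a general shortest-path algorithm because every edge has unit cost. How the paper places kernel centres and extends the basis to actions is not reproduced.

```python
import math
from collections import deque


def grid_geodesic_distances(free, center):
    """BFS shortest-path lengths on a 4-connected grid; walls are False in `free`."""
    rows, cols = len(free), len(free[0])
    dist = {center: 0}
    queue = deque([center])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc] \
                    and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def geodesic_gaussian(free, center, sigma):
    """phi(s) = exp(-SP(s, c)^2 / (2 sigma^2)) for every reachable state s.

    Walls and unreachable states are absent from the result (feature value 0).
    """
    dist = grid_geodesic_distances(free, center)
    return {s: math.exp(-d * d / (2 * sigma ** 2)) for s, d in dist.items()}


def ordinary_gaussian(state, center, sigma):
    # Standard Gaussian kernel on Euclidean distance, which ignores walls.
    sq = (state[0] - center[0]) ** 2 + (state[1] - center[1]) ** 2
    return math.exp(-sq / (2 * sigma ** 2))
```

In a 5x5 grid whose middle column is a wall except for a gap in the bottom row, the states (0, 1) and (0, 3) are 2 cells apart in Euclidean terms but 10 steps apart along the grid. With sigma = 2 the geodesic kernel centred at (0, 1) is nearly zero at (0, 3) while the ordinary Gaussian is about 0.61, which is the kind of discontinuity across obstacles that the abstract says ordinary Gaussian kernels cannot capture.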

DYNAMIC PRODUCT POSITIONING IN DIFFERENTIATED PRODUCT MARKETS: THE EFFECT OF FEES FOR MUSICAL PERFORMANCE RIGHTS ON THE COMMERCIAL RADIO INDUSTRY

ECONOMETRICA, 2013, Vol. 81, No. 5, pp. 1763-1803

Author: Sweeting, Andrew. Univ Maryland, Dept Econ, College Pk, MD 20742, USA; Duke Univ, Durham, NC 27706, USA; NBER, Cambridge, MA 02138, USA

This article predicts how radio station formats would change if, as was recently proposed, music stations were made to pay fees for musical performance rights. It does so by estimating and solving a dynamic model, using parametric approximations to firms' value functions; the model captures important features of the industry such as vertical and horizontal product differentiation, demographic variation in programming tastes, and multi-station ownership. The estimated model predicts that high fees would cause the number of music stations to fall significantly and quite quickly. For example, a fee equal to 10% of revenues would cause a 4.6% drop in the number of music stations within 2.5 years and a 9.4% drop in the long run. The size of the change is limited, however, by the fact that many listeners, particularly in demographics valued by advertisers, have strong preferences for music programming.

Keywords: Product differentiation; dynamic oligopoly; value function approximation; radio; copyright

EXPERT-BASED REWARD SHAPING AND EXPLORATION SCHEME FOR BOOSTING POLICY LEARNING OF DIALOGUE MANAGEMENT

IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)

Authors: Ferreira, Emmanuel; Lefevre, Fabrice. Univ Avignon, LIA, F-84911 Avignon 9, France

ISBN: (print) 9781479927562

This paper investigates the conditions under which expert knowledge can be used to accelerate the policy optimization of a learning agent. Recent work on reinforcement learning for dialogue management has produced sophisticated value estimation methods that jointly address the exploration/exploitation dilemma, sample efficiency, and non-stationary environments. In this paper, a reward shaping method and an exploration scheme, both based on intuitive hand-coded expert advice, are combined with an efficient temporal-difference-based learning procedure. The key objective is to boost the initial training stage, when the system is not yet reliable enough to interact with real users (e.g., clients). Our claims are illustrated by simulation-based experiments carried out with a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS).

Keywords: dialogue management; reinforcement learning; reward shaping; value function approximation
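The abstract does not give the shaping formula. A common choice, assumed here, is potential-based reward shaping, r' = r + gamma * Phi(s') - Phi(s), where an expert-defined potential Phi scores dialogue states. The sketch combines it with a tabular Q-learning update (a simple TD method) and an expert-guided exploration rule. The potential, the expert policy, and every function name are hypothetical, and the HIS framework is not modelled.

```python
import random
from collections import defaultdict


def new_q_table():
    # Unseen (state, action) pairs start with value 0.
    return defaultdict(float)


def shaped_reward(r, s, s_next, potential, gamma, terminal=False):
    """Potential-based shaping r + gamma * Phi(s') - Phi(s), with Phi(terminal) = 0."""
    next_pot = 0.0 if terminal else potential(s_next)
    return r + gamma * next_pot - potential(s)


def q_learning_step(q, s, a, r, s_next, actions, potential,
                    alpha=0.1, gamma=0.95, terminal=False):
    """One tabular Q-learning update driven by the shaped reward."""
    target = shaped_reward(r, s, s_next, potential, gamma, terminal)
    if not terminal:
        target += gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def choose_action(q, s, actions, expert_action, epsilon, rng=random):
    """Hypothetical expert-guided exploration: when exploring, follow the
    expert's suggestion instead of a uniformly random action."""
    if rng.random() < epsilon:
        return expert_action(s)
    return max(actions, key=lambda a: q[(s, a)])
```

For instance, with Phi equal to the number of filled slots (a hypothetical heuristic), moving from 0 to 1 filled slot with gamma = 0.9 yields a shaped reward of 0.9 even when the environment reward is 0, so early updates already push the agent toward filling slots.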

Incremental Sparse Bayesian Method for Online Dialog Strategy Learning

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2012, Vol. 6, No. 8, pp. 903-916

Authors: Lee, Sungjin; Eskenazi, Maxine. Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213, USA

This paper proposes an incremental sparse Bayesian learning method that allows dialog strategies to be learned continuously from interactions with real users. Because conventional reinforcement learning (RL) methods require a huge number of dialogs to converge, it has been essential to use a simulated user when training dialog policies. The disadvantage of this approach is that the trained dialog policies always lag behind the policy that is optimal for live users. To tackle this problem, a few studies have applied online RL methods to dialog management and have shown very promising results. However, these methods learn online only the weight parameters of the basis functions in the model, and therefore need batch learning on a fixed data set, or heuristics, to find appropriate values for other meta-parameters such as sparsity-controlling thresholds, basis function parameters, and noise parameters. The proposed method attempts to overcome this limitation, achieving fully incremental and fast dialog strategy learning, by adopting a sparse Bayesian learning method for value function approximation. The method was verified under three experimental conditions: artificial data, a simulated user, and real users. The experiment on artificial data showed that the proposed method successfully learns all the parameters incrementally. The experiment on training and evaluating dialog policies with a simulated user clearly demonstrated that the proposed method is much faster than conventional RL methods. A live user study showed that the dialog strategy learned from real users performed as well as the best previous systems, although it slightly underperformed the one trained on simulated dialogs because of the difficulty of eliciting user feedback.

Keywords: Incremental learning; reinforcement learning; sparse Bayesian modeling; statistical dialog modeling; value function approximation

An Exemplar Test Problem on Parameter Convergence Analysis of Temporal Difference Algorithms

10th World Congress on Intelligent Control and Automation (WCICA)

Authors: Brown, Martin; Tutsoy, Onder. Univ Manchester, Control Syst Grp, Sch Elect & Elect Engn, Manchester M13 9PL, Lancs, England

ISBN: (print) 9781467313988

Reinforcement learning techniques have been developed to solve difficult learning control problems for which little a priori knowledge of the system dynamics is available. In this paper, a simple unstable exemplar test problem is proposed to investigate issues in the parametric convergence of the value function. A specific closed-form solution for the value function is derived, and it has a polynomial form. It is proved that the temporal difference error introduces a null space associated with the finite-horizon basis function during the control trajectory. The learning problem is non-singular only if termination is handled correctly, and a number of possible solutions are introduced. This result was revealed only because the closed-form solution for the value function had been derived.

Keywords: Reinforcement learning; temporal difference learning; value function approximation; polynomial basis functions; rate of convergence
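To see how a null space can appear, consider an assumed toy setting: an unstable scalar system x_{k+1} = 2 x_k, a quadratic polynomial basis [1, x, x^2], and undiscounted TD learning (gamma = 1). The sketch accumulates an LSTD-style matrix A = sum_k phi(x_k)(phi(x_k) - gamma * phi(x_{k+1}))^T over one finite episode. If the last transition bootstraps from the terminal state's features, the constant feature cancels in every TD difference, so the first column of A is zero and A is singular; zeroing the terminal features removes that null direction. The system, basis, and helper names are hypothetical, and this is not the paper's closed-form derivation.

```python
def features(x):
    """Polynomial basis phi(x) = [1, x, x^2] (an assumed choice)."""
    return [1.0, x, x * x]


def lstd_matrix(states, gamma=1.0, zero_terminal=True):
    """A = sum_k phi(x_k) (phi(x_k) - gamma * phi(x_{k+1}))^T over one episode.

    `states` is x_0..x_N, where x_N is terminal. With zero_terminal=True the
    terminal features are replaced by zeros (no bootstrapping past the end).
    """
    n = len(features(states[0]))
    a = [[0.0] * n for _ in range(n)]
    last = len(states) - 2  # index of the final transition
    for k in range(len(states) - 1):
        phi = features(states[k])
        if zero_terminal and k == last:
            nxt = [0.0] * n
        else:
            nxt = features(states[k + 1])
        diff = [p - gamma * q for p, q in zip(phi, nxt)]
        for i in range(n):
            for j in range(n):
                a[i][j] += phi[i] * diff[j]
    return a


def rank(m, tol=1e-9):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    rows, cols = len(m), len(m[0])
    scale = max(abs(x) for row in m for x in row) or 1.0
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) <= tol * scale:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r
```

With x_0 = 1 and three transitions (states 1, 2, 4, 8), the matrix that bootstraps from the terminal state has rank 2, while the terminal-aware matrix has rank 3, so only the latter determines a unique TD solution.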
