Refine Search Results

Document Type

  • 81 journal articles
  • 28 conference papers
  • 2 theses

Collection Scope

  • 111 electronic documents
  • 0 print holdings

Date Distribution

Subject Classification

  • 87 Engineering
    • 53 Computer Science and Technology...
    • 36 Electrical Engineering
    • 30 Control Science and Engineering
    • 8 Transportation Engineering
    • 7 Petroleum and Natural Gas Engineering
    • 5 Software Engineering
    • 4 Information and Communication Engineering
    • 3 Power Engineering and Engineering Thermo...
    • 2 Instrument Science and Technology
    • 2 Civil Engineering
    • 1 Electronic Science and Technology (...
    • 1 Chemical Engineering and Technology
    • 1 Naval Architecture and Ocean Engineering
    • 1 Environmental Science and Engineering (...
  • 28 Management
    • 28 Management Science and Engineering (...
    • 3 Business Administration
  • 24 Science
    • 22 Mathematics
    • 4 Systems Science
    • 1 Physics
    • 1 Statistics (...
  • 11 Economics
    • 7 Theoretical Economics
    • 3 Applied Economics
  • 3 Medicine
    • 3 Clinical Medicine
    • 2 Basic Medicine (...

Topics

  • 111 篇 value function a...
  • 37 篇 reinforcement le...
  • 18 篇 approximate dyna...
  • 12 篇 dynamic programm...
  • 7 篇 dynamic vehicle ...
  • 7 篇 temporal differe...
  • 6 篇 q-learning
  • 5 篇 function approxi...
  • 5 篇 markov decision ...
  • 4 篇 markov decision ...
  • 4 篇 neural networks
  • 4 篇 optimal control
  • 4 篇 policy iteration
  • 3 篇 rate of converge...
  • 3 篇 actor-critic
  • 3 篇 policy evaluatio...
  • 3 篇 polynomial basis...
  • 3 篇 reinforcement le...
  • 3 篇 energy managemen...
  • 3 篇 off-policy learn...

Institutions

  • 2 篇 beijing univ che...
  • 2 篇 hefei univ techn...
  • 2 篇 missouri univ sc...
  • 2 篇 univ massachuset...
  • 2 篇 tokyo inst techn...
  • 2 篇 northeastern uni...
  • 2 篇 univ sci & techn...
  • 2 篇 tech univ carolo...
  • 2 篇 natl univ def te...
  • 2 篇 georgia inst tec...
  • 2 篇 chinese acad sci...
  • 2 篇 otto von guerick...
  • 2 篇 rice univ dept e...
  • 1 篇 polish acad sci ...
  • 1 篇 shanghai engn re...
  • 1 篇 tsinghua univ de...
  • 1 篇 univ sydney sch ...
  • 1 篇 inria nancy gran...
  • 1 篇 univ southern ca...
  • 1 篇 univ twente ind ...

Authors

  • 6 篇 ulmer marlin w.
  • 5 篇 song tianheng
  • 5 篇 li dazi
  • 4 篇 xu xin
  • 4 篇 mattfeld dirk c.
  • 3 篇 soeffker ninja
  • 3 篇 hachiya hirotaka
  • 2 篇 tutsoy onder
  • 2 篇 huang zhenhua
  • 2 篇 savelsbergh mart...
  • 2 篇 montoya juan m.
  • 2 篇 lewis frank l.
  • 2 篇 pietquin olivier
  • 2 篇 jin qibing
  • 2 篇 sickles robin c.
  • 2 篇 geist matthieu
  • 2 篇 li ping
  • 2 篇 chapman archie c...
  • 2 篇 zuo lei
  • 2 篇 cervellera crist...

Language

  • 109 English
  • 2 Other

Search query: Subject = "Value function approximation"
111 records; showing results 71-80
DYNAMIC PRODUCT POSITIONING IN DIFFERENTIATED PRODUCT MARKETS: THE EFFECT OF FEES FOR MUSICAL PERFORMANCE RIGHTS ON THE COMMERCIAL RADIO INDUSTRY
ECONOMETRICA, 2013, Vol. 81, No. 5, pp. 1763-1803
Authors: Sweeting, Andrew. Univ Maryland, Dept Econ, College Pk MD 20742 USA; Duke Univ, Durham NC 27706 USA; NBER, Cambridge MA 02138 USA
This article predicts how radio station formats would change if, as was recently proposed, music stations were made to pay fees for musical performance rights. It does so by estimating and solving, using parametric ap...
Adaptive Critic Design with Local Gaussian Process Models
JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2016, Vol. 20, No. 7, pp. 1135-1140
Authors: Wang, Wei; Chen, Xin; He, Jianxin. China Univ Geosci, Sch Automat, 388 Lumo Rd, Wuhan 430074, Peoples R China
In this paper, local Gaussian process (GP) approximation is introduced to build the critic network of adaptive dynamic programming (ADP). The sample data are partitioned into local regions, and for each region, an ind...
Stochastic home energy management system via approximate dynamic programming
IET ENERGY SYSTEMS INTEGRATION, 2020, Vol. 2, No. 4, pp. 382-392
Authors: Liu, Xuebo; Wu, Hongyu; Wang, Li; Faqiry, M. Nazif. Kansas State Univ, Mike Wiegers Dept Elect & Comp Engn, Manhattan KS 66506 USA
This study proposes an approximate dynamic programming (ADP) method for a stochastic home energy management system (HEMS) that aims to minimise the electricity cost and discomfort of a household under uncertainties. T...
Balancing resources for dynamic vehicle routing with stochastic customer requests
OR SPECTRUM, 2024, Vol. 46, No. 2, pp. 331-373
Authors: Soeffker, Ninja; Ulmer, Marlin W.; Mattfeld, Dirk C. Univ Vienna, Dept Business Decis & Analyt, Vienna, Austria; Otto von Guericke Univ, Chair Management Sci, Magdeburg, Germany; Tech Univ Carolo Wilhelmina Braunschweig, Decis Support Grp, Braunschweig, Germany
We consider a service provider performing pre-planned service for initially known customers with a fleet of vehicles, e.g., parcel delivery. During execution, new dynamic service requests occur, e.g., for parcel picku...
Methods for approximating value functions for the Dominion card game
EVOLUTIONARY INTELLIGENCE, 2014, Vol. 6, No. 4, pp. 195-204
Authors: Winder, Ransom K. Mitre Corp, 7525 Colshire Dr, Mclean VA 22102 USA
Artificial neural networks have been successfully used to approximate value functions for tasks involving decision making. In domains where decisions require a shift in judgment as the overall state changes, it is hyp...
Reducing reinforcement learning to KWIK online regression
ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2010, Vol. 58, No. 3-4, pp. 217-237
Authors: Li, Lihong; Littman, Michael L. Yahoo Res, Santa Clara CA 95054 USA; Rutgers State Univ, Rutgers Lab Real Life Reinforcement Learning (RL3), Dept Comp Sci, Piscataway NJ 08854 USA
One of the key problems in reinforcement learning (RL) is balancing exploration and exploitation. Another is learning and acting in large Markov decision processes (MDPs) where compact function approximation has to be...
Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence
21st IFAC World Congress on Automatic Control - Meeting Societal Challenges
Authors: Stankovic, Milos S.; Beko, Marko; Stankovic, Srdjan S. Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia; Vlatacom Inst, Belgrade, Serbia; Singidunum Univ, Belgrade, Serbia; Univ Lusafona Humanidades & Tecnol, COPELABS, Lisbon, Portugal; Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
In this paper we propose two novel distributed algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes. The algorithms differ in how distrib...
An Exemplar Test Problem on Parameter Convergence Analysis of Temporal Difference Algorithms
10th World Congress on Intelligent Control and Automation (WCICA)
Authors: Brown, Martin; Tutsoy, Onder. Univ Manchester, Control Syst Grp, Sch Elect & Elect Engn, Manchester M13 9PL, Lancs, England
Reinforcement learning techniques have been developed to solve difficult learning control problems with only a small amount of a priori knowledge about the system dynamics. In this paper, a simple unstable exemplar test pr...
Sustainable l2-Regularized Actor-Critic based on Recursive Least-Squares Temporal Difference Learning
IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Authors: Li, Luntong; Li, Dazi; Song, Tianheng. Beijing Univ Chem Technol, Inst Automat, Beijing 100029, Peoples R China
Least-squares temporal difference learning (LSTD) has been used mainly for improving the data efficiency of the critic in actor-critic (AC) methods. However, convergence analysis of the resulting algorithms is difficult when p...
Online Support Vector Regression based Actor-Critic Method
36th Annual Conference of the IEEE Industrial-Electronics-Society (IECON) / 4th IEEE International Conference on E-Learning in Industrial Electronics / IES Industry Forum
Authors: Lee, Dong-Hyun; Kim, Jeong-Jung; Lee, Ju-Jang. Korea Adv Inst Sci & Technol, Robot Program, Taejon 305701, South Korea; Korea Adv Inst Sci & Technol, Dept Elect Engn, Taejon 305701, South Korea
This paper proposes a new algorithm for the actor-critic method using online support vector regression (SVR), which can do incremental learning and automatically track variation of an environment with time-varying characteris...