Consider an agent who faces a sequential decision problem. At each stage the agent takes an action and observes a stochastic outcome (e.g., daily prices, weather conditions, opponents' actions in a repeated game, etc.). The agent's stage-utility depends on his action, the observed outcome, and previous outcomes. We assume the agent is Bayesian and is endowed with a subjective belief over the distribution of outcomes. The agent's initial belief is typically inaccurate. Therefore, his subjectively optimal strategy is initially suboptimal. As time passes, information about the true dynamics is accumulated and, depending on the compatibility of the belief with the truth, the agent may eventually learn to optimize. We introduce the notion of relative entropy, which is a natural adaptation of the entropy of a stochastic process to the subjective set-up. We present conditions, expressed in terms of relative entropy, that determine whether the agent will eventually learn to optimize. It is shown that low entropy yields asymptotically optimal behavior. In addition, we present a notion of pointwise merging and link it with relative entropy. (C) 2000 Elsevier Science S.A. All rights reserved.
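For readers who want the flavor of the central quantity, the following is a schematic rendering of a relative-entropy rate between the true law and the subjective belief; the notation is ours and is only meant to suggest the kind of object the abstract describes, not the paper's exact definition.

```latex
% Schematic only: \mu is the true law of the outcome process, \tilde{\mu} the
% agent's subjective belief, and h_n an n-stage history of outcomes.
\[
  d(\mu \,\|\, \tilde{\mu})
  \;=\; \limsup_{n \to \infty} \frac{1}{n}\,
        \mathbb{E}_{\mu}\!\left[\, \log \frac{\mu(h_n)}{\tilde{\mu}(h_n)} \,\right].
\]
% "Low entropy yields asymptotically optimal behavior" then corresponds to
% conditions under which a rate of this kind vanishes.
```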
Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insights into the successful performances reported in the literature on the use of value-function approximators in DP. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.
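As a rough illustration of this methodology (fitting each stage's value function with Gaussian radial-basis functions inside a backward DP recursion), a minimal sketch follows. The consumption-style dynamics, reward, grids, and the fixed centers and width are assumptions for illustration only; the paper allows variable centers and widths, which would require a nonlinear fit rather than the least squares used here.

```python
# Sketch of approximate dynamic programming for an N-stage continuous-state
# problem: stage value functions are fitted with Gaussian radial-basis functions.
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian RBF features for 1-D states x (centers/width fixed here)."""
    return np.exp(-((np.asarray(x)[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def fit_value(x, v, centers, width):
    """Least-squares fit of RBF weights to sampled values v at states x."""
    w, *_ = np.linalg.lstsq(rbf_features(x, centers, width), v, rcond=None)
    return w

def eval_value(x, w, centers, width):
    return rbf_features(x, centers, width) @ w

# Illustrative additive-reward consumption problem: maximize sum_t g(x_t, a_t)
# with (deterministic, for simplicity) dynamics x_{t+1} = f(x_t, a_t).
f = lambda x, a: np.clip(x - a, 0.0, 1.0)      # next wealth
g = lambda x, a: np.sqrt(np.maximum(a, 0.0))   # stage utility of consumption a
N = 5                                          # number of stages
states = np.linspace(0.0, 1.0, 50)             # sampled states
actions = np.linspace(0.0, 1.0, 50)            # candidate consumption levels
centers, width = np.linspace(0.0, 1.0, 12), 0.1

weights = [None] * (N + 1)                     # weights[t] approximates J_t
weights[N] = fit_value(states, np.zeros_like(states), centers, width)  # J_N = 0

for t in range(N - 1, -1, -1):                 # backward recursion over stages
    targets = np.empty_like(states)
    for i, x in enumerate(states):
        feasible = actions[actions <= x]       # cannot consume more than wealth
        nxt = f(x, feasible)
        q = g(x, feasible) + eval_value(nxt, weights[t + 1], centers, width)
        targets[i] = q.max()
    weights[t] = fit_value(states, targets, centers, width)

print("approximate J_0(1.0) =", eval_value([1.0], weights[0], centers, width)[0])
```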
This paper describes a new mathematical programming approach to sequential decision problems that have an underlying decision tree structure. The approach, based upon a characterization of strategies as extreme points...
The price demand relation is a fundamental concept that models how price affects the sale of a product. It is critical to have an accurate estimate of its parameters, as it will impact the company's revenue. The learning has to be performed very efficiently using a small window of a few test points, because of the rapid changes in price demand parameters due to seasonality and fluctuations. However, there are conflicting goals when seeking the two objectives of revenue maximization and demand learning, known as the learn/earn trade-off. This is akin to the exploration/exploitation trade-off that we encounter in machine learning and optimization algorithms. In this paper, we consider the problem of price demand function estimation, taking into account its exploration-exploitation characteristic. We design a new objective function that combines both aspects. This objective function is essentially the revenue minus a term that measures the error in parameter estimates. Recursive algorithms that optimize this objective function are derived. The proposed method outperforms other existing approaches.
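A minimal sketch of this kind of combined objective (expected revenue minus a weighted estimation-error term) with a recursive least-squares update is given below, assuming a linear demand model. The weight lambda_, the price grid, and the demand parameters are illustrative assumptions, not the paper's specification.

```python
# Sketch of a "revenue minus estimation-error" pricing rule with a recursive
# least-squares (RLS) update, assuming linear demand d = a - b*p + noise.
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, noise_sd = 10.0, 1.5, 0.5   # unknown "true" demand parameters
theta = np.array([8.0, 1.0])                 # current estimate of (a, b)
P = np.eye(2) * 10.0                         # RLS covariance of the estimate
lambda_ = 2.0                                # weight on the learning term
prices = np.linspace(0.5, 6.0, 200)          # candidate prices

for t in range(25):
    # Expected revenue under the current estimate, for each candidate price.
    revenue = prices * (theta[0] - theta[1] * prices)
    # Learning term: trace of the RLS covariance *after* testing each price
    # (smaller is better, so informative prices are rewarded).
    phi = np.stack([np.ones_like(prices), -prices])   # regressors per price
    Pphi = P @ phi
    gain = np.sum(phi * Pphi, axis=0)                 # phi' P phi
    quad2 = np.sum(Pphi * Pphi, axis=0)               # phi' P^2 phi
    post_trace = np.trace(P) - quad2 / (1.0 + gain)   # trace of updated covariance
    objective = revenue - lambda_ * post_trace        # combined objective
    p = prices[np.argmax(objective)]

    # Observe demand at the chosen price and update the estimate recursively.
    d = a_true - b_true * p + rng.normal(0.0, noise_sd)
    x = np.array([1.0, -p])
    k = P @ x / (1.0 + x @ P @ x)                     # RLS gain
    theta = theta + k * (d - x @ theta)
    P = P - np.outer(k, x @ P)

print("estimated (a, b):", theta)
```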
Price experimentation is an important tool for firms to find the optimal selling price of their products. It should be conducted properly, since experimenting with selling prices can be costly. A firm, therefore, needs to find a pricing policy that optimally balances between learning the optimal price and gaining revenue. In this paper, we propose such a pricing policy, called controlled variance pricing (CVP). The key idea of the policy is to enhance the certainty equivalent pricing policy with a taboo interval around the average of previously chosen prices. The width of the taboo interval shrinks at an appropriate rate as the amount of data gathered gets large; this guarantees sufficient price dispersion. For a large class of demand models, we show that this procedure is strongly consistent, which means that eventually the value of the optimal price will be learned, and derive upper bounds on the regret, which is the expected amount of money lost due to not using the optimal price. Numerical tests indicate that CVP performs well on different demand models and time scales.
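The taboo-interval mechanism lends itself to a short sketch. Everything concrete below (linear demand, the c0 constant, the t**(-0.25) shrink rate, the price bounds) is an assumption for illustration; the paper derives the appropriate shrink rate and the consistency and regret guarantees.

```python
# Sketch of the CVP "taboo interval" idea: price at the certainty-equivalent
# optimum, but if that price falls inside a shrinking interval around the
# average of past prices, push it to the nearest endpoint so that prices stay
# sufficiently dispersed.
import numpy as np

def cvp_price(theta, past_prices, t, c0=0.5, p_min=0.1, p_max=10.0):
    a, b = theta                                  # current estimate of d = a - b*p
    p_ce = np.clip(a / (2.0 * b), p_min, p_max)   # certainty-equivalent optimal price
    if not past_prices:
        return float(p_ce)
    center = np.mean(past_prices)                 # average of previously chosen prices
    half_width = c0 * t ** (-0.25)                # taboo interval shrinks with t
    lo, hi = center - half_width, center + half_width
    if lo < p_ce < hi:                            # inside the taboo interval:
        p_ce = lo if p_ce - lo < hi - p_ce else hi  # move to the nearest endpoint
    return float(np.clip(p_ce, p_min, p_max))

# Example: price chosen at period t = 5 given an estimate and a price history.
print(cvp_price((10.0, 1.5), [3.0, 3.2, 3.1, 3.3], t=5))
```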
We consider sequential decision problems over an infinite horizon. The forecast or solution horizon approach to solving such problems requires that the optimal initial decision be unique. We show that multiple optimal initial decisions can exist in general and refer to their existence as degeneracy. We then present a conceptual cost perturbation algorithm for resolving degeneracy and identifying a forecast horizon. We also present a general near-optimal forecast horizon.
ISBN:
(Print) 9783319615813; 9783319615806
This paper raises the question of solving multi-criteria sequential decision problems under uncertainty. It proposes to extend to possibilistic decision trees the decision rules presented in [1] for non-sequential problems. It presents a series of algorithms for this new framework: Dynamic Programming can be used and provides an optimal strategy for rules that satisfy the property of monotonicity. There is no guarantee of optimality for those that do not, hence the definition of dedicated algorithms. The paper concludes with an empirical comparison of the algorithms.
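To make the dynamic-programming part concrete, here is a single-criterion sketch of backward induction on a possibilistic decision tree under the qualitative optimistic utility, one rule for which monotonicity is known to hold. The tree encoding and the numbers are ours; the paper's multi-criteria rules and its dedicated algorithms for non-monotonic rules are not reproduced here.

```python
# Backward induction on a toy possibilistic decision tree using the qualitative
# optimistic utility u_opt = max over leaves of min(possibility, utility).

def evaluate(node):
    """Return (value, chosen-action map) for a node of the tree."""
    kind = node["kind"]
    if kind == "leaf":
        return node["utility"], {}
    if kind == "chance":
        # Optimistic utility: max over outcomes of min(possibility, subtree value).
        best, plan = 0.0, {}
        for poss, child in node["children"]:
            v, p = evaluate(child)
            plan.update(p)
            best = max(best, min(poss, v))
        return best, plan
    if kind == "decision":
        # Choose the action whose subtree has the highest value.
        best_v, best_a, best_p = -1.0, None, {}
        for action, child in node["children"].items():
            v, p = evaluate(child)
            if v > best_v:
                best_v, best_a, best_p = v, action, p
        best_p[node["name"]] = best_a
        return best_v, best_p
    raise ValueError(kind)

tree = {"kind": "decision", "name": "D0", "children": {
    "invest": {"kind": "chance", "children": [
        (1.0, {"kind": "leaf", "utility": 0.3}),
        (0.6, {"kind": "leaf", "utility": 0.9})]},
    "wait": {"kind": "leaf", "utility": 0.5}}}

print(evaluate(tree))   # -> (0.6, {'D0': 'invest'})
```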
ISBN:
(Digital) 9783319131917
ISBN:
(Print) 9783319131917; 9783319131900
We develop a new formalism for solving team Markov decision processes (MDPs), called marginal-contribution stochastic games (MCSGs). In MCSGs, each agent's utility for a state transition is given by its marginal contribution to the team value function, so that utilities differ between agents and sparse interaction between them is naturally exploited. We prove that an MCSG admits a potential function and show that the locally optimal solutions, including the global optimum, correspond to the Nash equilibria of the game. We go on to show that any Nash equilibrium of a dynamic resource allocation problem with monotone submodular resource functions in MCSG form has a price of anarchy greater than 1/2. Finally, we characterize a class of distributed algorithms for MCSGs.
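A static, one-shot illustration of the marginal-contribution construction may help: each agent's utility is the team value minus the team value with that agent's action replaced by a fixed null action, and the team value then behaves as an exact potential for unilateral deviations. The toy team_value function and action sets below are assumptions; MCSGs proper are stochastic games with state transitions, which this sketch omits.

```python
# Marginal-contribution utilities in a two-agent, one-shot toy example.
import itertools

actions = [0, 1]            # each agent picks 0 ("idle") or 1 ("work")
NULL = 0                    # the fixed null action used as the baseline

def team_value(joint):
    # Toy team value with diminishing returns in the number of workers.
    return [0.0, 1.0, 1.5][sum(joint)]

def marginal_utility(i, joint):
    """Agent i's utility = its marginal contribution to the team value."""
    baseline = list(joint)
    baseline[i] = NULL
    return team_value(joint) - team_value(tuple(baseline))

# Check the potential property: a unilateral deviation changes an agent's
# utility by exactly the change in the team value.
for joint in itertools.product(actions, repeat=2):
    for i in range(2):
        for dev in actions:
            new = list(joint); new[i] = dev
            du = marginal_utility(i, tuple(new)) - marginal_utility(i, joint)
            dv = team_value(tuple(new)) - team_value(joint)
            assert abs(du - dv) < 1e-12
print("team value is an exact potential for the marginal-contribution utilities")
```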
SAMUEL is an experimental learning system that uses genetic algorithms and other learning methods to evolve reactive decision rules from simulations of multiagent environments. The basic approach is to explore a range...
Sutton’s Dyna framework provides a novel and computationally appealing way to integrate learning, planning, and reacting in autonomous agents. Examined here is a class of strategies designed to enhance the learning a...