Search Results - Inner Mongolia University Library

Dynamic Pricing for Network Revenue Management: A New Approach and Application in the Hotel Industry

INFORMS JOURNAL ON COMPUTING, 2017, Vol. 29, No. 1, pp. 18-35

Authors: Zhang, Dan; Weatherford, Larry

Affiliations: Univ Colorado Leeds Sch Business Boulder CO 80309 USA; Univ Wyoming Coll Business Laramie WY 82071 USA

Dynamic pricing for network revenue management has received considerable attention in research and practice. Based on data obtained from a major hotel, we use a large-scale numerical study to compare the performance of several heuristic approaches proposed in the literature. The heuristic approaches we consider include deterministic linear programming with resolving and three variants of dynamic programming decomposition. Dynamic programming decomposition is considered one of the strongest heuristics, is the method chosen in some recent commercial implementations, and remains a topic of research in the recent academic literature. In addition to a plain-vanilla implementation of dynamic programming decomposition, we consider two variants proposed in recent literature. For the base scenario generated from the real data, we show that the method based on Zhang (2011) [An improved dynamic programming decomposition approach for network revenue management. Manufacturing Service Oper. Management 13(1): 35-52.] leads to a small but significant lift in revenue compared with all other approaches. We generate many alternative problem scenarios by varying capacity-demand ratio and network structure and show that the performance of the different heuristics can be strongly influenced by both. Overall, our paper shows the promise of some recent proposals in the academic literature but also offers a cautionary tale on the choice of heuristic methods for practical network pricing problems.

Keywords: revenue management; dynamic pricing; approximate dynamic programming

Approximate policy iteration for dynamic resource-constrained project scheduling

OPERATIONS RESEARCH LETTERS, 2017, Vol. 45, No. 5, pp. 442-447

Authors: Parizi, Mahshid Salemi; Gocgun, Yasin; Ghate, Archis

Affiliations: Univ Washington Ind & Syst Engn BOX 352650 Seattle WA 98195 USA; Altinbas Univ Dept Ind Engn Istanbul Turkey

We study non-preemptive scheduling problems where heterogeneous projects stochastically arrive over time. The projects include precedence-constrained tasks that require multiple resources. Incomplete projects are held in queues. When a queue is full, an arriving project must be rejected. The goal is to choose which tasks to start in each time-slot to maximize the infinite-horizon discounted expected profit. We provide a weakly coupled Markov decision process (MDP) formulation and apply a simulation-based approximate policy iteration method. Extensive numerical results are presented. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: Markov decision processes; approximate dynamic programming; Queueing

Discrete-Time Local Value Iteration Adaptive Dynamic Programming: Admissibility and Termination Analysis

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, Vol. 28, No. 11, pp. 2490-2502

Authors: Wei, Qinglai; Liu, Derong; Lin, Qiao

Affiliations: Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing 100083 Peoples R China

In this paper, a novel local value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite horizon optimal control problems for discrete-time nonlinear systems. The focuses of this paper are to study admissibility properties and the termination criteria of discrete-time local value iteration ADP algorithms. In the discrete-time local value iteration ADP algorithm, the iterative value functions and the iterative control laws are both updated in a given subset of the state space in each iteration, instead of the whole state space. For the first time, admissibility properties of iterative control laws are analyzed for the local value iteration ADP algorithm. New termination criteria are established, which terminate the iterative local ADP algorithm with an admissible approximate optimal control law. Finally, simulation results are given to illustrate the performance of the developed algorithm.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; local iteration; neural networks; neurodynamic programming; nonlinear systems; optimal control

Data mining for state space orthogonalization in adaptive dynamic programming

EXPERT SYSTEMS WITH APPLICATIONS, 2017, Vol. 76, pp. 49-58

Authors: Ariyajunya, Bancha; Chen, Ying; Chen, Victoria C. P.; Kim, Seoung Bum

Affiliations: Burapha Univ Fac Engn Chon Buri Thailand; Univ Texas Arlington Dept Ind Mfg & Syst Engn Arlington TX 76019 USA; Korea Univ Sch Ind Management Engn Seoul South Korea

Dynamic programming (DP) is a mathematical programming approach for optimizing a system that changes over time and is a common approach for developing intelligent systems. Expert systems that are intelligent must be able to adapt dynamically over time. An optimal DP policy identifies the optimal decision dependent on the current state of the system. Hence, the decisions controlling the system can intelligently adapt to changing system states. Although DP has existed since Bellman introduced it in 1957, exact DP policies are only possible for problems with low dimension or under very limiting restrictions. Fortunately, advances in computational power have given rise to approximate DP (ADP). However, most ADP algorithms are still computationally-intractable for high-dimensional problems. This paper specifically considers continuous-state DP problems in which the state variables are multicollinear. The issue of multicollinearity is currently ignored in the ADP literature, but in the statistics community it is well known that high multicollinearity leads to unstable (high variance) parameter estimates in statistical modeling. While not all real world DP applications involve high multicollinearity, it is not uncommon for real cases to involve observed state variables that are correlated, such as the air quality ozone pollution application studied in this research. Correlation is a common occurrence in observed data, including sources in meteorology, energy, finance, manufacturing, health care, etc. ADP algorithms for continuous-state DP achieve an approximate solution through discretization of the state space and model approximations. Typical state space discretizations involve full-dimensional grids or random sampling. The former option requires exponential growth in the number of state points as the state space dimension grows, while the latter option is typically inefficient and requires an intractable number of state points. The exception is computationally-tractable

Keywords: Data mining; Design and analysis of computer experiments; approximate dynamic programming; Ozone pollution

A rollout algorithm framework for heuristic solutions to finite-horizon stochastic dynamic programs

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2017, Vol. 258, No. 1, pp. 216-229

Authors: Goodson, Justin C.; Thomas, Barrett W.; Ohlmann, Jeffrey W.

Affiliations: St Louis Univ John Cook Sch Business Dept Operat & Informat Technol Management 3674 Lindell Blvd St Louis MO 63108 USA; Univ Iowa Tippie Coll Business Dept Management Sci Iowa City IA 52242 USA

Rollout algorithms have enjoyed success across a variety of domains as heuristic solution procedures for stochastic dynamic programs (SDPs). However, because most rollout implementations are closely tied to specific problems, the visibility of advances in rollout methods is limited, thereby making it difficult for researchers in other fields to extract general procedures and apply them to different areas. We present a rollout algorithm framework to make recent advances in rollout methods more accessible to researchers seeking heuristic policies for large-scale, finite-horizon SDPs. We formalize rollout variants exploiting the pre- and post-decision state variables as a means of overcoming computational limitations imposed by large state and action spaces. We present a unified analytical discussion, generalizing results from the literature and introducing new results that relate the performance of the rollout variants to one another. Relative to the literature, our policy-based approach to presenting and proving results makes a closer connection to the underpinnings of dynamic programming. Finally, we illustrate our framework and analytical results via application to a dynamic and stochastic multi-compartment knapsack problem. (C) 2016 Published by Elsevier B.V.

Keywords: Dynamic programming; Rollout algorithm; Stochastic dynamic programming; approximate dynamic programming

A Multiobjective Path-Planning Algorithm With Time Windows for Asset Routing in a Dynamic Weather-Impacted Environment

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2017, Vol. 47, No. 12, pp. 3256-3271

Authors: Sidoti, David; Avvari, Gopi Vinod; Mishra, Manisha; Zhang, Lingyi; Nadella, Bala Kishore; Peak, James E.; Hansen, James A.; Pattipati, Krishna R.

Affiliations: Univ Connecticut Dept Elect & Comp Engn Storrs CT 06269 USA; Doran Jones Bronx NY 10454 USA; US Naval Res Lab NRL MRY Monterey CA 93943 USA

This paper presents a mixed-initiative tool for multiobjective planning and asset routing (TMPLAR) in dynamic and uncertain environments. TMPLAR is built upon multiobjective dynamic programming algorithms to route assets in a timely fashion, while considering fuel efficiency, voyage time, distance, and adherence to real world constraints (asset vehicle limits, navigator-specified deadlines, etc.). TMPLAR has the potential to be applied in a variety of contexts, including ship, helicopter, or unmanned aerial vehicle routing. The tool provides recommended schedules, consisting of waypoints, associated arrival and departure times, asset speed and bearing, that are optimized with respect to several objectives. The ship navigation is exacerbated by the need to address multiple conflicting objectives, spatial and temporal uncertainty associated with the weather, multiple constraints on asset operation, and the added capability of waiting at a waypoint with the intent to avoid bad weather, conduct opportunistic training drills, or both. The key algorithmic contribution is a multiobjective shortest path algorithm for networks with stochastic nonconvex edge costs and the following problem features: 1) time windows on nodes; 2) ability to choose vessel speed to next node subject to (minimum and/or maximum) speed constraints; 3) ability to select the power plant configuration at each node; and 4) ability to wait at a node. The algorithm is demonstrated on six real world routing scenarios by comparing its performance against an existing operational routing algorithm.

Keywords: approximate dynamic programming; label setting; meteorology; oceanography; Pareto optimal; ship routing; shortest path problem with time windows; uncertainty; weather

Anticipatory freight selection in intermodal long-haul round-trips

TRANSPORTATION RESEARCH PART E-LOGISTICS AND TRANSPORTATION REVIEW, 2017, Vol. 105, pp. 176-194

Authors: Rivera, Arturo E. Perez; Mes, Martijn R. K.

Affiliations: Univ Twente Dept Ind Engn & Business Informat Syst POB 217 NL-7500 AE Enschede Netherlands

We consider the planning problem faced by Logistic Service Providers (LSPs) transporting freights periodically, using long-haul round-trips. In each round-trip, freights are delivered and picked up at different locations within one region. Freights have time-windows and become known gradually over time. Using probabilistic knowledge about future freights, the LSP's objective is to minimize costs over a multi-period horizon. We propose a look ahead planning method using approximate dynamic programming. Experiments show that our approach reduces costs up to 25.5% compared to a single-period optimization approach. We provide managerial insights for several intermodal long-haul round-trips settings and provide directions for further research. (C) 2016 Elsevier Ltd. All rights reserved.

Keywords: Intermodal transport; Synchromodal planning; Long-haul consolidation; Anticipatory shipping; approximate dynamic programming

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

MATHEMATICS OF OPERATIONS RESEARCH, 2017, Vol. 42, No. 3, pp. 762-782

Authors: Wen, Zheng; Van Roy, Benjamin

Affiliations: Adobe Res San Jose CA 95110 USA; Stanford Univ Stanford CA 94305 USA

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function lies within a given hypothesis class, OCP selects optimal actions over all but at most D episodes, where D is the eluder dimension of the given hypothesis class. We establish further efficiency and asymptotic performance guarantees that apply even if the true value function does not lie in the given hypothesis class, for the special case where the hypothesis class is the span of prespecified indicator functions over disjoint sets. We also discuss the computational complexity of OCP and present computational results involving two illustrative examples.

Keywords: reinforcement learning; efficient exploration; value function generalization; approximate dynamic programming

Error-Tolerant Iterative Adaptive Dynamic Programming for Optimal Renewable Home Energy Scheduling and Battery Management

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2017, Vol. 64, No. 12, pp. 9527-9537

Authors: Wei, Qinglai; Lewis, Frank L.; Shi, Guang; Song, Ruizhuo

Affiliations: Chinese Acad Sci State Key Lab Management & Control Complex Syst Inst Automat Beijing 100190 Peoples R China; Univ Chinese Acad Sci Beijing 100049 Peoples R China; Univ Texas Arlington Res Inst Arlington TX 76118 USA; Northeastern Univ Shenyang 110036 Liaoning Peoples R China; Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing 100083 Peoples R China

In this paper, a novel error-tolerant iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal battery control and management problems in smart home environments with renewable energy. A main contribution for the iterative ADP algorithm is to implement with the electricity rate, home load demand, and renewable energy as quasi-periodic functions, instead of accurate periodic functions, where the discount factor can adaptively be regulated in each iteration to guarantee the convergence of the iterative value function. A new analysis method is developed to guarantee the iterative value function to converge to a finite neighborhood of the optimal performance index function, in spite of the differences of the electricity rate, the home load demand, and the renewable energy in different periods. Neural networks are employed to approximate the iterative value function and control law, respectively, for facilitating the implementation of the iterative ADP algorithm. Numerical results and comparisons are given to illustrate the performance of the developed algorithm.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; home energy systems; optimal control; smart grid

Off-Policy Integral Reinforcement Learning Method to Solve Nonlinear Continuous-Time Multiplayer Nonzero-Sum Games

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2017, Vol. 28, No. 3, pp. 704-713

Authors: Song, Ruizhuo; Lewis, Frank L.; Wei, Qinglai

Affiliations: Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing 100083 Peoples R China; Univ Texas Arlington UTA Res Inst Arlington TX 76019 USA; Northeastern Univ State Key Lab Synthet Automat Proc Ind Shenyang 110819 Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China

This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL is designed to do policy evaluation and policy improvement in the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. The gradient descent algorithm makes the update of critic and action weights simultaneously. The convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of Nash equilibrium are proved. The simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; integral reinforcement learning (IRL); nonlinear systems; nonzero sum (NZS); off-policy
