We introduce a new algorithm based on linear programming for optimization of average-cost Markov decision processes (MDPs). The algorithm approximates the differential cost function of a perturbed MDP via a linear combination of basis functions. We establish a bound on the performance of the resulting policy that scales gracefully with the number of states without imposing the strong Lyapunov condition required by its counterpart in de Farias and Van Roy (de Farias, D. P., B. Van Roy. 2003. The linear programming approach to approximate dynamic programming. Oper. Res. 51(6) 850-865). We investigate implications of this result in the context of a queueing control problem.
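The LP approach to approximate dynamic programming can be sketched on a toy instance. The example below uses a discounted two-state MDP with made-up numbers (not the paper's average-cost, perturbed formulation), and a full basis so the "approximation" is exact: the approximate LP maximizes the summed value subject to the Bellman inequalities expressed in the basis weights.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# 2-state, 2-action toy MDP (hypothetical numbers, not from the paper).
# P[a][s] = transition distribution under action a, g[s][a] = one-stage cost.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: switch states
g = np.array([[1.0, 3.0], [2.0, 0.5]])

Phi = np.eye(2)  # basis functions; a full basis here, so the fit is exact

# Approximate LP: maximize sum_s (Phi r)(s)
# subject to (Phi r)(s) <= g(s,a) + gamma * P_a(s,:) @ Phi r  for all (s, a)
A_ub, b_ub = [], []
for a in range(2):
    for s in range(2):
        A_ub.append(Phi[s] - gamma * P[a][s] @ Phi)
        b_ub.append(g[s, a])
c = -Phi.sum(axis=0)  # linprog minimizes, so negate the objective
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2, method="highs")
J = Phi @ res.x  # approximated cost-to-go
```

With fewer basis functions than states, the same LP returns the best weights satisfying the Bellman inequalities, which is the source of the performance bounds discussed in the abstract.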
This paper deals with the finite-horizon optimal tracking control for a class of discrete-time nonlinear systems using the iterative adaptive dynamic programming (ADP) algorithm. First, the optimal tracking problem is converted into designing a finite-horizon optimal regulator for the tracking error dynamics. Then, with convergence analysis in terms of cost function and control law, the iterative ADP algorithm via the heuristic dynamic programming (HDP) technique is introduced to obtain the finite-horizon optimal tracking controller, which makes the cost function close to its optimal value within an ε-error bound. Furthermore, three neural networks are used to implement the algorithm, approximating the cost function, the control law, and the error dynamics, respectively. At last, an example is included to demonstrate the effectiveness of the proposed approach.
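The first step, converting tracking into an error regulator, can be sketched on a scalar linear system, where the finite-horizon regulator is solvable exactly by a Riccati recursion rather than the paper's neural-network HDP iteration (all numbers below are hypothetical):

```python
# Scalar linear toy: x_{k+1} = a x_k + b u_k, track the constant reference r.
a, b, r = 0.8, 1.0, 1.0
Q = R = Qf = 1.0   # stage and terminal error weights, control weight
N = 20             # horizon length

# Steady-state input holding x = r; then e = x - r obeys e_{k+1} = a e_k + b v_k
u_ss = (1 - a) * r / b

# Backward dynamic programming (Riccati recursion) for the error regulator
P = [0.0] * (N + 1)
K = [0.0] * N
P[N] = Qf
for k in range(N - 1, -1, -1):
    K[k] = a * b * P[k + 1] / (R + b * b * P[k + 1])
    P[k] = Q + a * a * P[k + 1] - a * b * P[k + 1] * K[k]

# Forward simulation with v_k = -K_k e_k, i.e. u_k = u_ss - K_k (x_k - r)
x = 0.0
for k in range(N):
    e = x - r
    x = a * x + b * (u_ss - K[k] * e)
```

The HDP iteration in the paper plays the role of this backward recursion for nonlinear dynamics, where no closed-form Riccati solution exists.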
This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or, alternatively, closed-loop identification data), while circumventing the 'curse of dimensionality' of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. 'cost-to-go' data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of function approximator is critical for convergence of the iteration and for the quality of the final learned control policy, because an approximation error can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g. artificial neural networks) and nonparametric local averagers (e.g. k-nearest neighbor). In this paper, we argue, on the basis of some case studies and a theoretical result, that a certain type of local averager should be preferred over global approximators, as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable on-line control policy, due to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function of each minimization to discourage the optimizer from finding a solution in regions of the state space where the local data density is too low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose. (c) 2005 Elsevier Ltd. All rights reserved.
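The convergence property claimed for local averagers can be seen in a few lines: an averager maps values to convex combinations of values, so it is non-expansive in the sup norm, and composing it with the γ-contracting Bellman backup still yields a contraction. The sketch below runs fitted value iteration with a 2-nearest-neighbor averager on a hypothetical five-state chain (one action, not from the paper):

```python
import numpy as np

# Toy chain MDP: states 0..4, a single action (step left) costing 1, state 0 absorbing.
gamma, n = 0.9, 5

def bellman_backup(J):
    out = np.empty(n)
    out[0] = gamma * J[0]            # absorbing state, zero stage cost
    out[1:] = 1.0 + gamma * J[:-1]   # pay 1, move one step toward state 0
    return out

def local_average(values, k=2):
    # k-NN local averager: each state's value becomes the mean over its k nearest states
    fitted = np.empty(n)
    for s in range(n):
        idx = np.argsort(np.abs(np.arange(n) - s), kind="stable")[:k]
        fitted[s] = values[idx].mean()
    return fitted

J = np.zeros(n)
for _ in range(300):
    J_new = local_average(bellman_backup(J))
    if np.max(np.abs(J_new - J)) < 1e-10:
        break
    J = J_new
```

The iteration converges, but the averager's smoothing introduces bias: the fitted value at the absorbing state settles near 5 even though the exact cost-to-go there is 0, which hints at why the quality of the averager's local data matters.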
The quadratic knapsack problem (QKP) has a central role in integer and combinatorial optimization, but efficient algorithms for general QKPs are currently very limited. We present an approximate dynamic programming (ADP) approach for solving convex QKPs where variables may take any integer value and all coefficients are real numbers. We approximate the value function using (a) the continuous quadratic programming relaxation (CQPR), and (b) the integral parts of the solutions to the CQPR. We propose a new heuristic which adaptively fixes the variables according to the solution of the CQPR. We report computational results for QKPs with up to 200 integer variables. Our numerical results illustrate that the new heuristic produces high-quality solutions to large-scale QKPs quickly and robustly. (c) 2004 Elsevier Ltd. All rights reserved.
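The relax-and-fix idea can be sketched on a tiny separable convex instance. This is only one plausible reading of "adaptively fixes the variables" (here: repeatedly solve the continuous relaxation over the free variables, then fix the variable whose relaxed value is closest to an integer); the instance and the rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Hypothetical convex QKP:
#   maximize  sum_i c_i x_i - d_i x_i^2   s.t.  sum_i x_i <= b,  x_i integer in [0, u_i]
c = np.array([6.0, 5.0]); d = np.array([1.0, 1.0]); u = np.array([5.0, 5.0]); b = 4.0

def solve_cqpr(free, budget):
    """Continuous relaxation over the free variables, via bisection on the multiplier."""
    def x_of(lam):
        return np.clip((c[free] - lam) / (2 * d[free]), 0.0, u[free])
    if x_of(0.0).sum() <= budget:         # capacity constraint slack at lam = 0
        return x_of(0.0)
    lo, hi = 0.0, c[free].max()
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if x_of(mid).sum() > budget:
            lo = mid
        else:
            hi = mid
    return x_of(hi)

x = np.full(len(c), -1.0)      # -1 marks "not yet fixed"
budget = b
while (x < 0).any():
    free = np.where(x < 0)[0]
    xr = solve_cqpr(free, budget)
    # fix the free variable whose relaxed value is closest to an integer
    j = np.argmin(np.abs(xr - np.round(xr)))
    val = float(np.round(xr[j]))
    x[free[j]] = min(val, budget)  # never exceed the remaining capacity
    budget -= x[free[j]]

obj = float(c @ x - d @ x ** 2)
```

On this instance the relaxation gives (2.25, 1.75) and the heuristic recovers the integer optimum (2, 2) with objective 14.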
We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The problem is that in most dynamic programming applications, observations for estimating a value function typically come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence. In addition, the degree of initial transience can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules and derives a formula for the optimal stepsize that minimizes estimation error. The formula assumes certain parameters are known, and an approximation is proposed for the case where they are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas.
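The transience issue can be seen in a few lines: on an initially biased series, the classical 1/n stepsize (equivalent to a plain sample mean) retains the early bias, while a generalized harmonic rule a/(a+n) with larger a forgets it faster. The data and the rules below are illustrative; this is not the paper's optimal-stepsize formula.

```python
import numpy as np

# Deterministic "initially transient" series: observations start biased by 5
# and settle at the true value 10 (hypothetical data, for illustration only).
N = 200
y = 10.0 + 5.0 * 0.9 ** np.arange(1, N + 1)

def smooth(stepsizes):
    # Recursive smoothing: theta_n = (1 - alpha_n) theta_{n-1} + alpha_n y_n
    theta = 0.0
    for alpha, obs in zip(stepsizes, y):
        theta = (1 - alpha) * theta + alpha * obs
    return theta

n = np.arange(1, N + 1)
theta_mean = smooth(1.0 / n)                 # 1/n stepsize == plain sample mean
theta_harmonic = smooth(10.0 / (10.0 + n))   # generalized harmonic, slower decay

err_mean = abs(theta_mean - 10.0)
err_harmonic = abs(theta_harmonic - 10.0)
```

Here the 1/n estimate keeps an error above 0.2 from averaging in the transient, while the harmonic rule's error is orders of magnitude smaller; the cost of the larger stepsize would show up as variance if the series were noisy, which is exactly the trade-off an optimal stepsize formula balances.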
We propose two approximate dynamic programming methods to optimize the distribution operations of a company manufacturing a certain product at multiple production plants and shipping it to different customer locations for sale. We begin by formulating the problem as a dynamic program. Our first approximate dynamic programming method uses a linear approximation of the value function and computes the parameters of this approximation by using the linear programming representation of the dynamic program. Our second method relaxes the constraints that link the decisions for different production plants. Consequently, the dynamic program decomposes by the production plants. Computational experiments show that the proposed methods are computationally attractive, and in particular, the second method performs significantly better than standard benchmarks. (C) 2006 Wiley Periodicals, Inc.
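The effect of relaxing the constraints that link the plants can be sketched on a two-plant toy with one shared capacity: attaching a Lagrange multiplier to the coupling constraint makes the problem decompose into independent single-plant problems whose sum bounds the true optimum from above. The numbers are made up and the model is far simpler than the paper's dynamic program.

```python
from itertools import product

# Two plants share a capacity of B units (hypothetical instance).
f1 = {0: 0.0, 1: 4.0, 2: 6.0}   # plant-1 profit by allocated units
f2 = {0: 0.0, 1: 5.0, 2: 7.0}   # plant-2 profit by allocated units
B = 2

# Exact optimum of the coupled problem, by enumeration
opt = max(f1[x1] + f2[x2] for x1, x2 in product(f1, f2) if x1 + x2 <= B)

def lagrangian_bound(lam):
    # Relax x1 + x2 <= B with multiplier lam >= 0:
    # the problem decomposes into two independent single-plant problems.
    g1 = max(f1[x] - lam * x for x in f1)
    g2 = max(f2[x] - lam * x for x in f2)
    return g1 + g2 + lam * B

bound = min(lagrangian_bound(lam) for lam in (0.0, 1.0, 2.0, 3.0, 4.0))
```

For each multiplier the decomposed value is an upper bound, and minimizing over multipliers tightens it; on this toy the bound is tight at the optimum of 9.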
The accessibility and efficiency of outpatient clinic operations are largely affected by appointment schedules. Clinical scheduling is a process of assigning physician appointment times to sequentially calling patient...
This study presents a novel algorithm for constructing a probabilistic model based on historical operation data and performing dynamic optimization for plant-wide control applications. The proposed approach consists of applying a self-organizing map (SOM) for identifying representative plant operation modes, and approximate dynamic programming techniques, based on a discounted infinite-horizon cost, for learning an optimal policy. A quantitative measure for risk is defined in terms of transition probability, and a systematic guideline for striking a balance between risk and profit in decision making is provided with a mathematical proof. The efficacy of the proposed approach is illustrated on an integrated plant consisting of a reactor, a storage tank, and a separator with a recycle loop, and on the Tennessee Eastman challenge problem. The algorithm is useful for learning an improved policy and reducing risk in plant operation when a plant-wide model is difficult to obtain and uncertainties affect operation performance significantly. (C) 2009 Elsevier Ltd. All rights reserved.
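The transition-probability risk idea can be sketched in a few lines. In this toy the operation modes are pre-labeled (in the paper they would be identified by the SOM) and the mode sequence is made up: the transition matrix is estimated from historical data, and the risk of a mode is read off as the probability of entering an upset mode.

```python
import numpy as np

# Hypothetical mode sequence from historical operation data
# (mode labels 0/1/2, with mode 2 standing in for an upset condition).
seq = [0, 0, 1, 0, 1, 2, 0, 0, 1, 1, 2, 0, 1, 0, 0]

K = 3
counts = np.zeros((K, K))
for a, b in zip(seq, seq[1:]):
    counts[a, b] += 1          # count observed mode-to-mode transitions
P = counts / counts.sum(axis=1, keepdims=True)  # row-normalize to probabilities

# Risk of operating in mode m: probability of transitioning into the upset mode
risk = P[:, 2]
```

A policy learned by ADP could then trade this risk measure against expected profit, which is the balance the abstract's guideline formalizes.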
We propose a new method to compute bid prices in network revenue management problems. The novel aspect of our method is that it naturally provides dynamic bid prices that depend on how much time is left until departure. We show that our method provides an upper bound on the optimal total expected revenue and that this upper bound is tighter than the one provided by the widely known deterministic linear programming approach. Furthermore, it is possible to use the bid prices computed by our method as a starting point in a dynamic programming decomposition-like idea to decompose the network revenue management problem by the flight legs and to obtain dynamic and capacity-dependent bid prices. Our computational experiments indicate that the proposed method improves on many standard benchmarks.
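What "dynamic bid prices" means can be illustrated with an exact single-leg dynamic program (a hypothetical two-fare instance, not the paper's network method): the bid price V_{t+1}(x) - V_{t+1}(x-1) is the opportunity cost of a seat, and it shrinks as departure approaches.

```python
# Single-leg toy: capacity of 2 seats, 3 booking periods, two fare classes
# (all numbers hypothetical).
T, C = 3, 2
fares = [(0.3, 100.0), (0.2, 200.0)]   # (arrival probability, fare) per period

# V[t][x]: optimal expected revenue from period t on, with x seats left
V = [[0.0] * (C + 1) for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for x in range(C + 1):
        v = (1 - sum(lam for lam, _ in fares)) * V[t + 1][x]  # no arrival
        for lam, p in fares:
            accept = p + V[t + 1][x - 1] if x > 0 else float("-inf")
            v += lam * max(accept, V[t + 1][x])  # accept iff fare beats bid price
        V[t][x] = v

# Dynamic bid price for the last seat, as a function of the current period
bid = [V[t + 1][1] - V[t + 1][0] for t in range(T)]
```

Here the bid price for the last seat falls from 105 to 70 to 0 as departure nears, so the cheap fare is rejected early and accepted late; static bid prices from the deterministic LP cannot capture this time dependence.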
The flow of packages at an express package carrier begins with pick-ups at customer locations by couriers and delivery of the packages to a local station for sorting. The packages are then transported to a major regional sorting facility called the ramp. At the ramp, packages can be sorted again before departing to a hub. From the hub they are moved to the destination ramp, where the entire process repeats in reverse order until ultimate delivery of the package to the end customer. We focus on the afternoon and evening operations concerning the stations and the ramp. Sorting and transportation decisions among these locations are considered. The most important decisions are: (1) which packages to aggregate at the stations, and (2) what is the most efficient transportation among locations to meet time deadlines at the ramp. Several options for modeling the sorting process at the stations and the ramp are considered, as well as the possibility of vehicles traveling from one station to another to consolidate volume before proceeding to the ramp. We model these processes by means of a dynamic program, where time periods represent time slices in the afternoon and evening. The overall model is solved by approximate dynamic programming, where the value function is approximated by a linear function. Further strategies are developed to speed up the algorithm and decrease the time needed to find feasible solutions. The methodology is tested on several instances from an express package carrier. The dynamic program solutions are substantially better than the current best practice and the best solutions obtained from an integer programming formulation of the problem. (C) 2010 Elsevier Ltd. All rights reserved.