We introduce a new algorithm based on linear programming for optimization of average-cost Markov decision processes (MDPs). The algorithm approximates the differential cost function of a perturbed MDP via a linear combination of basis functions. We establish a bound on the performance of the resulting policy that scales gracefully with the number of states without imposing the strong Lyapunov condition required by its counterpart in de Farias and Van Roy (de Farias, D. P., B. Van Roy. 2003. The linear programming approach to approximate dynamic programming. Oper. Res. 51(6) 850-865). We investigate implications of this result in the context of a queueing control problem.
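The LP approach to approximate dynamic programming can be sketched on a toy instance. The example below uses a discounted two-state MDP with made-up numbers (not the paper's average-cost, perturbed formulation), and a full basis so the "approximation" is exact: the approximate LP maximizes the summed value subject to the Bellman inequalities expressed in the basis weights.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# 2-state, 2-action toy MDP (hypothetical numbers, not from the paper).
# P[a][s] = transition distribution under action a, g[s][a] = one-stage cost.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: switch states
g = np.array([[1.0, 3.0], [2.0, 0.5]])

Phi = np.eye(2)  # basis functions; a full basis here, so the fit is exact

# Approximate LP: maximize sum_s (Phi r)(s)
# subject to (Phi r)(s) <= g(s,a) + gamma * P_a(s,:) @ Phi r  for all (s, a)
A_ub, b_ub = [], []
for a in range(2):
    for s in range(2):
        A_ub.append(Phi[s] - gamma * P[a][s] @ Phi)
        b_ub.append(g[s, a])
c = -Phi.sum(axis=0)  # linprog minimizes, so negate the objective
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2, method="highs")
J = Phi @ res.x  # approximated cost-to-go
```

With fewer basis functions than states, the same LP returns the best weights satisfying the Bellman inequalities, which is the source of the performance bounds discussed in the abstract.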
This paper deals with the finite-horizon optimal tracking control for a class of discrete-time nonlinear systems using the iterative adaptive dynamic programming (ADP) algorithm. First, the optimal tracking problem is converted into designing a finite-horizon optimal regulator for the tracking error dynamics. Then, with convergence analysis in terms of cost function and control law, the iterative ADP algorithm via the heuristic dynamic programming (HDP) technique is introduced to obtain the finite-horizon optimal tracking controller, which makes the cost function close to its optimal value within an ε-error bound. Furthermore, three neural networks are used to implement the algorithm, approximating the cost function, the control law, and the error dynamics, respectively. At last, an example is included to demonstrate the effectiveness of the proposed approach.
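The first step, converting tracking into an error regulator, can be sketched on a scalar linear system, where the finite-horizon regulator is solvable exactly by a Riccati recursion rather than the paper's neural-network HDP iteration (all numbers below are hypothetical):

```python
# Scalar linear toy: x_{k+1} = a x_k + b u_k, track the constant reference r.
a, b, r = 0.8, 1.0, 1.0
Q = R = Qf = 1.0   # stage and terminal error weights, control weight
N = 20             # horizon length

# Steady-state input holding x = r; then e = x - r obeys e_{k+1} = a e_k + b v_k
u_ss = (1 - a) * r / b

# Backward dynamic programming (Riccati recursion) for the error regulator
P = [0.0] * (N + 1)
K = [0.0] * N
P[N] = Qf
for k in range(N - 1, -1, -1):
    K[k] = a * b * P[k + 1] / (R + b * b * P[k + 1])
    P[k] = Q + a * a * P[k + 1] - a * b * P[k + 1] * K[k]

# Forward simulation with v_k = -K_k e_k, i.e. u_k = u_ss - K_k (x_k - r)
x = 0.0
for k in range(N):
    e = x - r
    x = a * x + b * (u_ss - K[k] * e)
```

The HDP iteration in the paper plays the role of this backward recursion for nonlinear dynamics, where no closed-form Riccati solution exists.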
This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or, alternatively, closed-loop identification data), while circumventing the 'curse of dimensionality' of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. 'cost-to-go' data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of function approximator is critical for convergence of the iteration and for the quality of the final learned control policy, because an approximation error can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g. artificial neural networks) and nonparametric local averagers (e.g. k-nearest neighbor). In this paper, we argue, on the basis of some case studies and a theoretical result, that a certain type of local averager should be preferred over global approximators, as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable on-line control policy, due to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function of each minimization to discourage the optimizer from finding a solution in regions of the state space where the local data density is too low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose. (c) 2005 Elsevier Ltd. All rights reserved.
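The convergence property claimed for local averagers can be seen in a few lines: an averager maps values to convex combinations of values, so it is non-expansive in the sup norm, and composing it with the γ-contracting Bellman backup still yields a contraction. The sketch below runs fitted value iteration with a 2-nearest-neighbor averager on a hypothetical five-state chain (one action, not from the paper):

```python
import numpy as np

# Toy chain MDP: states 0..4, a single action (step left) costing 1, state 0 absorbing.
gamma, n = 0.9, 5

def bellman_backup(J):
    out = np.empty(n)
    out[0] = gamma * J[0]            # absorbing state, zero stage cost
    out[1:] = 1.0 + gamma * J[:-1]   # pay 1, move one step toward state 0
    return out

def local_average(values, k=2):
    # k-NN local averager: each state's value becomes the mean over its k nearest states
    fitted = np.empty(n)
    for s in range(n):
        idx = np.argsort(np.abs(np.arange(n) - s), kind="stable")[:k]
        fitted[s] = values[idx].mean()
    return fitted

J = np.zeros(n)
for _ in range(300):
    J_new = local_average(bellman_backup(J))
    if np.max(np.abs(J_new - J)) < 1e-10:
        break
    J = J_new
```

The iteration converges, but the averager's smoothing introduces bias: the fitted value at the absorbing state settles near 5 even though the exact cost-to-go there is 0, which hints at why the quality of the averager's local data matters.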
The quadratic knapsack problem (QKP) has a central role in integer and combinatorial optimization, but efficient algorithms for general QKPs are currently very limited. We present an approximate dynamic programming (ADP) approach for solving convex QKPs where variables may take any integer value and all coefficients are real numbers. We approximate the value function using (a) the continuous quadratic programming relaxation (CQPR), and (b) the integral parts of the solutions to the CQPR. We propose a new heuristic which adaptively fixes the variables according to the solution of the CQPR. We report computational results for QKPs with up to 200 integer variables. Our numerical results illustrate that the new heuristic produces high-quality solutions to large-scale QKPs quickly and robustly. (c) 2004 Elsevier Ltd. All rights reserved.
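The relax-and-fix idea can be sketched on a tiny separable convex instance. This is only one plausible reading of "adaptively fixes the variables" (here: repeatedly solve the continuous relaxation over the free variables, then fix the variable whose relaxed value is closest to an integer); the instance and the rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Hypothetical convex QKP:
#   maximize  sum_i c_i x_i - d_i x_i^2   s.t.  sum_i x_i <= b,  x_i integer in [0, u_i]
c = np.array([6.0, 5.0]); d = np.array([1.0, 1.0]); u = np.array([5.0, 5.0]); b = 4.0

def solve_cqpr(free, budget):
    """Continuous relaxation over the free variables, via bisection on the multiplier."""
    def x_of(lam):
        return np.clip((c[free] - lam) / (2 * d[free]), 0.0, u[free])
    if x_of(0.0).sum() <= budget:         # capacity constraint slack at lam = 0
        return x_of(0.0)
    lo, hi = 0.0, c[free].max()
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if x_of(mid).sum() > budget:
            lo = mid
        else:
            hi = mid
    return x_of(hi)

x = np.full(len(c), -1.0)      # -1 marks "not yet fixed"
budget = b
while (x < 0).any():
    free = np.where(x < 0)[0]
    xr = solve_cqpr(free, budget)
    # fix the free variable whose relaxed value is closest to an integer
    j = np.argmin(np.abs(xr - np.round(xr)))
    val = float(np.round(xr[j]))
    x[free[j]] = min(val, budget)  # never exceed the remaining capacity
    budget -= x[free[j]]

obj = float(c @ x - d @ x ** 2)
```

On this instance the relaxation gives (2.25, 1.75) and the heuristic recovers the integer optimum (2, 2) with objective 14.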
We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The problem is that in most dynamic programming applications, observations for estimating a value function typically come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence. In addition, the degree of initial transience can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules and derives a formula for the optimal stepsize that minimizes estimation error. The formula assumes certain parameters are known, and an approximation is proposed for the case where they are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas.
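The transience issue can be seen in a few lines: on an initially biased series, the classical 1/n stepsize (equivalent to a plain sample mean) retains the early bias, while a generalized harmonic rule a/(a+n) with larger a forgets it faster. The data and the rules below are illustrative; this is not the paper's optimal-stepsize formula.

```python
import numpy as np

# Deterministic "initially transient" series: observations start biased by 5
# and settle at the true value 10 (hypothetical data, for illustration only).
N = 200
y = 10.0 + 5.0 * 0.9 ** np.arange(1, N + 1)

def smooth(stepsizes):
    # Recursive smoothing: theta_n = (1 - alpha_n) theta_{n-1} + alpha_n y_n
    theta = 0.0
    for alpha, obs in zip(stepsizes, y):
        theta = (1 - alpha) * theta + alpha * obs
    return theta

n = np.arange(1, N + 1)
theta_mean = smooth(1.0 / n)                 # 1/n stepsize == plain sample mean
theta_harmonic = smooth(10.0 / (10.0 + n))   # generalized harmonic, slower decay

err_mean = abs(theta_mean - 10.0)
err_harmonic = abs(theta_harmonic - 10.0)
```

Here the 1/n estimate keeps an error above 0.2 from averaging in the transient, while the harmonic rule's error is orders of magnitude smaller; the cost of the larger stepsize would show up as variance if the series were noisy, which is exactly the trade-off an optimal stepsize formula balances.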
We propose two approximate dynamic programming methods to optimize the distribution operations of a company manufacturing a certain product at multiple production plants and shipping it to different customer locations for sale. We begin by formulating the problem as a dynamic program. Our first approximate dynamic programming method uses a linear approximation of the value function and computes the parameters of this approximation by using the linear programming representation of the dynamic program. Our second method relaxes the constraints that link the decisions for different production plants. Consequently, the dynamic program decomposes by the production plants. Computational experiments show that the proposed methods are computationally attractive, and in particular, the second method performs significantly better than standard benchmarks. (C) 2006 Wiley Periodicals, Inc.
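The effect of relaxing the constraints that link the plants can be sketched on a two-plant toy with one shared capacity: attaching a Lagrange multiplier to the coupling constraint makes the problem decompose into independent single-plant problems whose sum bounds the true optimum from above. The numbers are made up and the model is far simpler than the paper's dynamic program.

```python
from itertools import product

# Two plants share a capacity of B units (hypothetical instance).
f1 = {0: 0.0, 1: 4.0, 2: 6.0}   # plant-1 profit by allocated units
f2 = {0: 0.0, 1: 5.0, 2: 7.0}   # plant-2 profit by allocated units
B = 2

# Exact optimum of the coupled problem, by enumeration
opt = max(f1[x1] + f2[x2] for x1, x2 in product(f1, f2) if x1 + x2 <= B)

def lagrangian_bound(lam):
    # Relax x1 + x2 <= B with multiplier lam >= 0:
    # the problem decomposes into two independent single-plant problems.
    g1 = max(f1[x] - lam * x for x in f1)
    g2 = max(f2[x] - lam * x for x in f2)
    return g1 + g2 + lam * B

bound = min(lagrangian_bound(lam) for lam in (0.0, 1.0, 2.0, 3.0, 4.0))
```

For each multiplier the decomposed value is an upper bound, and minimizing over multipliers tightens it; on this toy the bound is tight at the optimum of 9.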
The accessibility and efficiency of outpatient clinic operations are largely affected by appointment schedules. Clinical scheduling is a process of assigning physician appointment times to sequentially calling patient...
This study presents a novel algorithm for constructing a probabilistic model based on historical operation data and performing dynamic optimization for plant-wide control applications. The proposed approach consists of applying a self-organizing map (SOM) for identifying representative plant operation modes, and approximate dynamic programming techniques, based on a discounted infinite-horizon cost, for learning an optimal policy. A quantitative measure for risk is defined in terms of transition probability, and a systematic guideline for striking a balance between risk and profit in decision making is provided with a mathematical proof. The efficacy of the proposed approach is illustrated on an integrated plant consisting of a reactor, a storage tank, and a separator with a recycle loop, and on the Tennessee Eastman challenge problem. The algorithm is useful for learning an improved policy and reducing risk in plant operation when a plant-wide model is difficult to obtain and uncertainties affect operation performance significantly. (C) 2009 Elsevier Ltd. All rights reserved.
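The transition-probability risk idea can be sketched in a few lines. In this toy the operation modes are pre-labeled (in the paper they would be identified by the SOM) and the mode sequence is made up: the transition matrix is estimated from historical data, and the risk of a mode is read off as the probability of entering an upset mode.

```python
import numpy as np

# Hypothetical mode sequence from historical operation data
# (mode labels 0/1/2, with mode 2 standing in for an upset condition).
seq = [0, 0, 1, 0, 1, 2, 0, 0, 1, 1, 2, 0, 1, 0, 0]

K = 3
counts = np.zeros((K, K))
for a, b in zip(seq, seq[1:]):
    counts[a, b] += 1          # count observed mode-to-mode transitions
P = counts / counts.sum(axis=1, keepdims=True)  # row-normalize to probabilities

# Risk of operating in mode m: probability of transitioning into the upset mode
risk = P[:, 2]
```

A policy learned by ADP could then trade this risk measure against expected profit, which is the balance the abstract's guideline formalizes.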
We propose a new method to compute bid prices in network revenue management problems. The novel aspect of our method is that it naturally provides dynamic bid prices that depend on how much time is left until departure. We show that our method provides an upper bound on the optimal total expected revenue and that this upper bound is tighter than the one provided by the widely known deterministic linear programming approach. Furthermore, it is possible to use the bid prices computed by our method as a starting point in a dynamic programming decomposition-like idea to decompose the network revenue management problem by the flight legs and to obtain dynamic and capacity-dependent bid prices. Our computational experiments indicate that the proposed method improves on many standard benchmarks.
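What "dynamic bid prices" means can be illustrated with an exact single-leg dynamic program (a hypothetical two-fare instance, not the paper's network method): the bid price V_{t+1}(x) - V_{t+1}(x-1) is the opportunity cost of a seat, and it shrinks as departure approaches.

```python
# Single-leg toy: capacity of 2 seats, 3 booking periods, two fare classes
# (all numbers hypothetical).
T, C = 3, 2
fares = [(0.3, 100.0), (0.2, 200.0)]   # (arrival probability, fare) per period

# V[t][x]: optimal expected revenue from period t on, with x seats left
V = [[0.0] * (C + 1) for _ in range(T + 1)]
for t in range(T - 1, -1, -1):
    for x in range(C + 1):
        v = (1 - sum(lam for lam, _ in fares)) * V[t + 1][x]  # no arrival
        for lam, p in fares:
            accept = p + V[t + 1][x - 1] if x > 0 else float("-inf")
            v += lam * max(accept, V[t + 1][x])  # accept iff fare beats bid price
        V[t][x] = v

# Dynamic bid price for the last seat, as a function of the current period
bid = [V[t + 1][1] - V[t + 1][0] for t in range(T)]
```

Here the bid price for the last seat falls from 105 to 70 to 0 as departure nears, so the cheap fare is rejected early and accepted late; static bid prices from the deterministic LP cannot capture this time dependence.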
The flow of packages at an express package carrier begins with pick-ups at customer locations by couriers and delivery of the packages to a local station for sorting. The packages are then transported to a major regional sorting facility called the ramp. At the ramp, packages can be sorted again before departing to a hub. From the hub they are moved to the destination ramp, where the entire process repeats in reverse order until ultimate delivery of the package to the end customer. We focus on the afternoon and evening operations concerning the stations and the ramp. Sorting and transportation decisions among these locations are considered. The most important decisions are: (1) which packages to aggregate at the stations, and (2) what is the most efficient transportation among locations to meet time deadlines at the ramp. Several options for modeling the sorting process at the stations and the ramp are considered, as well as the possibility of vehicles traveling from one station to another to consolidate volume before proceeding to the ramp. We model these processes by means of a dynamic program, where time periods represent time slices in the afternoon and evening. The overall model is solved by approximate dynamic programming, where the value function is approximated by a linear function. Further strategies are developed to speed up the algorithm and decrease the time needed to find feasible solutions. The methodology is tested on several instances from an express package carrier. The dynamic program solutions are substantially better than the current best practice and the best solutions obtained from an integer programming formulation of the problem. (C) 2010 Elsevier Ltd. All rights reserved.