ISBN: (print) 9781479919611
We formulate the problem of minimizing the operating cost of supplying residential hot water as a discrete-time finite-state Markov decision process. We apply state aggregation to reduce the effective size of the state space and use density estimation to obtain an algorithm that is robust to modeling changes in the cost function. We then use approximate policy iteration to obtain an asymptotically optimal solution to the problem when hot water demand is assumed to be independent of previous demand. We also provide heuristics for solving the problem when the demand is not assumed to be independent of its history. To evaluate the performance of the algorithms, we model the thermodynamics of a 20-gallon water heater and simulate hot water demand for a typical two-person household. Test results show that, compared with a regular water heater, the approximate dynamic programming (ADP) solution can reduce water heating costs by as much as 1/3 while maintaining nearly the same level of comfort. In addition, we discuss how the algorithm can be modified to incorporate solar water heating, and we consider the related problem of using residential water heaters for load smoothing.
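To make the idea of policy iteration on an aggregated state space concrete, here is a minimal sketch. It is not the authors' algorithm: it runs plain policy iteration on a four-bin toy model, and every name and number in it (the temperature bins, `P_DRAW`, `ENERGY_COST`, `DISCOMFORT`, `GAMMA`) is an assumption made up for illustration. Demand is assumed i.i.d., matching the independence assumption described above.

```python
# Toy illustration only: all constants are assumptions, not data from the paper.
TEMPS = [0, 1, 2, 3]   # aggregated tank-temperature bins (0 = cold, 3 = hot)
ACTIONS = [0, 1]       # 0 = heater off, 1 = heater on
P_DRAW = 0.3           # assumed i.i.d. probability of a hot-water draw per step
ENERGY_COST = 1.0      # assumed cost of running the heater for one step
DISCOMFORT = 5.0       # assumed penalty when a draw meets a cold tank (bin 0)
GAMMA = 0.95           # discount factor


def transitions(s, a):
    """Return [(probability, next_state, cost)] for state s and action a."""
    out = []
    for draw, p in ((1, P_DRAW), (0, 1.0 - P_DRAW)):
        cost = ENERGY_COST * a + (DISCOMFORT if draw and s == 0 else 0.0)
        # Heating raises the bin by one, a draw lowers it by one (clamped).
        nxt = min(max(s + a - draw, 0), TEMPS[-1])
        out.append((p, nxt, cost))
    return out


def q_value(s, a, v):
    """Expected one-step cost plus discounted cost-to-go."""
    return sum(p * (c + GAMMA * v[n]) for p, n, c in transitions(s, a))


def evaluate(policy, sweeps=500):
    """Iterative policy evaluation (repeated Bellman backups for a fixed policy)."""
    v = [0.0] * len(TEMPS)
    for _ in range(sweeps):
        v = [q_value(s, policy[s], v) for s in TEMPS]
    return v


def policy_iteration(max_rounds=50):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(TEMPS)
    for _ in range(max_rounds):
        v = evaluate(policy)
        improved = [min(ACTIONS, key=lambda a: q_value(s, a, v)) for s in TEMPS]
        if improved == policy:
            break
        policy = improved
    return policy, evaluate(policy)


if __name__ == "__main__":
    policy, values = policy_iteration()
    print(policy, [round(x, 2) for x in values])
```

In a realistic setting the evaluation step would be replaced by a sampled or fitted approximation, which is what makes the policy iteration "approximate"; the exact sweep here is only feasible because the toy state space is tiny.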
This paper introduces a workable model for establishing an inventory bank that holds perishable blood platelets with a short shelf life. The model considers a blood platelet bank with eight blood types, stochastic demand, stochastic supply, and deterministic lead time, and it is formulated using approximate dynamic programming. The model is evaluated in terms of four measures of effectiveness: blood platelet shortage, outdating, inventory level, and reward gained. Several alternative inventory control policies are analyzed, with the order quantity determined by a newsvendor model. The study confirms that the blood platelet bank's reward can be maximized by operating at the optimal inventory level, thereby minimizing both outdated units and shortages. The effect of varying the O- percentage of the inventory is also studied: as the O- blood type inventory level increases to 40%, shortages drop from 3.9% to 1.5% and outdated units drop from 4.6% to 1.8%. Furthermore, when the order quantity is received twice a day, shortages drop to 1.8% and outdated units drop to 2.1%. (C) 2014 Elsevier Ltd. All rights reserved.
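As a rough illustration of the newsvendor order quantity mentioned above, the sketch below applies the standard critical-ratio rule. The abstract does not state the demand distribution or the cost parameters, so the normal demand model and the names `shortage_cost` and `outdate_cost` are assumptions chosen only for this example.

```python
from statistics import NormalDist


def newsvendor_quantity(mean, std, shortage_cost, outdate_cost):
    """Order quantity that balances shortage cost against outdating cost.

    Assumes normally distributed demand (an illustrative choice, not taken
    from the paper). The critical ratio is the target probability of
    covering demand.
    """
    critical_ratio = shortage_cost / (shortage_cost + outdate_cost)
    return NormalDist(mean, std).inv_cdf(critical_ratio)


if __name__ == "__main__":
    # Hypothetical numbers: a shortage is taken to be four times as costly
    # as an outdated unit, so the rule orders above mean demand.
    print(newsvendor_quantity(mean=100, std=20, shortage_cost=4, outdate_cost=1))
```

When the two costs are equal the critical ratio is 0.5 and the rule orders exactly the mean demand; raising the shortage cost pushes the order quantity upward, at the price of more outdated units.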
We develop a Markov decision process (MDP) model to examine military medical evacuation (MEDEVAC) dispatch policies. To solve the MDP, we apply an approximate dynamic programming (ADP) technique. The problem of deciding which aeromedical asset to dispatch to which service request is complicated by the service locations and the priority class of each casualty event. We assume requests for MEDEVAC arrive sequentially, with the location and the priority of each casualty known upon initiation of the request. The proposed model finds a high-quality dispatching policy which outperforms the traditional myopic policy of sending the nearest available unit. Utility is gained by servicing casualties based on both their priority and the actual time until a casualty arrives at a medical treatment facility (MTF). The model is solved using approximate policy iteration (API) and least squares temporal difference (LSTD). Computational examples are used to investigate dispatch policies for a scenario set in northern Syria. Results indicate that a myopic policy is not always the best policy to use for quickly dispatching MEDEVAC units, and insight is gained into the value of specific MEDEVAC locations.
ISBN: (print) 9781479964154
Intermittent electricity generation from renewable sources fluctuates over a wide frequency spectrum. The medium-frequency component, from 0.01 Hz to 1 Hz, cannot be filtered out by system inertia and automatic generation control (AGC), and it therefore degrades frequency quality. In this paper, an approximate dynamic programming (ADP) based supplementary frequency controller for thermal generators is developed to attenuate renewable generation fluctuation in the medium-frequency range. A policy-iteration-based training algorithm is employed for online, model-free learning. Simulation results demonstrate that the proposed supplementary frequency controller can effectively adapt to changes in the system and provide improved frequency control. Further sensitivity analysis shows that the supplementary frequency controller significantly attenuates the dependence of frequency deviation on the medium-frequency component of renewable generation fluctuation.
Multistage stochastic programs pose some of the more challenging optimization problems. Because such models can become rather intractable in general, it is important to design algorithms that can provide approximations which, in the long run, yield solutions that are arbitrarily close to an optimum. In this paper, we propose such a sequential sampling method which is applicable to multistage stochastic linear programs, and we refer to it as the multistage stochastic decomposition (MSD) algorithm. This algorithm represents a dynamic extension of a regularized version of stochastic decomposition (SD). While the method allows general correlation structures, specialized streamlined versions are also possible for special cases of stagewise independent and autoregressive processes commonly incorporated in stochastic programming. As with its two-stage counterpart, the MSD algorithm is shown to provide an asymptotically optimal solution, with probability one. As a by-product of this study, we also show that SD algorithms draw upon features of both approximate dynamic programming and stochastic programming.
Approximate dynamic programming (ADP) relies, in the continuous-state case, on both a flexible class of models for the approximation of the value functions and a smart sampling of the state space for the numerical solution of the recursive Bellman equations. In this paper, low-discrepancy sequences, commonly employed for number-theoretic methods, are investigated as a sampling scheme in the ADP context when local models, such as the Nadaraya-Watson (NW) ones, are employed for the approximation of the value function. The analysis is carried out from both a theoretical and a practical point of view. In particular, it is shown that the combined use of low-discrepancy sequences and NW models enables the convergence of the ADP procedure. Then, the regular structure of the low-discrepancy sampling is exploited to derive a method for automatic selection of the bandwidth of NW models, which yields a significant saving in computational effort with respect to the standard cross-validation approach. Simulation results concerning an inventory management problem are presented to show the effectiveness of the proposed techniques. (C) 2013 Elsevier Ltd. All rights reserved.
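The combination described above can be sketched in one dimension as follows. This is an illustrative sketch, not the paper's method: it uses the van der Corput sequence as the low-discrepancy sampler, a Gaussian kernel for the Nadaraya-Watson model, a hand-picked bandwidth rather than the automatic selection rule the paper derives, and a made-up stand-in for the value function.

```python
import math


def van_der_corput(n, base=2):
    """n-th element (n >= 1) of the base-`base` van der Corput sequence in [0, 1)."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def nadaraya_watson(x, xs, ys, bandwidth):
    """Kernel-weighted average of ys at query point x (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


if __name__ == "__main__":
    # Sample the state space [0, 1) with 64 low-discrepancy points.
    states = [van_der_corput(i) for i in range(1, 65)]
    # Hypothetical stand-in for value-function targets at the sampled states.
    targets = [s ** 2 for s in states]
    # The bandwidth 0.05 is an assumption chosen by hand for this example.
    print(nadaraya_watson(0.5, states, targets, bandwidth=0.05))
```

Because the low-discrepancy points cover the interval evenly, every query point has nearby samples, which is the property that makes a single bandwidth workable across the whole state space.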
We consider the use of quadratic approximate value functions for stochastic control problems with input-affine dynamics and convex stage cost and constraints. Evaluating the approximate dynamic programming policy in such cases requires the solution of an explicit convex optimization problem, such as a quadratic program, which can be carried out efficiently. We describe a simple and general method for approximate value iteration that also relies on our ability to solve convex optimization problems, in this case, typically a semidefinite program. Although we have no theoretical guarantee on the performance attained using our method, we observe that very good performance can be obtained in *** (c) 2012 John Wiley & Sons, Ltd.
Developments in robust model predictive control (MPC) are reviewed from a perspective gained through a personal involvement in the research area during the past two decades. Various min-max MPC formulations are discussed in the setting of optimizing the "worst-case" performance in closed loop. One of the insights gained is that the conventional open-loop formulation of MPC is fundamentally flawed for addressing optimal control of systems with uncertain parameters, though it can be tailored to give conservative solutions with robust stability guarantees for special classes of problems. Dynamic programming (DP) may be the only general framework for obtaining closed-loop optimal control solutions for such systems. Due to the "curse of dimensionality" (COD), however, exact solution of DP is seldom possible. Approximate dynamic programming (ADP), which attempts to overcome the COD, is discussed with potential extensions and future challenges. (C) 2013 Elsevier Ltd. All rights reserved.
This study investigates the global optimality of approximate dynamic programming (ADP) based solutions using neural networks for optimal control problems with fixed final time. Issues including whether or not the cost function terms and the system dynamics need to be convex functions with respect to their respective inputs are discussed, and sufficient conditions for global optimality of the result are derived. Next, a new idea is presented to use ADP with neural networks for optimization of non-convex smooth functions. It is shown that any initial guess leads to direct movement toward the proximity of the global optimum of the function. This behavior is in contrast with gradient-based optimization methods, in which the movement is guided by the shape of the local level curves. Illustrative examples with single-variable and multi-variable functions demonstrate the potential of the proposed method. (C) 2014 Elsevier B.V. All rights reserved.
We formulate the well-known economic lot scheduling problem (ELSP) with sequence-dependent setup times and costs as a semi-Markov decision process. Using an affine approximation of the bias function, we obtain a semi-infinite linear program determining a lower bound for the minimum average cost rate. Under a very mild condition, we can reduce this problem to a relatively small convex quadratically constrained linear problem by exploiting the structure of the objective function and the state space. This problem is equivalent to the lower bound problem derived by Dobson [Dobson G (1992) The cyclic lot scheduling problem with sequence-dependent setups. Oper. Res. 40:736-749] and reduces to the well-known lower bound problem introduced in Bomberger [Bomberger EE (1966) A dynamic programming approach to a lot size scheduling problem. Management Sci. 12:778-784] for sequence-independent setups. We thus provide a framework that unifies previous work and opens new paths for future research on tighter lower bounds and dynamic heuristics.