This article considers a quite general dynamic capacity allocation problem. There is a fixed amount of daily processing capacity. On each day, jobs of different priorities arrive randomly and a decision has to be made about which jobs should be scheduled on which days. Waiting jobs incur a holding cost that is a function of their priority levels. The objective is to minimize the total expected cost over a finite planning horizon. The problem is formulated as a dynamic program, but this formulation is computationally difficult as it involves a high-dimensional state vector. To address this difficulty, an approximate dynamic programming approach is used that decomposes the dynamic programming formulation by the different days in the planning horizon to construct separable approximations to the value functions. The value function approximations serve two purposes. First, it is shown that they can be used to obtain a lower bound on the optimal total expected cost. Second, they can be used to make the job scheduling decisions over time. Computational experiments indicate that the job scheduling decisions made by the proposed approach perform significantly better than a variety of benchmark strategies.
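To make the separable-approximation idea concrete, here is a minimal Python sketch that books jobs against per-day capacity using a hand-specified separable estimate of the marginal value of capacity. The article learns its approximations rather than fixing them a priori; the horizon, costs, and the `marginal_capacity_value` form below are illustrative assumptions, not the paper's model.

```python
import random

T = 10                          # planning horizon (days)
CAPACITY = 5                    # jobs processable per day
HOLD_COST = {1: 3.0, 2: 1.0}    # holding cost per waiting day, by priority

def marginal_capacity_value(day, slack):
    """Hypothetical separable approximation: the opportunity cost of using
    one slot on `day`, higher when little slack remains."""
    return max(0.0, 2.0 - 0.3 * slack) * (0.9 ** day)

def schedule(jobs, free):
    """Book each job (highest priority first) on the day minimizing holding
    cost plus the approximate opportunity cost of the capacity consumed."""
    plan = {}
    for job_id, priority, arrival in sorted(jobs, key=lambda j: j[1]):
        best_day, best_cost = None, float("inf")
        for d in range(arrival, T):
            if free[d] == 0:
                continue
            cost = (HOLD_COST[priority] * (d - arrival)
                    + marginal_capacity_value(d, free[d]))
            if cost < best_cost:
                best_day, best_cost = d, cost
        if best_day is not None:
            free[best_day] -= 1
            plan[job_id] = best_day
    return plan

random.seed(0)
jobs = [(i, random.choice([1, 2]), random.randrange(T)) for i in range(20)]
free = {d: CAPACITY for d in range(T)}
print(schedule(jobs, free))
```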
Multi-stage decision problems under uncertainty are abundant in process industries. The Markov decision process (MDP) is a general mathematical formulation of such problems. Whereas stochastic programming and dynamic programming are the standard methods to solve MDPs, their unwieldy computational requirements limit their usefulness in real applications. Approximate dynamic programming (ADP) combines simulation and function approximation to alleviate the 'curse of dimensionality' associated with the traditional dynamic programming approach. In this paper, we present ADP as a viable way to solve MDPs for process control and scheduling problems. We bring forth some key issues for its successful application in these types of problems, including the choice of function approximator and the use of a penalty function to guard against over-extending the value function approximation in the value iteration. Application studies involving a number of well-known control and scheduling problems, including dual control, multiple-controller scheduling, and resource-constrained project scheduling problems, point to the promising potential of ADP. (c) 2006 Elsevier Ltd. All rights reserved.
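The combination of simulation, a function approximator, and an extrapolation penalty can be illustrated with a short fitted value iteration loop. This is a sketch in the spirit of the abstract, not the paper's implementation: a quadratic fit stands in for the approximator, the scalar dynamics are invented, and the penalty term is a simple guard against relying on the fit outside the sampled region.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA, ACTIONS = 0.95, np.linspace(-1.0, 1.0, 5)

def step(x, u):
    """Toy scalar dynamics with noise; stage cost penalizes state and effort."""
    x_next = 0.8 * x + u + 0.1 * rng.standard_normal()
    return x_next, x**2 + 0.1 * u**2

coef = np.zeros(3)                      # quadratic value function coefficients
for _ in range(50):                     # fitted value iteration sweeps
    xs = rng.uniform(-2.0, 2.0, 200)    # sampled states
    targets = []
    for x in xs:
        q = []
        for u in ACTIONS:
            xn, c = step(x, u)
            # Penalty discourages actions that push the state beyond the
            # region where the approximation was fitted.
            penalty = 10.0 * max(0.0, abs(xn) - 2.0)
            q.append(c + GAMMA * np.polyval(coef, xn) + penalty)
        targets.append(min(q))
    coef = np.polyfit(xs, targets, 2)   # refit the approximator

print("fitted V(x) = %.2f x^2 + %.2f x + %.2f" % tuple(coef))
```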
Multi-homing is used by Internet Service Providers (ISPs) to connect to the Internet via different network providers. This study develops a routing strategy under multi-homing in the case where network providers charge ISPs according to top-percentile pricing (i.e. based on the θ-th highest volume of traffic shipped). We call this problem the Top-percentile Traffic Routing Problem (TpTRP). Solution approaches based on stochastic dynamic programming require discretization of the state space, which introduces a large number of state variables; this is known as the curse of dimensionality. To overcome it, in previous work we suggested using approximate dynamic programming (ADP) to construct value function approximations, which allow us to work in continuous state space. The resulting ADP model provides well-performing routing policies for medium-sized instances of the TpTRP. In this work we extend the ADP model by using Bezier curves/surfaces to obtain continuous-time approximations of the time-dependent ADP parameters. This modification reduces the number of regression parameters to estimate, and thus improves the efficiency of parameter training in the solution of the ADP model, which makes realistically sized TpTRP instances tractable. By deriving bounds, we argue that our routing strategy is near-optimal. (C) 2011 Elsevier B.V. All rights reserved.
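The parameter-reduction trick can be shown in isolation: fit a time-dependent parameter series with a cubic Bezier curve so that only four control points need estimating instead of one value per time step. The series and noise below are synthetic placeholders, not TpTRP data.

```python
import numpy as np
from math import comb

def bernstein_basis(t, n=3):
    """Columns are Bernstein polynomials B_{i,n}(t) of a degree-n Bezier curve."""
    t = np.asarray(t)
    return np.column_stack([comb(n, i) * t**i * (1 - t)**(n - i)
                            for i in range(n + 1)])

T = 24
times = np.linspace(0.0, 1.0, T)
rng = np.random.default_rng(1)
noisy_params = np.sin(2 * np.pi * times) + 0.1 * rng.standard_normal(T)

B = bernstein_basis(times)                               # T x 4 design matrix
ctrl, *_ = np.linalg.lstsq(B, noisy_params, rcond=None)  # 4 control points
smooth = B @ ctrl                                        # continuous-time fit

print("control points:", np.round(ctrl, 3))
```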
We propose two approximate dynamic programming methods to optimize the distribution operations of a company manufacturing a certain product at multiple production plants and shipping it to different customer locations for sale. We begin by formulating the problem as a dynamic program. Our first approximate dynamic programming method uses a linear approximation of the value function and computes the parameters of this approximation by using the linear programming representation of the dynamic program. Our second method relaxes the constraints that link the decisions for different production plants. Consequently, the dynamic program decomposes by the production plants. Computational experiments show that the proposed methods are computationally attractive, and in particular, the second method performs significantly better than standard benchmarks. (C) 2006 Wiley Periodicals, Inc.
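The second method's decomposition can be illustrated with a tiny Lagrangian relaxation: once the constraint linking the plants is priced by a multiplier, each plant solves its own one-line problem. The profits, limits, and single shared-fleet constraint below are invented for the sketch and do not come from the paper.

```python
profit = {"plant_a": 5.0, "plant_b": 3.0}   # profit per unit shipped
limit = {"plant_a": 8, "plant_b": 8}        # per-plant production limits
FLEET = 10                                  # linking constraint: total shipments

lam = 0.0                                   # price on the linking constraint
for it in range(100):
    # With shared capacity priced at lam, the problem decomposes by plant:
    # each plant ships everything if its margin beats the capacity price.
    ship = {p: (limit[p] if profit[p] > lam else 0) for p in profit}
    violation = sum(ship.values()) - FLEET
    lam = max(0.0, lam + violation / (it + 1))   # diminishing-step subgradient

print("capacity price:", round(lam, 2), "shipments:", ship)
```

With these numbers the multiplier settles near 3.0, the margin of the less profitable plant, which is exactly the price at which the relaxed problem stops over-using the shared fleet.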
This study investigates the application of learning-based and simulation-based approximate dynamic programming (ADP) approaches to an inventory problem under the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model. Specifically, we explore the robustness of a learning-based ADP method, Sarsa, with a GARCH(1,1) demand model, and provide empirical comparison between Sarsa and two simulation-based ADP methods: Rollout and Hindsight Optimization (HO). Our findings assuage a concern regarding the effect of GARCH(1,1) latent state variables on learning-based ADP and provide practical strategies to design an appropriate ADP method for inventory problems. In addition, we expose a relationship between ADP parameters and conservative behavior. Our empirical results are based on a variety of problem settings, including demand correlations, demand variances, and cost structures. (C) 2011 Elsevier Ltd. All rights reserved.
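A compact version of the learning-based setup can be written as a tabular Sarsa loop over inventory levels with GARCH(1,1) demand; as in the study's concern, the agent observes only the inventory, not the latent volatility state. All parameters below are invented for the sketch.

```python
import random
from collections import defaultdict

random.seed(0)
OMEGA, ALPHA_G, BETA_G, MU = 0.2, 0.1, 0.85, 5.0    # GARCH(1,1) demand
HOLD, SHORT, GAMMA, ALPHA, EPS = 1.0, 4.0, 0.95, 0.1, 0.1
ORDERS = range(0, 11)

Q = defaultdict(float)

def eps_greedy(inv):
    if random.random() < EPS:
        return random.choice(ORDERS)
    return max(ORDERS, key=lambda a: Q[(inv, a)])

inv = 0
sigma2, eps_prev = OMEGA / (1 - ALPHA_G - BETA_G), 0.0
a = eps_greedy(inv)
for t in range(50000):
    # GARCH(1,1) volatility recursion; this state is hidden from the agent.
    sigma2 = OMEGA + ALPHA_G * eps_prev**2 + BETA_G * sigma2
    eps_prev = sigma2**0.5 * random.gauss(0.0, 1.0)
    demand = max(0.0, MU + eps_prev)
    on_hand = inv + a
    inv_next = min(20, max(0, round(on_hand - demand)))
    reward = -(HOLD * max(0.0, on_hand - demand)
               + SHORT * max(0.0, demand - on_hand))
    a_next = eps_greedy(inv_next)
    # Sarsa (on-policy TD) update on the observed transition.
    Q[(inv, a)] += ALPHA * (reward + GAMMA * Q[(inv_next, a_next)] - Q[(inv, a)])
    inv, a = inv_next, a_next

print("greedy order at zero inventory:", max(ORDERS, key=lambda a: Q[(0, a)]))
```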
This paper introduces a workable model for the establishment of an inventory bank holding perishable blood platelets with a short shelf life. The model considers a blood platelet bank with eight blood types, stochastic demand, stochastic supply, and deterministic lead time. The model is formulated using approximate dynamic programming and evaluated in terms of four measures of effectiveness: blood platelet shortage, outdating, inventory level, and reward gained. Moreover, several alternative inventory control policies are analyzed. The order quantity decision is made using a newsvendor model. This study confirms that the blood platelet bank reward can be maximized by operating at the optimal inventory level, thereby minimizing the number of outdated units as well as shortages. In addition, the effect of the O-negative (O-) percentage within the blood platelet bank inventory is studied: as O- inventory levels increase to 40%, shortages drop from 3.9% to 1.5% and outdated units drop from 4.6% to 1.8%. Furthermore, when the order quantity is received twice a day, shortages drop to 1.8% and outdated units drop to 2.1%. (C) 2014 Elsevier Ltd. All rights reserved.
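The newsvendor order rule mentioned above reduces to ordering up to the critical fractile of the demand distribution, q* = F^{-1}(c_u / (c_u + c_o)). A minimal sketch follows; the shortage cost, outdating cost, and demand parameters are placeholders, not values from the study.

```python
from statistics import NormalDist

shortage_cost = 10.0   # underage cost: per unit of unmet platelet demand
outdate_cost = 4.0     # overage cost: per unit outdated
demand = NormalDist(mu=50, sigma=12)   # daily demand for one blood type

critical_fractile = shortage_cost / (shortage_cost + outdate_cost)
order_quantity = demand.inv_cdf(critical_fractile)
print(f"critical fractile = {critical_fractile:.3f}, "
      f"order = {order_quantity:.0f} units")
```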
This brief studies the stochastic optimal control problem via reinforcement learning and approximate/adaptive dynamic programming (ADP). A policy iteration algorithm is derived in the presence of both additive and multiplicative noise using Itô calculus. The expectation of the approximated cost matrix is guaranteed to converge to the solution of some algebraic Riccati equation that gives rise to the optimal cost value. Moreover, the covariance of the approximated cost matrix can be reduced by increasing the length of the time interval between two consecutive iterations. Finally, a numerical example is given to illustrate the efficiency of the proposed ADP methodology.
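For intuition, here is a policy iteration loop for a discrete-time LQR problem whose cost matrix iterates converge to the algebraic Riccati solution. This is a deterministic, model-based simplification (the brief handles additive and multiplicative noise via Itô calculus), and the system matrices are made up for the sketch.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # stable toy system
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.zeros((1, 2))                     # initial stabilizing policy u = -Kx
for _ in range(30):
    # Policy evaluation: fixed-point iteration for the Lyapunov equation
    # P = Q + K'RK + (A - BK)' P (A - BK).
    Acl = A - B @ K
    P = Q + K.T @ R @ K
    for _ in range(500):
        P = Q + K.T @ R @ K + Acl.T @ P @ Acl
    # Policy improvement: greedy gain from the evaluated cost matrix.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Residual of the discrete algebraic Riccati equation at the final P.
res = Q + A.T @ P @ A - P - A.T @ P @ B @ np.linalg.solve(
    R + B.T @ P @ B, B.T @ P @ A)
print("gain K:", np.round(K, 4), "| Riccati residual:", np.linalg.norm(res))
```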
In this paper, we propose an approximate dynamic programming approach for an energy-efficient unrelated parallel machine scheduling problem. In this scheduling problem, jobs arrive at the system randomly, and each job's ready and processing times become available when an order is placed. Therefore, we consider the online version of the problem. Our objective is to minimize a combination of makespan and total energy costs. The energy costs include the cost of energy consumed by machines for switching on, processing, and idleness. We propose a binary program to solve the optimization problem at each stage of the approximate dynamic program. We compare the results of the approximate dynamic programming approach against an integer linear programming formulation of the offline version of the scheduling problem and an existing heuristic method suitable for scheduling problems with ready times. The results show that the approximate dynamic programming algorithm outperforms the two offline methods in terms of solution quality and computational time. (c) 2021 Elsevier B.V. All rights reserved.
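A simplified online dispatcher conveys the stage-wise trade-off (a greedy stand-in for the paper's per-stage binary program): each arriving job goes to the machine minimizing incremental makespan plus switch-on, processing, and idle energy costs. Machine rates, power draws, and prices are invented.

```python
import random

random.seed(0)
MACHINES = {"m1": {"rate": 1.0, "kw": 2.0}, "m2": {"rate": 0.6, "kw": 1.0}}
SWITCH_ON, IDLE_KW, PRICE, W_MAKESPAN = 1.5, 0.3, 0.2, 1.0

free_at = {m: 0.0 for m in MACHINES}     # time each machine becomes free
on = {m: False for m in MACHINES}        # whether machine is switched on

def assign(ready, work):
    """Pick the machine with the cheapest weighted makespan + energy cost."""
    best, best_cost = None, float("inf")
    for m, spec in MACHINES.items():
        start = max(ready, free_at[m])
        proc = work / spec["rate"]
        energy = spec["kw"] * proc * PRICE           # processing energy
        energy += 0.0 if on[m] else SWITCH_ON        # one-time switch-on cost
        energy += IDLE_KW * max(0.0, start - free_at[m]) * PRICE * on[m]
        makespan_inc = max(0.0, start + proc - max(free_at.values()))
        cost = W_MAKESPAN * makespan_inc + energy
        if cost < best_cost:
            best, best_cost = m, cost
    free_at[best] = max(ready, free_at[best]) + work / MACHINES[best]["rate"]
    on[best] = True
    return best

jobs = sorted((random.uniform(0, 10), random.uniform(1, 4)) for _ in range(8))
for ready, work in jobs:
    print(f"job(ready={ready:.1f}, work={work:.1f}) -> {assign(ready, work)}")
```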
Given the ubiquitous nature of both offensive and defensive missile systems, the catastrophe-causing potential they represent, and the limited resources available to countries for missile defense, optimizing the defensive response to a missile attack is a necessary national security endeavor. For a single salvo of offensive missiles launched at a set of targets, a missile defense system protecting those targets must determine how many interceptors to fire at each incoming missile. Since such missile engagements often involve the firing of more than one attack salvo, we develop a Markov decision process (MDP) model to examine the optimal fire control policy for the defender. Due to the computational intractability of exact methods for all but the smallest problem instances, we utilize an approximate dynamic programming (ADP) approach to explore the efficacy of applying approximate methods to the problem. We obtain policy insights by analyzing subsets of the state space that reflect a range of possible defender interceptor inventories. Testing of four instances derived from a representative planning scenario demonstrates that the ADP policy provides high-quality decisions for a majority of the state space, achieving a 7.74% mean optimality gap over all states for the most realistic instance, which models a longer-term engagement by an attacker who assesses the success of each salvo before launching a subsequent one. Moreover, the ADP algorithm requires only a few minutes of computational effort versus hours for the exact dynamic programming algorithm, providing a method to address more complex and realistically sized instances. Published by Elsevier B.V.
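The core inventory-versus-salvo trade-off can be seen in a heavily stylized exact DP: choose how many interceptors to fire per incoming missile, balancing kill probability now against ammunition for later salvos. The kill probability, salvo sizes, and firing-doctrine simplification (the same number of interceptors per missile in a salvo, deterministic inventory usage) are invented, not the paper's scenario.

```python
from functools import lru_cache

P_KILL = 0.7            # single-interceptor kill probability
MISSILES_PER_SALVO = 3
MAX_PER_MISSILE = 3
DAMAGE = 1.0            # cost per leaker (missile not intercepted)

@lru_cache(maxsize=None)
def value(inventory, salvos_left):
    """Minimal expected leakers with `inventory` interceptors
    and `salvos_left` attack salvos remaining."""
    if salvos_left == 0:
        return 0.0
    best = float("inf")
    for k in range(min(MAX_PER_MISSILE, inventory // MISSILES_PER_SALVO) + 1):
        leak_prob = (1 - P_KILL) ** k          # all k interceptors miss
        used = k * MISSILES_PER_SALVO
        cost = (MISSILES_PER_SALVO * leak_prob * DAMAGE
                + value(inventory - used, salvos_left - 1))
        best = min(best, cost)
    return best

for inv in (3, 9, 18):
    print(f"inventory={inv:2d}: expected leakers over 3 salvos = "
          f"{value(inv, 3):.2f}")
```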
Military medical planners must consider the dispatching of aerial military medical evacuation (MEDEVAC) assets when preparing for and executing major combat operations. The launch authority seeks to dispatch MEDEVAC assets such that prioritized battlefield casualties are transported quickly and efficiently to nearby medical treatment facilities. We formulate a Markov decision process (MDP) model to examine the MEDEVAC dispatching problem. The large size of the problem instance motivating this research renders conventional exact dynamic programming algorithms inappropriate. As such, we employ approximate dynamic programming (ADP) techniques to obtain high-quality dispatch policies relative to current practices. An approximate policy iteration algorithmic strategy is applied that utilizes least squares temporal differencing for policy evaluation. We construct a representative planning scenario based on contingency operations in northern Syria both to demonstrate the applicability of our MDP model and to examine the efficacy of our proposed ADP solution methodology. A designed computational experiment is conducted to determine how selected problem features and algorithmic features affect the quality of solutions attained by our ADP policies. Results indicate that the ADP policy outperforms the myopic policy (i.e., the default policy in practice) by up to nearly 31% with regard to a lifesaving performance metric for a baseline scenario. Moreover, the ADP policy provides decreased MEDEVAC response times and utilization rates. These results benefit military medical planners interested in the development and implementation of cogent MEDEVAC tactics, techniques, and procedures for application in combat situations with a high operations tempo. Published by Elsevier B.V.
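The policy-evaluation step named above, least squares temporal differencing (LSTD), solves Aw = b with A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s)r over observed transitions. Below is a minimal LSTD sketch on a toy chain problem, not the MEDEVAC model; the features and dynamics are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA, N_STATES = 0.9, 5

def features(s):
    """Simple polynomial features of the normalized state index."""
    x = s / (N_STATES - 1)
    return np.array([1.0, x, x * x])

A = np.zeros((3, 3))
b = np.zeros(3)
s = 0
for _ in range(5000):      # one long trajectory under a fixed random policy
    s_next = min(N_STATES - 1, max(0, s + rng.choice([-1, 1])))
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    phi, phi_next = features(s), features(s_next)
    A += np.outer(phi, phi - GAMMA * phi_next)   # accumulate LSTD statistics
    b += phi * reward
    s = 0 if s_next == N_STATES - 1 else s_next  # restart at the goal

w = np.linalg.solve(A, b)                        # batch least-squares solve
print("LSTD value estimates:",
      np.round([features(s) @ w for s in range(N_STATES)], 2))
```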