We describe a mathematical decision model for identifying dynamic health policies for controlling epidemics. These dynamic policies aim to select the best current intervention based on accumulating epidemic data and the availability of resources at each decision point. We propose an algorithm to approximate dynamic policies that optimize the population's net health benefit, a performance measure that accounts for both health and monetary outcomes. We further illustrate how dynamic policies can be defined and optimized for the control of a novel viral pathogen, where a policy maker must decide (i) when to employ or lift a transmission-reducing intervention (e.g. school closure) and (ii) how to prioritize population members for vaccination when a limited quantity of vaccines first becomes available. Within the context of this application, we demonstrate that dynamic policies can produce higher net health benefit than the more commonly described static policies, which specify a pre-determined sequence of interventions to employ throughout an epidemic. Copyright (c) 2016 John Wiley & Sons, Ltd.
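The decision rule can be made concrete with a small sketch. Below is a minimal, hypothetical illustration (not the authors' model) of a dynamic policy that, at each decision point, picks the affordable intervention with the highest estimated net health benefit, computed as health gain minus cost divided by a willingness-to-pay threshold; all intervention names, effect sizes, and costs are invented for illustration.

```python
# Minimal sketch of a dynamic epidemic-control policy: at each decision
# epoch, choose the intervention maximizing estimated net health benefit,
# NHB = QALYs gained - cost / wtp, subject to remaining resources.
# The one-step outcome model and all numbers are hypothetical.

def net_health_benefit(qalys_gained, cost, wtp=50_000.0):
    """NHB in QALY units: health gain minus monetized cost."""
    return qalys_gained - cost / wtp

def choose_intervention(observed_prevalence, budget_left, interventions):
    """Greedy dynamic rule: re-optimize at each decision point using the
    epidemic data accumulated so far (summarized here by prevalence)."""
    best, best_nhb = None, float("-inf")
    for name, (effect, cost) in interventions.items():
        if cost > budget_left:
            continue  # respect resource availability at this decision point
        # Hypothetical outcome model: health gain scales with prevalence.
        nhb = net_health_benefit(effect * observed_prevalence, cost)
        if nhb > best_nhb:
            best, best_nhb = name, nhb
    return best

interventions = {
    "do_nothing": (0.0, 0.0),
    "school_closure": (50_000.0, 40e6),       # (QALYs per unit prevalence, cost)
    "vaccinate_high_risk": (100_000.0, 60e6),
}
# One decision epoch at 3% prevalence; picks vaccinate_high_risk here.
print(choose_intervention(0.03, 100e6, interventions))
```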
In many applications, decision making under uncertainty involves two steps: prediction of a certain quality parameter or indicator of the system under study, and the subsequent use of that prediction in choosing actions. The prediction process is severely challenged by highly dynamic environments that involve sequential decision making, such as air traffic control at airports, where congestion prediction is critical for smooth departure operations. The taxi-out time of a flight is an excellent indicator of surface congestion and is a quality parameter used in the assessment of airport delays. Regression, queueing, and moving-average models have been shown to perform poorly in predicting taxi-out times because they are slow to adapt to changing airport dynamics. This paper presents an approximate dynamic programming approach (reinforcement learning, RL) to taxi-out time prediction. The taxi-out prediction performance was tested on flight data obtained from the Federal Aviation Administration's (FAA) Aviation System Performance Metrics (ASPM) database for Detroit International (DTW), Washington Reagan National (DCA), Boston (BOS), New York John F. Kennedy (JFK), and Tampa International (TPA) airports. For example, at the Boston airport (presented in detail), the prediction accuracy of the RL model was 14% higher than that of the queueing model and 39% higher than that of a running-average model. In general, the RL model was 35-50% more accurate than the regression model for all of the above airports. Copyright (c) 2010 John Wiley & Sons, Ltd.
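A minimal sketch of the prediction idea follows, assuming a reinforcement-learning-style incremental estimator rather than the authors' exact RL formulation: a table of expected taxi-out minutes per congestion state is updated with a constant learning rate, which lets it track changing airport dynamics faster than a long running average. The state binning and data are invented for illustration and do not reflect the ASPM schema.

```python
# Sketch of an RL-style taxi-out-time predictor. The airport surface state
# is reduced to a congestion bin, and a value table of expected taxi-out
# minutes is updated by stochastic approximation with a constant step size.

from collections import defaultdict

class TaxiOutPredictor:
    def __init__(self, alpha=0.1, prior=15.0):
        self.alpha = alpha                       # constant step size -> adaptivity
        self.value = defaultdict(lambda: prior)  # expected taxi-out per state

    def _state(self, n_departures_on_surface):
        return min(n_departures_on_surface // 5, 8)  # coarse congestion bin

    def predict(self, n_departures_on_surface):
        return self.value[self._state(n_departures_on_surface)]

    def update(self, n_departures_on_surface, observed_taxi_out_min):
        s = self._state(n_departures_on_surface)
        err = observed_taxi_out_min - self.value[s]
        self.value[s] += self.alpha * err        # TD-style incremental update

predictor = TaxiOutPredictor()
for n_dep, taxi_out in [(12, 18.0), (14, 21.0), (30, 34.0), (13, 19.5)]:
    print(round(predictor.predict(n_dep), 1), "min predicted")
    predictor.update(n_dep, taxi_out)
```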
This paper uses approximate linear programming (ALP) to compute average-cost bounds for queueing network control problems. Like most approximate dynamic programming (ADP) methods, ALP approximates the differential cost by a linear form. New types of approximating functions are identified that offer greater accuracy than previous ALP studies or other performance-bound methods. The structure of the infinite constraint set is exploited to reduce it to a more manageable set. When needed, constraint sampling and truncation methods are also developed. Numerical experiments show that LPs using quadratic approximating functions can be solved easily on examples with up to 17 buffers. Using additional functions reduced the error to 1-5% at the cost of larger LPs; these ALPs were solved for systems with 6-11 buffers, depending on the functions used. The method computes bounds much faster than value iteration and also gives some insight into policies. The ALPs do not scale to very large problems, but they offer more accurate bounds than other methods and the simplicity of just solving an LP. (C) 2015 Elsevier Ltd. All rights reserved.
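The following sketch shows the ALP construction on a toy single-buffer controlled queue (far smaller than the paper's networks), assuming a quadratic approximating function h(x) = w1*x + w2*x^2: maximizing eta subject to eta + h(x) <= c(x,a) + E[h(next state) | x, a] for all state-action pairs yields a lower bound on the optimal average cost. All queue parameters are illustrative.

```python
# ALP average-cost lower bound for a uniformized single queue with two
# service rates. Variables are (eta, w1, w2); constraints are enumerated
# over a truncated state space, so no sampling is needed at this scale.

import numpy as np
from scipy.optimize import linprog

lam, N = 0.3, 60                                     # arrival rate, truncation
actions = {"slow": (0.35, 0.0), "fast": (0.6, 2.0)}  # (service rate, service cost)

def phi(x):                                          # quadratic basis
    return np.array([x, x * x], dtype=float)

A_ub, b_ub = [], []
for x in range(N + 1):
    for mu, service_cost in actions.values():
        up, down = min(x + 1, N), max(x - 1, 0)      # arrivals blocked at N
        exp_phi = lam * phi(up) + mu * phi(down) + (1 - lam - mu) * phi(x)
        # eta - (E[phi] - phi(x)) . w <= c(x, a), with c(x, a) = x + service cost
        A_ub.append(np.concatenate(([1.0], -(exp_phi - phi(x)))))
        b_ub.append(x + service_cost)

res = linprog(c=[-1.0, 0.0, 0.0],                    # maximize eta
              A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 3)
print("average-cost lower bound:", -res.fun)
```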
ISBN (print): 9781509021550
This paper is concerned with optimal control problems of discrete-time nonlinear systems solved via a novel Q-learning algorithm. In the newly developed algorithm, the iterative Q function is updated in each iteration over the whole state and control spaces, instead of at a single state-control pair. A new convergence criterion for the corresponding Q-learning algorithm is presented, in which the traditional constraints on the learning rates of Q-learning algorithms are relaxed. Finally, simulation results are provided to demonstrate the performance of the developed algorithm.
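The distinguishing feature, updating the Q function over the whole state and control spaces in each iteration, can be sketched as a synchronous Q-value-iteration sweep over discretized grids; the toy nonlinear dynamics, grids, and cost below are illustrative, not the paper's examples.

```python
# Whole-space Q update: every sweep recomputes Q at all (state, control)
# grid points at once, rather than at a single visited pair.

import numpy as np

xs = np.linspace(-2.0, 2.0, 81)     # discretized state space
us = np.linspace(-1.0, 1.0, 21)     # discretized control space
gamma = 0.95

def f(x, u):                        # toy discrete-time nonlinear dynamics
    return 0.9 * np.sin(x) + u

# Precompute stage costs and nearest-grid next states for every (x, u) pair.
C = np.array([[x ** 2 + u ** 2 for u in us] for x in xs])
nxt = np.array([[np.abs(xs - f(x, u)).argmin() for u in us] for x in xs])

Q = np.zeros((xs.size, us.size))    # iterative Q function
for _ in range(500):
    V = Q.min(axis=1)               # V(x) = min_u Q(x, u)
    Q_new = C + gamma * V[nxt]      # one synchronous sweep over all (x, u)
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        Q = Q_new
        break
    Q = Q_new

i0 = np.abs(xs).argmin()
print("greedy control at x = 0:", us[Q[i0].argmin()])
```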
ISBN (print): 9781467399418
In practice, optimal control problems of stochastic switching are notoriously challenging from a computational viewpoint, since typical real-world applications are high dimensional. In this paper, we suggest an algorithmic solution based on convexity assumptions that are frequently fulfilled in applications. Furthermore, we show how the quality of the numerical solution can be assessed. An efficient implementation of our algorithms is discussed.
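As a rough illustration of the kind of problem addressed, the sketch below runs plain backward induction for a two-regime switching problem (run or idle a facility) on a binomial price lattice; it omits the paper's convexity-exploiting algorithm and quality bounds, and all parameters are invented.

```python
# Backward induction for optimal switching between two regimes on a
# recombining binomial lattice: at each node the controller may keep or
# switch the regime, paying a switching cost.

import numpy as np

T, u, d, q = 50, 1.05, 0.95, 0.5        # steps, up/down factors, up probability
p0, running_cost, switch_cost = 10.0, 9.0, 1.5
regimes = (0, 1)                         # 0 = idle, 1 = running

def payoff(price, regime):
    return regime * (price - running_cost)   # profit only while running

# value[r][i]: continuation value in regime r at node i of the current layer
value = [np.zeros(T + 1) for _ in regimes]
for t in range(T - 1, -1, -1):
    prices = p0 * u ** np.arange(t + 1) * d ** (t - np.arange(t + 1))
    new = []
    for r in regimes:
        # expected next-layer value for each target regime rp
        cont = {rp: q * value[rp][1:t + 2] + (1 - q) * value[rp][0:t + 1]
                for rp in regimes}
        best = np.maximum(
            payoff(prices, 0) - switch_cost * (r != 0) + cont[0],
            payoff(prices, 1) - switch_cost * (r != 1) + cont[1])
        new.append(best)
    value = new
print("value at t = 0, idle regime:", value[0][0])
```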
ISBN (print): 9783319406633; 9783319406626
This paper is concerned with a discrete-time two-player zero-sum game for nonlinear systems, which is solved by a new iterative adaptive dynamic programming (ADP) method. In the present iterative ADP algorithm, two iteration procedures, an upper and a lower iteration, are implemented to obtain the upper and lower performance index functions, respectively. It is shown that, when initialized by an arbitrary positive semi-definite function, the iterative value functions converge to the optimal performance index function whenever the optimal performance index function of the two-player zero-sum game exists. Finally, simulation results are given to illustrate the performance of the developed method.
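The upper/lower iteration idea can be sketched on a finite toy Markov game (not the nonlinear systems treated in the paper): the upper iteration applies a min-max update, the lower iteration a max-min update, both from a nonnegative initial function, and a zero gap between their limits indicates that the game value exists. Dynamics and costs below are randomly generated for illustration.

```python
# Upper and lower value iterations for a finite zero-sum Markov game.

import numpy as np

nS, nU, nW, gamma = 5, 3, 3, 0.9
rng = np.random.default_rng(0)
cost = rng.uniform(0.0, 1.0, (nS, nU, nW))  # stage cost (u minimizes, w maximizes)
nxt = rng.integers(0, nS, (nS, nU, nW))     # deterministic transitions s' = nxt[s,u,w]

def iterate(V, order):
    Q = cost + gamma * V[nxt]               # Q[s, u, w]
    if order == "upper":
        return Q.max(axis=2).min(axis=1)    # upper: min_u max_w
    return Q.min(axis=1).max(axis=1)        # lower: max_w min_u

V_up = np.ones(nS)                          # arbitrary nonnegative initialization
V_lo = np.ones(nS)
for _ in range(500):
    V_up = iterate(V_up, "upper")
    V_lo = iterate(V_lo, "lower")
print("upper value:", V_up.round(3))
print("lower value:", V_lo.round(3))
print("gap:", float(np.max(V_up - V_lo)))   # zero gap => the game value exists
```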
An adaptive optimal control algorithm for systems with uncertain dynamics is formulated under a reinforcement learning framework. An embedded exploratory component is included explicitly in the objective function of an output-feedback receding-horizon Model Predictive Control problem. The optimization is formulated as a Quadratically Constrained Quadratic Program and solved to epsilon-global optimality. The iterative interaction between the action specified by the optimal solution and the approximation of the cost functions balances the exploitation of current knowledge against the need for exploration. The proposed method is shown to converge to the optimal policy for a controllable discrete-time linear plant with unknown output parameters. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
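A heavily simplified sketch of the adaptive receding-horizon idea follows, assuming certainty-equivalence control in place of the paper's QCQP with an embedded exploration term: the unknown output parameters are estimated online by recursive least squares, a finite-horizon LQ problem is re-solved at each step, and a small dither signal crudely imitates exploration. All system matrices are invented.

```python
# Certainty-equivalence adaptive receding-horizon control with RLS
# estimation of an unknown output row vector c_true.

import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
c_true = np.array([1.0, 0.5])                  # unknown output parameters

def lq_gain(A, B, Q, R, horizon=20):
    """Finite-horizon Riccati recursion; returns the first-step gain."""
    P = Q
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

rng = np.random.default_rng(1)
c_hat, P_rls = np.zeros(2), 100.0 * np.eye(2)  # RLS estimate and covariance
x = np.array([2.0, -1.0])
for t in range(100):
    # Certainty-equivalence output cost: Q = c_hat c_hat^T (+ regularizer)
    Q = np.outer(c_hat, c_hat) + 1e-3 * np.eye(2)
    K = lq_gain(A, B, Q, R=np.eye(1))
    u = (-K @ x) + 0.05 * rng.standard_normal(1)   # dither = crude exploration
    y = c_true @ x + 0.01 * rng.standard_normal()  # noisy measured output
    # RLS update of the output parameters from the pair (x, y)
    g = P_rls @ x / (1.0 + x @ P_rls @ x)
    c_hat = c_hat + g * (y - c_hat @ x)
    P_rls = P_rls - np.outer(g, x @ P_rls)
    x = A @ x + B @ u
print("estimated output params:", c_hat.round(3), "true:", c_true)
```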
ISBN (print): 9781467391276
This paper presents an approach for recasting Markov Decision Process (MDP) problems as heuristics-based planning problems. The basic idea is a temporal decomposition of the state space based on a subset of the state space referred to as the termination sample space. Specifically, the recasting of MDP problems is done in three steps. The first step is to define a state-space adaptation criterion based on the termination sample space. The second step is to define an action-selection heuristic from each state. The third and final step is to define a recursion, or backtracking, methodology to avoid dead ends and infinite loops. All three steps are described and discussed. A case study involving fault detection and alarm generation for the reaction wheels of a satellite mission is presented, and the proposed approach is compared with existing approaches for recasting MDP problems using this case study. The computational reduction achieved by the proposed approach is evident from the results.
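The three recasting steps can be sketched on a toy deterministic grid problem: a termination set defining when search stops, a heuristic ranking actions from each state, and backtracking with a visited set to escape dead ends and avoid infinite loops. The grid, goal, and heuristic below are illustrative, not the satellite case study.

```python
# Heuristic planning with backtracking on a small grid.

GOAL = (4, 4)                                   # step (i): termination sample space
WALLS = {(1, 1), (2, 1), (3, 1), (3, 2), (3, 3)}
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def heuristic(state):                           # step (ii): action ranking
    return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])

def solve(start):
    visited, stack = {start}, [(start, [start])]
    while stack:                                # step (iii): backtracking search
        state, path = stack.pop()
        if state == GOAL:                       # termination test
            return path
        succs = [(state[0] + dx, state[1] + dy) for dx, dy in ACTIONS]
        succs = [s for s in succs
                 if s not in visited and s not in WALLS
                 and 0 <= s[0] <= 4 and 0 <= s[1] <= 4]
        for s in sorted(succs, key=heuristic, reverse=True):
            visited.add(s)                      # loop avoidance
            stack.append((s, path + [s]))       # best heuristic is popped first
    return None                                 # exhausted: no path exists

print(solve((0, 0)))
```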
In order to accept future high-yield booking requests, airlines protect seats from low-yield passengers. More seats may be reserved when passengers faced with closed fare classes can upsell to open higher fare classes. We address the airline revenue management problem with capacity nesting and customer upsell, and formulate it as a stochastic optimization model that determines a set of static protection levels for each itinerary. We apply an approximate dynamic programming framework to approximate the objective function by piecewise linear functions, whose slopes (marginal revenues) are iteratively updated and returned by an efficient heuristic that simultaneously handles both nesting and upsell. The resulting allocation policy is tested on a real airline network and benchmarked against the randomized linear programming bid-price policy under various demand settings. Simulation results suggest that the proposed allocation policy significantly outperforms the benchmark when incremental demand or the upsell probability is high. Structural analyses are also provided for special demand-dependence cases.
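A minimal sketch of the piecewise-linear ADP idea follows, on a single-leg, two-fare problem far simpler than the paper's network with nesting and upsell: the value of remaining capacity is approximated by per-seat marginal-revenue slopes, and each iteration simulates a booking horizon under the current slopes and smooths them toward a noisy observed marginal revenue. All demand numbers are invented.

```python
# Iterative slope updates for a piecewise-linear approximation of the
# value of remaining capacity; v[s] ~ marginal revenue of the s-th seat.

import numpy as np

C, fares = 50, (100.0, 250.0)                  # capacity, (low, high) fares
rng = np.random.default_rng(2)
v = np.zeros(C + 1)

def simulate_revenue(cap, slopes):
    """One booking horizon: low-fare demand arrives before high-fare."""
    low, high = rng.poisson(40), rng.poisson(25)
    s, revenue = cap, 0.0
    for _ in range(low):                       # accept a low fare only if it
        if s > 0 and fares[0] >= slopes[s]:    # beats the seat's marginal value
            s, revenue = s - 1, revenue + fares[0]
    return revenue + min(high, s) * fares[1]

for it in range(1, 501):                       # iterative slope updates
    alpha = 1.0 / it
    for s in range(1, C + 1):
        # Noisy one-sample estimate of the marginal revenue of seat s,
        # smoothed into the current slope.
        delta = simulate_revenue(s, v) - simulate_revenue(s - 1, v)
        v[s] = (1 - alpha) * v[s] + alpha * delta

protection = int(np.sum(v > fares[0]))         # seats reserved for high fare
print("high-fare protection level ~", protection)
```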