In this study, we use a generalized policy iteration approximate dynamic programming (ADP) algorithm to design an optimal controller for a class of discrete-time systems with actuator saturation. An integral function is proposed to manage the saturation nonlinearity in the actuators, and the generalized policy iteration ADP algorithm is then developed to solve the optimal control problem. Compared with other algorithms, the developed ADP algorithm includes two iteration procedures. In the present control scheme, two neural networks are introduced to approximate the control law and the performance index function. Finally, numerical simulations illustrate the convergence and feasibility of the developed method.
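A common way to realize such an integral cost in the ADP literature is a nonquadratic functional built from the inverse hyperbolic tangent; the abstract does not give the paper's exact form, so the sketch below is only an illustration of how such a cost penalizes controls approaching the saturation bound. The bound U_BAR, the weight R, and the artanh integrand are all assumptions.

```python
import numpy as np

U_BAR = 1.0   # actuator bound |u| <= U_BAR (assumed)
R = 1.0       # control weight (assumed scalar for illustration)

def saturation_cost(u, n=400):
    """W(u) = 2 * int_0^u U_BAR * artanh(v / U_BAR) * R dv, trapezoid rule.
    The cost grows steeply as u approaches the saturation bound."""
    v = np.linspace(0.0, u, n)
    g = 2.0 * U_BAR * np.arctanh(np.clip(v / U_BAR, -0.999, 0.999)) * R
    return float(np.sum((g[1:] + g[:-1]) * np.diff(v)) / 2.0)

for u in (0.2, 0.6, 0.9):
    print(f"u={u:.1f}  cost={saturation_cost(u):.4f}")
```

Substituting a cost of this shape for the usual quadratic control penalty is what keeps the resulting optimal control inside the actuator limits.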
Objective: Dynamic ambulance redeployment policies introduce considerably more flexibility in improving ambulance resource allocation by capitalizing on the pronounced geospatial-temporal variations in ambulance demand patterns arising from time-of-day and day-of-week effects. This paper proposes a novel modelling framework for dynamic ambulance redeployment in Singapore, based on the approximate dynamic programming (ADP) approach and leveraging a Discrete Event Simulation (DES) model. Methods: The study was based on Singapore's national Emergency Medical Services (EMS) system. Using a dataset comprising 216,973 valid incidents over a continuous two-year study period from 1 January 2011 to 31 December 2012, a DES model of the EMS system was developed. An ADP model based on linear value function approximations was then evaluated with the DES model via the temporal difference (TD) learning family of algorithms. The objective of the ADP model is to derive approximately optimal dynamic redeployment policies based on the primary outcome of ambulance coverage. Results: Considering an 8 min response time threshold, the computational experiments showed an estimated 5% reduction in the proportion of calls that cannot be reached within the threshold (equivalent to approximately 8,000 dispatches). The study also revealed that redeployment policies restricted to the same operational division could yield more promising response time performance. Furthermore, the best policy combined redeploying ambulances whenever they are released from service with relocating ambulances that are idle at bases. Conclusion: This study demonstrated the successful application of an approximate modelling framework based on ADP that leverages a detailed DES model of Singapore's EMS system to generate approximately optimal dynamic redeployment plans. Various policies and scenarios relevant to the Singapore EMS
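A minimal sketch of the TD(0) update underlying linear value function approximation, the family of methods the abstract evaluates inside the DES model. The feature map, step size, discount factor, and reward signal below are illustrative placeholders, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 8
w = np.zeros(n_features)          # linear value-function weights
alpha, gamma = 0.05, 0.95         # step size and discount factor (assumed)

def features(state):
    """Hypothetical feature map, e.g. per-region counts of idle ambulances."""
    return state  # here the state is already an n_features vector

state = rng.random(n_features)
for step in range(1000):
    next_state = rng.random(n_features)   # stand-in for one DES transition
    reward = -np.sum(next_state[:2])      # stand-in coverage penalty
    td_error = reward + gamma * features(next_state) @ w - features(state) @ w
    w += alpha * td_error * features(state)   # TD(0) semi-gradient update
    state = next_state
print("learned weights:", np.round(w, 3))
```

In the paper's setting, the transitions would come from the DES model rather than random draws, and the learned weights would define the approximate value used to rank redeployment decisions.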
This paper establishes an optimal control scheme for unknown complex-valued systems. Policy iteration is used to obtain the solution of the Hamilton-Jacobi-Bellman equation. Off-policy learning allows the iterative performance index and iterative control to be obtained with completely unknown dynamics. Critic and action networks are used to obtain the iterative performance index and iterative control, executing policy evaluation and policy improvement, respectively. Asymptotic stability of the closed-loop system and convergence of the iterative performance index function are proven. Using Lyapunov techniques, the uniform ultimate boundedness of the weight estimation error is proven. A simulation study demonstrates the effectiveness of the proposed optimal control method.
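To make the alternating evaluation/improvement structure concrete, the sketch below runs policy iteration on a scalar real-valued linear system: policy evaluation solves a Lyapunov equation for the current gain, and policy improvement updates the gain greedily. The paper's model-free, complex-valued actor-critic version is more involved; the system (a, b), costs (q, r), and initial gain here are assumptions.

```python
# Policy iteration on x_{k+1} = a x_k + b u_k with cost q x^2 + r u^2.
a, b = 0.9, 0.5
q, r = 1.0, 1.0
k_gain = 0.0                     # initial stabilizing policy u = -k_gain * x

for i in range(20):
    # Policy evaluation: V(x) = p x^2 for the current policy solves the
    # scalar Lyapunov equation p = q + r k^2 + (a - b k)^2 p.
    ac = a - b * k_gain
    p = (q + r * k_gain**2) / (1.0 - ac**2)
    # Policy improvement: greedy gain for the evaluated cost.
    k_gain = (b * p * a) / (r + b**2 * p)

print(f"converged gain k = {k_gain:.4f}, cost coefficient p = {p:.4f}")
```

In the off-policy, model-free version, the critic network replaces the closed-form Lyapunov solve and the action network replaces the explicit gain formula, with both trained from data generated under a behavior policy.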
In this paper, a novel local value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. This paper focuses on the admissibility properties and the termination criteria of the discrete-time local value iteration ADP algorithm. In this algorithm, the iterative value functions and the iterative control laws are both updated in a given subset of the state space in each iteration, instead of over the whole state space. For the first time, admissibility properties of the iterative control laws are analyzed for the local value iteration ADP algorithm. New termination criteria are established that terminate the iterative local ADP algorithm with an admissible approximate optimal control law. Finally, simulation results are given to illustrate the performance of the developed algorithm.
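A toy rendering of the "local" idea, assuming a simple scalar system and discretized grids: each sweep performs Bellman backups only on a randomly chosen subset of the state grid rather than the whole space. The dynamics f, stage cost U, and subset size are illustrative assumptions, not the paper's examples.

```python
import numpy as np

x_grid = np.linspace(-1.0, 1.0, 101)
u_grid = np.linspace(-0.5, 0.5, 21)
V = np.zeros_like(x_grid)

def f(x, u):          # assumed nonlinear-free test dynamics
    return 0.9 * x + 0.3 * u

def U(x, u):          # assumed stage cost
    return x**2 + u**2

rng = np.random.default_rng(1)
for it in range(50):
    subset = rng.choice(len(x_grid), size=25, replace=False)  # local set
    for idx in subset:
        x = x_grid[idx]
        # Bellman backup over the control grid, interpolating V at f(x, u).
        q = [U(x, u) + np.interp(f(x, u), x_grid, V) for u in u_grid]
        V[idx] = min(q)
print("V(0) estimate:", V[len(x_grid) // 2])
```

The paper's contribution is precisely about when such partial sweeps may be stopped: the termination criteria guarantee the control law extracted from V is admissible even though most states are untouched in any given iteration.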
In this paper, we propose a heuristic for solving finite-horizon Markov decision processes. The heuristic uses the nested partitions (NP) framework to guide an iterative search for the optimal policy. NP focuses the search on certain promising subregions, flexibly determined by the sampling weight of each action branch. Within each subregion, an effective local policy optimization is developed using a sensitivity-based approach, which optimizes the sampling weights based on estimated gradient information. Numerical results show the effectiveness of the proposed heuristic. (C) 2017 Elsevier B.V. All rights reserved.
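A toy nested-partitions loop on a one-dimensional discrete design space, to show the move/backtrack mechanics: the current most promising region is split, each subregion plus the surrounding region is sampled, and the search moves to the best candidate or backtracks. The objective, noise, and sample sizes are made up, and the paper's sensitivity-based weight optimization is not reproduced here.

```python
import random
random.seed(1)

SPACE = range(64)
def perf(x):                     # unknown objective, observed with noise
    return -(x - 41)**2 + random.gauss(0, 10)

region = (0, 63)                 # current most promising region (inclusive)
for it in range(30):
    lo, hi = region
    if lo == hi:
        break
    mid = (lo + hi) // 2
    subs = [(lo, mid), (mid + 1, hi)]
    surround = [x for x in SPACE if not (lo <= x <= hi)]
    cands = subs + ([surround] if surround else [])

    def score(c):
        # Sample a few points from the candidate set and average performance.
        pts = (random.sample(range(c[0], c[1] + 1), min(5, c[1] - c[0] + 1))
               if isinstance(c, tuple) else random.sample(c, min(5, len(c))))
        return sum(perf(x) for x in pts) / len(pts)

    best = max(cands, key=score)
    region = best if isinstance(best, tuple) else (0, 63)  # move or backtrack
print("final region:", region)
```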
Forest management in the face of fire risk is a challenging problem because fire spreads across a landscape and because its occurrence is unpredictable. Accounting for the existence of stochastic events that generate spatial interactions in the context of a dynamic decision process is crucial for determining optimal management. This paper demonstrates a method for incorporating spatial information and interactions into management decisions made over time. A machine learning technique called approximate dynamic programming is applied to determine the optimal timing and location of fuel treatments and timber harvests for a fire-threatened landscape. Larger net present values can be achieved using policies that explicitly consider evolving spatial interactions created by fire spread, compared to policies that ignore the spatial dimension of the inter-temporal optimization problem.
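As a rough illustration of the spatial flavor only (not the paper's ADP machinery), the sketch below scores candidate fuel-treatment cells by an approximate value that credits protection of neighboring cells, since fire spreads between adjacent stands. The grid size, timber values, risks, and spread weight are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
timber_value = rng.uniform(1.0, 10.0, size=(5, 5))   # per-cell stand value
fire_risk = rng.uniform(0.0, 0.3, size=(5, 5))       # per-cell ignition risk
spread_w = 0.5                                        # neighbor spread weight

def treatment_score(i, j):
    """Approximate marginal value of treating cell (i, j): protects the cell
    itself plus a share of its neighbors' value at risk from spread."""
    score = fire_risk[i, j] * timber_value[i, j]
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < 5 and 0 <= nj < 5:
            score += spread_w * fire_risk[i, j] * timber_value[ni, nj]
    return score

scores = [((i, j), treatment_score(i, j)) for i in range(5) for j in range(5)]
best = sorted(scores, key=lambda s: -s[1])[:3]        # treat the top-3 cells
print("cells to treat this period:", [c for c, _ in best])
```

A full ADP treatment would replace this myopic score with a learned value function evaluated over time, which is what allows the reported gains over spatially blind policies.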
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the whole state and control spaces, instead of for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, where the convergence criterion on the learning rates of traditional Q-learning algorithms is simplified. In the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties of the deterministic Q-learning algorithm are first developed for the undiscounted case. Then, accounting for the discount factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
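A minimal sketch of the full-backup idea, assuming a simple scalar deterministic system and discretized grids: every (state, control) pair of the Q-table is updated in each iteration via Q_{i+1}(x, u) = U(x, u) + gamma * min_{u'} Q_i(f(x, u), u'), rather than one sampled pair at a time. Dynamics, cost, and grids are assumptions.

```python
import numpy as np

x_grid = np.linspace(-1, 1, 41)
u_grid = np.linspace(-1, 1, 21)
gamma = 0.95
Q = np.zeros((len(x_grid), len(u_grid)))

f = lambda x, u: 0.8 * x + 0.2 * u      # assumed deterministic dynamics
U = lambda x, u: x**2 + 0.5 * u**2      # assumed utility (stage cost)

for it in range(100):
    V = Q.min(axis=1)                   # V_i(x) = min_u Q_i(x, u)
    Q_new = np.empty_like(Q)
    for i, x in enumerate(x_grid):
        for j, u in enumerate(u_grid):
            # Full deterministic backup over the entire (x, u) grid.
            Q_new[i, j] = U(x, u) + gamma * np.interp(f(x, u), x_grid, V)
    Q = Q_new
print("min_u Q(0, u):", Q[len(x_grid) // 2].min())
```

Because the backup is deterministic and exhaustive, no per-sample learning rate is needed, which is the simplification of the traditional Q-learning convergence conditions the abstract refers to.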
Dynamic pricing for network revenue management has received considerable attention in research and practice. Based on data obtained from a major hotel, we use a large-scale numerical study to compare the performance of several heuristic approaches proposed in the literature. The heuristic approaches we consider include deterministic linear programming with resolving and three variants of dynamic programming decomposition. Dynamic programming decomposition is considered one of the strongest heuristics, is the method chosen in some recent commercial implementations, and remains a topic of research in the recent academic literature. In addition to a plain-vanilla implementation of dynamic programming decomposition, we consider two variants proposed in the recent literature. For the base scenario generated from the real data, we show that the method based on Zhang (2011) [An improved dynamic programming decomposition approach for network revenue management. Manufacturing Service Oper. Management 13(1):35-52] leads to a small but significant lift in revenue compared with all other approaches. We generate many alternative problem scenarios by varying the capacity-demand ratio and network structure and show that the performance of the different heuristics can be strongly influenced by both. Overall, our paper shows the promise of some recent proposals in the academic literature but also offers a cautionary tale on the choice of heuristic methods for practical network pricing problems.
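The single-resource dynamic program at the heart of any DP decomposition can be sketched compactly: once the network problem is decomposed, each leg solves a one-dimensional DP over its own capacity, accepting a request only when its displacement-adjusted fare exceeds the marginal value of a unit of capacity. The fares, probabilities, horizon, and capacity below are made-up inputs, not the hotel data.

```python
import numpy as np

T, C = 50, 10                        # decision periods and leg capacity
fares = [120.0, 80.0]                # displacement-adjusted fares (assumed)
probs = [0.15, 0.30]                 # per-period request probabilities (assumed)

V = np.zeros((T + 1, C + 1))         # V[t, x]: value with x units, t periods left
for t in range(1, T + 1):
    for x in range(C + 1):
        v = V[t - 1, x]              # value if nothing is sold this period
        if x > 0:
            for fare, p in zip(fares, probs):
                # Accept a class only if its fare beats the marginal value.
                v += p * max(fare + V[t - 1, x - 1] - V[t - 1, x], 0.0)
        V[t, x] = v
print("bid price at full horizon and capacity:", V[T, C] - V[T, C - 1])
```

The variants compared in the paper differ mainly in how the network revenue is allocated to the legs before this single-leg DP is solved.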
We study non-preemptive scheduling problems where heterogeneous projects stochastically arrive over time. The projects include precedence-constrained tasks that require multiple resources. Incomplete projects are held in queues. When a queue is full, an arriving project must be rejected. The goal is to choose which tasks to start in each time-slot to maximize the infinite-horizon discounted expected profit. We provide a weakly coupled Markov decision process (MDP) formulation and apply a simulation-based approximate policy iteration method. Extensive numerical results are presented. (C) 2017 Elsevier B.V. All rights reserved.
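One simple member of the simulation-based approximate policy iteration family is a one-step rollout: estimate the Q-value of each action at the current state by Monte Carlo simulation of a base policy, then act greedily. The toy single-queue dynamics below stand in for the paper's weakly coupled project-scheduling MDP; every name and parameter here is an assumption.

```python
import random
random.seed(0)

GAMMA, HORIZON, REPS = 0.95, 40, 200

def step(queue, start_task):
    """Toy dynamics: a project arrives w.p. 0.5; starting a task earns 1."""
    arrivals = 1 if random.random() < 0.5 else 0
    served = 1 if (start_task and queue > 0) else 0
    return min(queue + arrivals - served, 5), float(served)  # queue cap 5

def base_policy(queue):              # naive base policy: always try to start
    return True

def simulate(queue, first_action):
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(HORIZON):
        queue, reward = step(queue, action)
        total += discount * reward
        discount *= GAMMA
        action = base_policy(queue)
    return total

def rollout_action(queue):
    # Greedy improvement: estimate Q(queue, a) by simulation for each action.
    q = {a: sum(simulate(queue, a) for _ in range(REPS)) / REPS
         for a in (True, False)}
    return max(q, key=q.get)

print("rollout action in state queue=3:", rollout_action(3))
```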
Dynamic programming (DP) is a mathematical programming approach for optimizing a system that changes over time and is a common approach for developing intelligent systems. Expert systems that are intelligent must be able to adapt dynamically over time. An optimal DP policy identifies the optimal decision as a function of the current state of the system; hence, the decisions controlling the system can intelligently adapt to changing system states. Although DP has existed since Bellman introduced it in 1957, exact DP policies are only possible for problems of low dimension or under very limiting restrictions. Fortunately, advances in computational power have given rise to approximate DP (ADP). However, most ADP algorithms are still computationally intractable for high-dimensional problems. This paper specifically considers continuous-state DP problems in which the state variables are multicollinear. The issue of multicollinearity is currently ignored in the ADP literature, but it is well known in the statistics community that high multicollinearity leads to unstable (high-variance) parameter estimates in statistical modeling. While not all real-world DP applications involve high multicollinearity, it is not uncommon for real cases to involve observed state variables that are correlated, such as the air quality ozone pollution application studied in this research. Correlation is a common occurrence in observed data, with sources in meteorology, energy, finance, manufacturing, health care, etc. ADP algorithms for continuous-state DP achieve an approximate solution through discretization of the state space and model approximations. Typical state space discretizations involve full-dimensional grids or random sampling. The former option requires exponential growth in the number of state points as the state space dimension grows, while the latter option is typically inefficient and requires an intractable number of state points. The exception is computationally-tractable
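The statistical point is easy to demonstrate: with nearly collinear state variables, ordinary least squares tends to give unstable value-function coefficients, while a ridge penalty (one standard remedy; the paper's specific fix may differ) shrinks and stabilizes them. The synthetic data below are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly collinear second state
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)   # "observed values"

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
lam = 1.0                                   # ridge penalty (assumed)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print("OLS coefficients:  ", np.round(beta_ols, 2))
print("ridge coefficients:", np.round(beta_ridge, 2))
```

In an ADP context, the regression target y would be the sampled future value at each state point, so coefficient instability translates directly into an unstable value-function approximation across iterations.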