In this paper, a synchronous solution method for multi-player zero-sum (ZS) games without system dynamics is established based on neural networks. The policy iteration (PI) algorithm is presented to solve the Hamilton-Jacobi-Bellman (HJB) equation. It is proven that the obtained iterative cost function converges to the optimal game value. To avoid the need for system dynamics, an off-policy learning method is given to obtain the iterative cost function, controls and disturbances based on PI. A critic neural network (CNN), action neural networks (ANNs) and disturbance neural networks (DNNs) are used to approximate the cost function, controls and disturbances. The weights of the neural networks compose the synchronous weight matrix, and the uniform ultimate boundedness (UUB) of the synchronous weight matrix is proven. Two examples are given to show the effectiveness of the proposed synchronous solution method for multi-player ZS games. (C) 2017 Elsevier B.V. All rights reserved.
The paper treats a class of optimal control problems for deterministic nonlinear discrete-time systems with the objective of maximizing the time or total yield until prescribed constraints are violated. Such problems are referred to as drift counteraction optimal control (DCOC) problems, as the corresponding control policy may be viewed as optimally counteracting drift imposed by disturbances or system dynamics. We derive conditions for the existence of an optimal solution. The optimal control policy is characterized by the value function, and a new algorithm based on proportional feedback is presented that obtains the value function faster than conventional dynamic programming algorithms. In addition, an approximate dynamic programming (ADP) approach using Gaussian process regression is formulated based on the new algorithm. Two numerical examples are reported: a time maximization problem for a van der Pol oscillator and a satellite life extension problem. (C) 2017 Elsevier Ltd. All rights reserved.
We consider a system where inelastic demand for electric power is met from three sources: 1) the grid; 2) in-house renewables such as solar panels; and 3) an in-house energy storage device. In our setting, energy demand, renewable power supply, and cost for grid power are all time-varying and stochastic. Furthermore, there are limits and inefficiency associated with charging and discharging the energy storage device. We formulate the storage operation problem as a dynamic program with parameters estimated from real-world demand, supply, and cost data. As the dynamic program is computationally intensive for large-scale problems, we explore algorithms based on approximate dynamic programming (ADP) and apply them to a test data set. Using the real-world test data, we numerically compare the performance of two ADP-based algorithms against Lyapunov optimization-based algorithms that require no statistical knowledge. Our results ascertain the value of storage and the value of installing a renewable source.
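To make the dynamic-programming formulation in the abstract above concrete, the following is a minimal sketch of backward induction for a storage operation problem. It is not the authors' model: the function name `storage_dp`, the integer storage levels, the i.i.d. price scenarios observed before each decision, the deterministic demand and renewable supply, the single round-trip efficiency parameter `eff`, and the rule that surplus energy is curtailed rather than sold are all assumptions made for illustration.

```python
import numpy as np

def storage_dp(demand, renewable, prices, probs, capacity=4, max_rate=2, eff=0.9):
    """Expected-cost backward induction over integer storage levels.

    Hypothetical toy model: the grid price is drawn i.i.d. each period from
    `prices` with probabilities `probs` and is observed before acting.
    Returns V[t, s] (expected cost-to-go) and policy[t, k, s] (storage change).
    """
    T = len(demand)
    levels = capacity + 1
    V = np.zeros((T + 1, levels))  # terminal value: leftover energy is worth nothing
    policy = np.zeros((T, len(prices), levels), dtype=int)
    for t in range(T - 1, -1, -1):
        for s in range(levels):
            expected = 0.0
            for k, (p, q) in enumerate(zip(prices, probs)):
                best_cost, best_u = np.inf, 0
                lo = -min(max_rate, s)             # discharge limit
                hi = min(max_rate, capacity - s)   # charge limit
                for u in range(lo, hi + 1):
                    # Charging u units draws u/eff; discharging delivers |u|*eff.
                    draw = u / eff if u > 0 else u * eff
                    grid = max(demand[t] - renewable[t] + draw, 0.0)
                    cost = p * grid + V[t + 1, s + u]
                    if cost < best_cost:
                        best_cost, best_u = cost, u
                policy[t, k, s] = best_u
                expected += q * best_cost
            V[t, s] = expected
    return V, policy
```

Calling `storage_dp` once with `capacity=0` and once with a positive capacity on the same inputs gives a rough estimate of the value of storage in this toy setting, as the difference between the two values of `V[0, 0]`. The triple loop over periods, levels, and actions also shows why exact dynamic programming becomes expensive as the state space grows, which is the motivation the abstract gives for ADP.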
Features of the data-driven approximate value iteration (AVI) algorithm, proposed in Li et al. (2014) for dealing with the optimal stabilization problem, include that only process data is required and that the estimate of the domain of attraction for the closed loop is enlarged. However, the controller generated by the data-driven AVI algorithm is an approximate solution of the optimal control problem. In this work, a quantitative analysis result on the error bound between the optimal cost and the cost under the designed controller is given. This error bound is determined by the approximation error of the estimate of the optimal cost and the approximation error of the controller function estimator. The first of these is in turn determined by the approximation error of the data-driven dynamic programming (DP) operator with respect to the DP operator and the approximation error of the value function estimator. These three approximation errors are zero when the data set of the plant is sufficient and infinitely complete, and the number of samples in the state space of interest is infinite. This means that the cost under the designed controller equals the optimal cost when the number of iterations is infinite. (C) 2016 Elsevier Ltd. All rights reserved.
We study patient admission policies in a neurology ward where there are multiple types of patients with different medical characteristics. Patients receive specialized care inside the neurology ward and delays in admission to the ward will have negative impact on their health status. The level of this impact varies among patient types and depends on the severity of patients. Patients are also different in terms of arrival rate and length of stay at the ward. The patients normally wait in the emergency department until a ward bed is assigned to them. We formulate this problem as an infinite-horizon average cost dynamic program and propose an efficient approximation scheme to solve large-scale problem instances. The computational results from applying our model to a neurology ward show that dynamic policies generated by our approach can reduce the overall deterioration in patients' health status compared to several alternative policies.
In this paper, a novel mixed iterative adaptive dynamic programming (ADP) algorithm is developed to solve the optimal battery energy management and control problem in smart residential microgrid systems. Based on the load and electricity rate data, two iterations are constructed: the P-iteration and the V-iteration. The V-iteration is implemented based on value iteration, which aims to obtain the iterative control law sequence in each period. The P-iteration is implemented based on policy iteration, which updates the iterative value function according to the iterative control law sequence. Properties of the developed mixed iterative ADP algorithm are analyzed. It is shown that the iterative value function is monotonically nonincreasing and converges to the solution of the Bellman equation. In each iteration, it is proven that the performance index function is finite under the iterative control law sequence. Finally, numerical results and comparisons are given to illustrate the performance of the developed algorithm.
In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated in a subset of the state space, where the computational burden is relaxed compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated in a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.
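The idea of updating the control law on only a subset of the state space can be illustrated with a small tabular sketch. This is a simplified analogue and not the paper's algorithm: it assumes a finite deterministic system, a discount factor `gamma` (so that every policy can be evaluated by one linear solve), and an initial policy that always picks action 0. The names `evaluate`, `local_policy_iteration`, `next_state`, `cost`, and `subsets` are hypothetical.

```python
import numpy as np

def evaluate(policy, next_state, cost, gamma):
    """Exact value of a fixed policy: solve (I - gamma * P) V = c."""
    n = len(policy)
    idx = np.arange(n)
    P = np.zeros((n, n))
    P[idx, next_state[idx, policy]] = 1.0  # deterministic transitions
    c = cost[idx, policy]
    return np.linalg.solve(np.eye(n) - gamma * P, c)

def local_policy_iteration(next_state, cost, subsets, gamma=0.9):
    """Improve the policy only on the given subset of states in each iteration.

    next_state[s, a]: successor state; cost[s, a]: stage cost.
    Returns the final policy and the list of value functions per iteration.
    """
    n, _ = cost.shape
    policy = np.zeros(n, dtype=int)  # assumed initial policy: action 0 everywhere
    history = [evaluate(policy, next_state, cost, gamma)]
    for subset in subsets:
        V = history[-1]
        q = cost + gamma * V[next_state]  # Q-values under the current value function
        for s in subset:                  # local improvement: other states keep their action
            policy[s] = int(np.argmin(q[s]))
        history.append(evaluate(policy, next_state, cost, gamma))
    return policy, history
```

Because each local improvement step picks an action whose Q-value is no larger than the current value at that state, the evaluated value functions in `history` are elementwise nonincreasing in this discounted tabular setting. This mirrors, in a much simpler form, the monotonicity property stated in the abstract.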
Despite rapid advances in information technologies for intelligent parking systems, it remains a challenge to optimally manage limited parking resources in busy urban neighborhoods. In this paper, we use dynamic location-dependent parking pricing and reservation to improve system-wide performance of an intelligent parking system. With this system, the parking agency is able to decide the spatial and temporal distribution of parking prices to achieve a variety of objectives, while drivers with different origins and destinations compete for limited parking spaces via online reservation. We develop a multi-period non-cooperative bi-level model to capture the complex interactions among the parking agency and multiple drivers, as well as a non-myopic approximate dynamic programming (ADP) approach to solve the model. It is shown with numerical examples that the ADP-based pricing policy consistently outperforms alternative policies in achieving greater performance of the parking system, and shows reliability in handling the spatial and temporal variations in parking demand. (C) 2017 Elsevier Ltd. All rights reserved.
We investigate the scheduling practices of a multidisciplinary, multistage, outpatient health care program. Patients undergo a series of assessments before being eligible for elective surgery. Such systems often suffer from high rates of attrition and appointment no-shows leading to capacity underutilization and treatment delays. We propose a new scheduling model where the clinic assigns patients to an appointment day but postpones the decision of which assessments patients undergo pending the observation of who arrives. In doing so, the clinic gains flexibility to improve system performance. We formulate the scheduling problem as a Markov decision process and use approximate dynamic programming to solve it. We apply our approach to a dataset collected from a bariatric surgery program at a large tertiary hospital in Toronto, Canada. We examine the quality of our solutions via structural results and compare them with heuristic scheduling practices using a discrete-event simulation. By allowing multiple assessments, delaying their scheduling, and by optimizing over an appointment book, we show significant improvements in patient throughput, clinic profit, use of overtime, and staff utilization.
Least squares Monte Carlo (LSMC) approaches represent a computationally efficient method for the valuation of natural gas storage facilities. LSMC methods are computationally tractable while they simultaneously allow for a decoupling of the price path simulation from the optimization of the decision vector. However, selecting the appropriate features using traditional regression techniques can be challenging, particularly when several factors of uncertainty are assumed to drive the price process. In this paper, we analyze a natural gas storage contract using a two-factor forward model whose parameters can be easily calibrated. For a forward curve derived from monthly averages of the NBP day-ahead contract from 2004 to 2009, we compute storage values based on a collection of spot price paths and price paths of a daily forward contract with a time to maturity of 30 days. We study the impact of additional pricing information in the form of a forward contract on the value of a gas storage facility. A comparison to the corresponding one-factor model is also included in our experiments. Value function approximation is carried out by employing a kernel-based regression technique in the form of support vector machine regression (SVR). We report out-of-sample results by simulating the targets for the next stage. We also carry out a search in the space of SVR parameters to identify the appropriate parameters for our experiments. Applying a spot trading strategy, we observe a higher storage value for the one-factor model when compared to the corresponding two-factor model. With respect to the two-factor model, we report that an approximation of the value function over both a spot and a forward contract increases storage value compared to a value function that is computed over a spot contract only.