We study a radiation therapy scheduling problem in which dynamically and stochastically arriving patients of different types are scheduled to future days. Unlike similar models in the literature, we consider cancellation of treatments. We formulate this dynamic multi-appointment patient scheduling problem as a Markov decision process (MDP). Because the MDP is intractable due to its large state and action spaces, we employ a simulation-based approximate dynamic programming (ADP) approach to solve it approximately. In particular, we develop a least-squares-based approximate policy iteration method. The performance of the ADP approach is compared with that of a myopic heuristic decision rule.
Reinforcement learning (RL) can be used to obtain an approximate numerical solution to the Hamilton-Jacobi-Bellman (HJB) equation. Recent advances in the machine learning community enable deep neural networks (DNNs) to approximate high-dimensional nonlinear functions, such as those that occur in RL, accurately and without any domain knowledge. In the standard RL setting, both the system and cost structures are unknown, and the amount of data needed to obtain an accurate approximation can be impractically large. When the structures are known, however, they can be used to solve the HJB equation efficiently. Herein, a model-based globalized dual heuristic programming (GDHP) approach is proposed, in which the HJB equation is separated into value, costate, and policy functions. The class of particular interest in this research is the finite horizon optimal tracking control (FHOC) problem. Additional issues that arise in FHOC, such as time-varying functions, terminal constraints, and the delta-input formulation, are addressed. A DNN structure and training algorithm suited to FHOC are presented. A benchmark continuous reactor example illustrates the proposed approach. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
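To make the value/costate/policy split and the time-varying nature of finite-horizon problems concrete, the sketch below uses the simplest case where the model is known and the HJB equation has a closed-form solution: a scalar linear system x[k+1] = a*x[k] + b*u[k] with quadratic stage cost q*x^2 + r*u^2 and terminal cost qf*x^2. This is a swapped-in linear-quadratic illustration, not the GDHP/DNN method of the abstract; it regulates the state (or tracking error) to zero and ignores terminal constraints and the delta-input formulation. The value function is V_k(x) = P_k*x^2, the costate is its gradient 2*P_k*x, and the policy is u = -K_k*x, all of which vary with k.

```python
from typing import List, Tuple


def finite_horizon_lq(a: float, b: float, q: float, r: float,
                      qf: float, horizon: int) -> Tuple[List[float], List[float]]:
    """Backward Riccati recursion for a scalar discrete-time LQ problem.

    Returns P[0..N] (value-function coefficients) and K[0..N-1] (feedback gains).
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = qf  # terminal cost coefficient
    for k in range(horizon - 1, -1, -1):
        denom = r + b * b * P[k + 1]
        K[k] = a * b * P[k + 1] / denom
        P[k] = q + a * a * P[k + 1] - (a * b * P[k + 1]) ** 2 / denom
    return P, K


def value(P: List[float], k: int, x: float) -> float:
    return P[k] * x * x          # V_k(x) = P_k x^2


def costate(P: List[float], k: int, x: float) -> float:
    return 2.0 * P[k] * x        # dV_k/dx, the quantity GDHP-style critics also learn


def policy(K: List[float], k: int, x: float) -> float:
    return -K[k] * x             # time-varying linear feedback


P, K = finite_horizon_lq(a=1.0, b=1.0, q=1.0, r=1.0, qf=0.0, horizon=2)
```

With a two-step horizon the gain is 0.5 at k = 0 but 0 at the last step, showing why finite-horizon value and policy approximators must take time as an input.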
ISBN (print): 9781538635964
This paper explores the applicability of dynamic programming (DP) and approximate dynamic programming (ADP) based methods for optimal dispatch of utility-scale energy storage systems (ESS). The effectiveness of these approaches has been tested using the IEEE 13 node test feeder with distributed photovoltaics (PVs) and a utility-scale storage system. A co-simulation-based approach was used to set up the experiment so that detailed ESS and network models could be implemented. The results obtained from the DP/ADP runs were compared with three other control strategies, both myopic and intelligent. Simulation results show that DP/ADP algorithms are good candidates for optimal ESS dispatch in terms of both solution quality and execution time.
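The exact DP baseline for storage dispatch can be sketched as a backward recursion over discretized state-of-charge levels. This is a minimal sketch under hypothetical simplifications not taken from the paper: integer energy units, a known price profile, lossless charging, and no network, PV, or co-simulation model. The action is the energy charged (positive) or discharged (negative) in each period, and the objective is arbitrage revenue.

```python
from typing import List, Tuple


def dispatch_dp(prices: List[float], capacity: int,
                max_rate: int) -> Tuple[float, List[int]]:
    """Backward DP over state of charge s in 0..capacity (hypothetical toy model).

    Period revenue is -price * a: charging (a > 0) buys energy, discharging sells it.
    Returns the optimal revenue from an empty battery and the corresponding schedule.
    """
    T = len(prices)
    # value[t][s]: best revenue from period t onward, starting at state of charge s
    value = [[0.0] * (capacity + 1) for _ in range(T + 1)]
    best = [[0] * (capacity + 1) for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for s in range(capacity + 1):
            best_val, best_a = float("-inf"), 0
            for a in range(-max_rate, max_rate + 1):
                nxt = s + a
                if 0 <= nxt <= capacity:  # keep the state of charge feasible
                    v = -prices[t] * a + value[t + 1][nxt]
                    if v > best_val:
                        best_val, best_a = v, a
            value[t][s] = best_val
            best[t][s] = best_a
    # Forward pass from an empty battery to recover the optimal schedule
    s, schedule = 0, []
    for t in range(T):
        a = best[t][s]
        schedule.append(a)
        s += a
    return value[0][0], schedule


revenue, schedule = dispatch_dp([10.0, 50.0, 20.0, 60.0], capacity=1, max_rate=1)
```

The cost of this exact recursion grows with the number of discretized states, which is the usual motivation for switching to ADP once the storage and network models become detailed.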
The increased market penetration of renewable energy sources and the rapid development of electric battery storage technologies create potential for reducing electricity price volatility while maintaining the stability of the power grid. This work presents an algorithmic approach to controlling battery levels and forward positions so as to optimally manage power output fluctuations caused by intermittent renewable energy generation. The paper also explores the effect of battery technology on the firm's optimal trading behaviour in the electricity spot market. (C) 2017 Elsevier B.V. All rights reserved.
A commodity market participant trading via her inventory has access to both spot and forward markets. To liquidate her inventory, she can sell at the spot price, take a short forward position, or do a combination of both. A trade is proposed in which there is always a hedging forward contract; it can be viewed as a dynamic cash-and-carry arbitrage. The trader can adjust the maturity of the forward contract dynamically until the inventory is depleted or a time constraint is reached. In the first setup, the storage contract (to carry inventory) is assumed to have a constant cost and a flexible duration. The risk and return characteristics of an approximate dynamic programming (ADP) solution and a forward dynamic optimization solution are compared. The trade is contrasted with optimal spot sale, among other alternative liquidation strategies. Independent of the underlying stochastic forward price model, it is proved and verified numerically that a partial sale strategy is not optimal. The optimally selected forward maturities are limited to the subset comprising the immediate, next, and last timesteps. Under a more realistic storage contract, which assumes a stochastic cost and a fixed duration, a new ADP approach is developed. The optimal policy shows that the tanker rent decision is accompanied by a buy order, since the loss from an empty tanker exceeds the gain from renting it cheaply but early. Given the nonadjustable duration of the rent contract, a longer contract generates higher value by benefiting from a tanker refill option.
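The per-unit comparison behind a cash-and-carry liquidation decision can be sketched in a few lines: selling now yields the spot price, while shorting a forward maturing at T and carrying the inventory until delivery yields the forward price minus the storage cost. This is a minimal sketch assuming a constant storage cost per timestep (as in the first setup above) and ignoring discounting, margining, and the dynamic re-hedging that the ADP solution handles; the function and parameter names are hypothetical.

```python
from typing import Dict, Tuple


def best_liquidation(spot: float, forwards: Dict[int, float],
                     cost_per_step: float, t: int = 0) -> Tuple[str, float]:
    """Compare an immediate spot sale with a hedged carry to each forward maturity.

    forwards maps a maturity index T (> t) to the forward price F(t, T).
    Carrying one unit to T costs cost_per_step * (T - t); discounting is ignored.
    Returns the best alternative and its per-unit value.
    """
    best_label, best_value = "spot", spot
    for maturity, fwd in sorted(forwards.items()):
        carry_value = fwd - cost_per_step * (maturity - t)
        if carry_value > best_value:
            best_label, best_value = f"forward T={maturity}", carry_value
    return best_label, best_value


choice, per_unit = best_liquidation(100.0, {1: 103.0, 2: 104.0, 3: 108.0}, cost_per_step=2.0)
```

Here the carry values are 101, 100, and 102, so the last maturity wins; with a flat forward curve the storage cost makes the spot sale preferable.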
ISBN (print): 9781457721977
The performance of two algorithms for finding traffic signal timings in a small symmetric network under oversaturated conditions was analyzed: an approximate dynamic programming approach using a "post-decision" state variable (ADP) and a simple genetic algorithm (GA). Results were obtained using microscopic simulation and compared based on typical measures of performance (delay, throughput, number of stops) as well as measures of the efficiency of green time utilization and the queue occupancy of the links. The symmetry of the small network allowed a straightforward analysis of signal operation, providing some insight into the quality of the solutions. Results showed that even though the ADP solutions were very different from the GA solutions, network performance was similar for both methods: both used green time efficiently, preventing queue backups, and served all approaches according to current demand. The potential of ADP using the "post-decision" state variable is under further analysis with more challenging conditions, additional constraints, and domain knowledge as part of the algorithm formulation.
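The "post-decision" state idea can be sketched as one step of a generic lookup-table ADP loop: choose the action that maximizes immediate reward plus the estimated value of the resulting post-decision state, then use that maximum to smooth the value estimate of the previous post-decision state. This is a generic textbook-style sketch, not the paper's signal-timing formulation; the toy model below (two approaches, green serves up to two vehicles, no arrivals) and all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple


def post_decision_step(state: Hashable,
                       actions: Iterable[Hashable],
                       reward: Callable[[Hashable, Hashable], float],
                       post_state: Callable[[Hashable, Hashable], Hashable],
                       values: Dict[Hashable, float],
                       prev_post: Optional[Hashable],
                       alpha: float,
                       gamma: float = 1.0) -> Tuple[Hashable, Hashable]:
    """One forward step of lookup-table ADP around post-decision states.

    Assumes a nonempty action set. Updates `values` in place and returns
    the greedy action and its post-decision state.
    """
    best_a, best_q = None, float("-inf")
    for a in actions:
        # Deterministic look-ahead: no expectation is needed at decision time
        q = reward(state, a) + gamma * values.get(post_state(state, a), 0.0)
        if q > best_q:
            best_a, best_q = a, q
    if prev_post is not None:
        # Smooth the sampled value into the estimate for the previous post-decision state
        old = values.get(prev_post, 0.0)
        values[prev_post] = (1.0 - alpha) * old + alpha * best_q
    return best_a, post_state(state, best_a)


# Hypothetical toy usage: state = queue lengths on two approaches
def served(s, a):
    return float(min(s[a], 2))


def after_green(s, a):
    q = list(s)
    q[a] -= min(q[a], 2)
    return tuple(q)


table: Dict[Hashable, float] = {}
action, post = post_decision_step((3, 1), [0, 1], served, after_green,
                                  table, prev_post=(5, 5), alpha=0.5)
```

In a full simulation, the random arrivals would be sampled between the post-decision state and the next pre-decision state, which is what lets the decision step avoid computing an expectation.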
We develop an optimal tracking control method for chaotic systems with unknown dynamics and disturbances. The method allows the optimal cost function and the corresponding tracking control to be updated synchronously. An augmented system is constructed from the tracking error and the reference dynamics, and the optimal tracking control problem is then defined. Policy iteration (PI) is introduced to solve the min-max optimization problem. An off-policy adaptive dynamic programming (ADP) algorithm is then proposed to find the solution of the tracking Hamilton-Jacobi-Isaacs (HJI) equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, the control, and the disturbance, respectively. The weights of these networks form an augmented weight matrix, which is proven to be uniformly ultimately bounded (UUB). The convergence of the tracking error system is also proven. Two examples show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem.
In this paper, a novel adaptive model-free attitude tracking control method is investigated for rigid spacecraft subject to external disturbances, an unknown inertia matrix, and input saturation. First, the attitude tracking system with input saturation is transformed into a Lagrangian model, and a dead-zone-based model is used to describe the saturation nonlinearity. Second, using prescribed performance control theory, a static prescribed performance attitude control scheme is presented, under which the transient and steady-state performance (including convergence rate, overshoot, and boundedness) of the attitude tracking system is proved to be guaranteed. Third, to improve on the static prescribed performance control scheme, a novel learning-based supplementary control scheme based on approximate dynamic programming is presented. Finally, two groups of numerical simulations illustrate the effectiveness of the proposed learning-based prescribed performance attitude control method.
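Prescribed performance control typically bounds each tracking-error component inside a shrinking envelope and maps the constrained error to an unconstrained variable. The sketch below uses a common choice from the prescribed performance literature, an exponentially decaying envelope rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf with a symmetric bound and an artanh transformation; this is an assumption, since the abstract does not state the paper's specific performance functions. rho0, rho_inf, and decay encode the allowed initial error, the steady-state bound, and the minimum convergence rate.

```python
import math


def performance_bound(t: float, rho0: float, rho_inf: float, decay: float) -> float:
    """Assumed exponential envelope: starts at rho0 and decays toward rho_inf."""
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf


def transformed_error(e: float, rho: float) -> float:
    """Map e in (-rho, rho) to an unconstrained variable via artanh(e / rho).

    If a controller keeps this value bounded, e stays strictly inside the envelope,
    which is how transient and steady-state specifications are enforced.
    """
    z = e / rho
    if not -1.0 < z < 1.0:
        raise ValueError("tracking error violates the prescribed bound")
    return math.atanh(z)


rho_1 = performance_bound(1.0, rho0=1.0, rho_inf=0.05, decay=2.0)
eps = transformed_error(0.05, rho_1)
```

The transformed error grows without bound as the error approaches the envelope, so any control law that keeps it finite, whether static or augmented by a learning-based term, respects the prescribed performance.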
In this paper, a novel optimal energy storage control scheme is investigated in smart grid environments with solar renewable energy. Based on the idea of adaptive dynamic programming (ADP), a self-learning algorithm is constructed to obtain the iterative control law sequence of the battery. Using data on the real-time electricity price (electricity rate for short), the load demand (load for short), and the solar renewable energy (solar energy for short), an optimal performance index function is established that minimizes the total electricity cost while extending the battery's lifetime. A new analysis method for the iterative ADP algorithm is developed to guarantee that the iterative value function converges to the optimum under the iterative control law sequence for any time index in a period. Numerical results and comparisons are presented to illustrate the effectiveness of the developed algorithm.
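The iterative ADP idea of generating a value function sequence that converges to the optimum can be illustrated with plain value iteration started from V_0 = 0: V_{i+1}(x) = min_u [U(x, u) + V_i(F(x, u))]. This is a minimal sketch on a hypothetical toy problem (integer state 0..n-1, actions -1/0/+1, utility x^2 + u^2), not the paper's battery model, electricity cost, or periodic time index. Because the utility is nonnegative and the update is monotone, the iterates are nondecreasing.

```python
from typing import List


def iterative_adp(n_states: int, tol: float = 1e-9, max_iter: int = 1000) -> List[float]:
    """Value iteration from V_0 = 0 on a hypothetical toy deterministic system."""
    actions = (-1, 0, 1)

    def utility(x: int, u: int) -> float:
        return float(x * x + u * u)  # nonnegative stage cost (assumed)

    def step(x: int, u: int) -> int:
        return min(max(x + u, 0), n_states - 1)  # clamp to the state range

    V = [0.0] * n_states  # V_0 = 0
    for _ in range(max_iter):
        V_next = [min(utility(x, u) + V[step(x, u)] for u in actions)
                  for x in range(n_states)]
        if max(abs(a - b) for a, b in zip(V_next, V)) < tol:
            return V_next  # successive iterates agree: converged
        V = V_next
    return V


V_star = iterative_adp(5)
```

The iterative control law at each step is the minimizing action in the update, so the sequence of value functions and the sequence of control laws are produced together.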
We consider the revenue management problem of capacity control under customer choice behavior. An exact solution of the underlying stochastic dynamic program is difficult because of the multi-dimensional state space, so approximate dynamic programming (ADP) techniques are widely used. The key idea of ADP is to encode the multi-dimensional state space by a small number of basis functions, often leading to a parametric approximation of the dynamic program's value function. In general, two classes of ADP techniques for learning value function approximations exist: mathematical programming and simulation. So far, the literature on capacity control has largely focused on the first class. In this paper, we develop a least squares approximate policy iteration (API) approach, which belongs to the second class. We propose value function approximations that are linear in the parameters and estimate the parameters via linear least squares regression. Exploiting both exact and heuristic knowledge of the value function, we enforce structural constraints on the parameters to facilitate learning a good policy. We perform an extensive simulation study to investigate the performance of our approach. The results show that it obtains competitive revenues and often outperforms state-of-the-art capacity control methods in reasonable computational time. Depending on the scarcity of capacity and the point in time, revenue improvements of around 1% or more can be observed. Furthermore, the proposed approach contributes to simulation-based ADP, advancing research on numerically estimating piecewise linear value function approximations and their application in revenue management environments. (C) 2016 Elsevier Ltd. All rights reserved.
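The policy-evaluation step of least squares API, fitting a value function that is linear in its parameters to simulated revenues-to-go, can be sketched by solving the normal equations (Phi^T Phi) theta = Phi^T y. This is a minimal pure-Python sketch: the basis functions and sample data are hypothetical, and it omits the structural parameter constraints described above and the simulation of the choice-based booking policy that would generate the samples.

```python
from typing import Callable, List, Sequence


def fit_linear_vfa(states: Sequence[float], targets: Sequence[float],
                   basis: Sequence[Callable[[float], float]]) -> List[float]:
    """Unconstrained least-squares estimate of theta in V(x) ~ sum_k theta_k * phi_k(x)."""
    rows = [[phi(x) for phi in basis] for x in states]
    k = len(basis)
    # Normal equations: A = Phi^T Phi, b = Phi^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    theta = [0.0] * k
    for i in range(k - 1, -1, -1):
        theta[i] = (b[i] - sum(A[i][j] * theta[j] for j in range(i + 1, k))) / A[i][i]
    return theta


# Hypothetical samples: remaining capacity levels and observed revenue-to-go
basis = [lambda x: 1.0, lambda x: x]
levels = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
revenues = [3.0 + 2.0 * x for x in levels]
theta = fit_linear_vfa(levels, revenues, basis)
```

In an API loop, the fitted theta defines a new greedy policy, which is simulated again to produce fresh samples; the slope on remaining capacity then plays the role of an estimated marginal value of a seat or unit.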