Search Results - Inner Mongolia University Library

Simulation-based approximate policy iteration for dynamic patient scheduling for radiation therapy

HEALTH CARE MANAGEMENT SCIENCE, 2018, Vol. 21, No. 3, pp. 317-325

Author: Gocgun, Yasin (Istanbul Kemerburgaz Univ, Dept Ind Engn, Istanbul, Turkey)

We study the radiation therapy scheduling problem in which patients of different types arrive dynamically and stochastically and are scheduled to future days. Unlike similar models in the literature, we consider cancellation of treatments. We formulate this dynamic multi-appointment patient scheduling problem as a Markov Decision Process (MDP). Since the MDP is intractable due to its large state and action spaces, we employ a simulation-based approximate dynamic programming (ADP) approach to solve our model approximately. In particular, we develop a least-squares-based approximate policy iteration method. The performance of the ADP approach is compared with that of a myopic heuristic decision rule.

Keywords: Patient scheduling; Markov decision processes; approximate dynamic programming
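
A minimal sketch of simulation-based approximate policy iteration with least-squares policy evaluation, to illustrate the general technique named in the abstract. It is not the paper's model: it assumes a hypothetical single-resource backlog problem (state = patients waiting, action = extra slots opened today, no patient types or cancellations), a quadratic feature basis, and made-up costs. All names and numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy model (not the paper's MDP): one treatment resource,
# state = number of patients waiting, action = extra slots opened today.
MAX_BACKLOG = 10
ACTIONS = (0, 1, 2)
ARRIVALS = (0, 1, 2)           # assumed equally likely daily arrivals
SLOT_COST, WAIT_COST, GAMMA = 2.0, 1.0, 0.9


def step(backlog, action, arrivals):
    """Serve 1 + action patients, charge costs, then add new arrivals (capped)."""
    served = min(backlog, 1 + action)
    remaining = backlog - served
    cost = SLOT_COST * action + WAIT_COST * remaining
    return min(remaining + arrivals, MAX_BACKLOG), cost


def features(backlog):
    """Assumed quadratic basis for the linear value function approximation."""
    x = backlog / MAX_BACKLOG
    return np.array([1.0, x, x * x])


def evaluate(policy, rng, n_starts=200, horizon=60):
    """Policy evaluation: regress simulated discounted costs on the features."""
    X, y = [], []
    for _ in range(n_starts):
        b0 = int(rng.integers(0, MAX_BACKLOG + 1))
        b, total, disc = b0, 0.0, 1.0
        for _ in range(horizon):          # truncated rollout under `policy`
            b, cost = step(b, policy[b], int(rng.choice(ARRIVALS)))
            total += disc * cost
            disc *= GAMMA
        X.append(features(b0))
        y.append(total)
    theta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return theta


def improve(theta):
    """Policy improvement: greedy one-step lookahead on the fitted V."""
    policy = []
    for b in range(MAX_BACKLOG + 1):
        q_values = []
        for a in ACTIONS:
            q = 0.0
            for arr in ARRIVALS:
                nb, cost = step(b, a, arr)
                q += (cost + GAMMA * features(nb) @ theta) / len(ARRIVALS)
            q_values.append(q)
        policy.append(ACTIONS[int(np.argmin(q_values))])
    return policy


def approximate_policy_iteration(n_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    policy = [0] * (MAX_BACKLOG + 1)      # start: never open extra slots
    theta = None
    for _ in range(n_iters):
        theta = evaluate(policy, rng)
        policy = improve(theta)
    return policy, theta
```

Calling `policy, theta = approximate_policy_iteration()` returns one action per backlog level and the fitted basis weights. Because evaluation uses sampled rollouts, the learned policy can vary with the seed; this noise is inherent to simulation-based API.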

Deep reinforcement learning based finite-horizon optimal tracking control for nonlinear system

Joint Meeting of the 2nd IFAC Workshop on Linear Parameter Varying Systems (LPVS) / 9th IFAC Symposium on Robust Control Design (ROCOND)

Authors: Kim, Jong Woo; Park, Byung Jun; Yoo, Haeun; Lee, Jay H.; Lee, Jong Min (Seoul Natl Univ, Inst Chem Proc, Sch Chem & Biol Engn, 1 Gwanak Ro, Seoul 08826, South Korea; Korea Adv Inst Sci & Technol, Chem & Biomol Engn Dept, Daejeon 3041, South Korea)

Reinforcement learning (RL) can be used to obtain an approximate numerical solution to the Hamilton-Jacobi-Bellman (HJB) equation. Recent advances in the machine learning community enable deep neural networks (DNNs) to accurately approximate high-dimensional nonlinear functions, such as those that occur in RL, without any domain knowledge. In the standard RL setting, both the system and cost structures are unknown, and the amount of data needed to obtain an accurate approximation can be impractically large. When these structures are known, however, they can be used to solve the HJB equation efficiently. Herein, a model-based globalized dual heuristic programming (GDHP) approach is proposed, in which the HJB equation is separated into value, costate, and policy functions. The class of problems of particular interest in this research is the finite-horizon optimal tracking control (FHOC) problem. Additional issues that arise in the context of FHOC, such as time-varying functions, terminal constraints, and delta-input formulation, are addressed. A DNN structure and training algorithm suitable for FHOC are presented. A benchmark continuous reactor example is provided to illustrate the proposed approach. (C) 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Reinforcement learning; approximate dynamic programming; Deep learning; Globalized dual heuristic programming; Optimal control; Optimal tracking

Optimizing Storage Operation for a Probabilistic Locational Marginal Pricing Forecast

IEEE International Conference on Probabilistic Methods Applied to Power Systems (PMAPS)

Authors: Latif, Aadil; Krishnamurthy, Dheepak; Palmintier, Brian (Natl Renewable Energy Lab, Golden, CO 80401, USA)

ISBN (print): 9781538635964

This paper explores the applicability of dynamic programming (DP) and approximate dynamic programming (ADP) based methods for the optimal dispatch of utility-scale energy storage systems (ESS). The effectiveness of these approaches is tested on the IEEE 13-node test feeder with distributed photovoltaics (PVs) and a utility-scale storage system. A co-simulation-based approach is used to set up the experiment so that detailed ESS and network models can be implemented. The results obtained from the DP/ADP runs are compared with those of three other control strategies, both myopic and intelligent. Simulation results show that DP/ADP algorithms are good candidates for optimal ESS dispatch in terms of both solution quality and execution time.

Keywords: energy storage; dynamic programming; approximate dynamic programming; co-simulation
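
A minimal backward-induction sketch of DP-based storage dispatch, assuming a deterministic price forecast (for example, the expected value of a probabilistic LMP forecast), a lossless battery with a discretized state of charge, and no network constraints. The paper's co-simulation with detailed ESS and feeder models is not reproduced; the function name `dispatch_dp` and its parameters are hypothetical.

```python
def dispatch_dp(prices, capacity=4, power=1):
    """Backward induction over a discretized state of charge (SOC).

    prices[t]: forecast price in period t (assumed known and deterministic).
    Action u in [-power, power]: u > 0 charges, u < 0 discharges (energy units).
    Efficiency losses and degradation are ignored in this sketch.
    Returns (optimal profit starting from an empty battery, schedule of u).
    """
    T = len(prices)
    # value[t][s]: best profit from period t onward when SOC is s
    value = [[0.0] * (capacity + 1) for _ in range(T + 1)]
    best_u = [[0] * (capacity + 1) for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for s in range(capacity + 1):
            best = None
            for u in range(-power, power + 1):
                ns = s + u
                if 0 <= ns <= capacity:
                    # pay for charging, earn for discharging, plus future value
                    v = -prices[t] * u + value[t + 1][ns]
                    if best is None or v > best:
                        best, best_u[t][s] = v, u
            value[t][s] = best
    # roll the optimal decisions forward from an empty battery
    s, schedule = 0, []
    for t in range(T):
        u = best_u[t][s]
        schedule.append(u)
        s += u
    return value[0][0], schedule
```

For example, `dispatch_dp([10, 10, 50, 50])` returns `(80.0, [1, 1, -1, -1])`: charge in the two cheap periods, discharge in the two expensive ones. ADP methods replace the exact `value` table with an approximation when the state space is too large to enumerate.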

Optimal forward trading and battery control under renewable electricity generation

JOURNAL OF BANKING & FINANCE, 2018, Vol. 95, pp. 244-254

Authors: Hinz, Juri; Yee, Jeremy (Univ Technol Sydney, Sch Math & Phys Sci, Sydney, NSW, Australia; CSIRO, Canberra, ACT, Australia)

The increased market penetration of renewable energy sources and the rapid development of electric battery storage technologies create potential for reducing electricity price volatility while maintaining the stability of the power grid. This work presents an algorithmic approach to controlling battery levels and forward positions so as to optimally manage power output fluctuations caused by intermittent renewable energy generation. The paper also explores the effect of battery technology on the firm's optimal trading behaviour in the electricity spot market. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: approximate dynamic programming; Battery control; Distributed energy systems; Energy storage; Forward contracts; Real options

Optimal Trading of a Storable Commodity via Forward Markets

Author: Behzad Ghafouri (University of Western Ontario)

Degree level: Ph.D.

A commodity market participant trading via her inventory has access to both spot and forward markets. To liquidate her inventory, she can sell at the spot price, take a short forward position, or combine both. A trade is proposed in which a hedging forward contract is always held; it can be viewed as a dynamic cash-and-carry arbitrage. The trader can adjust the maturity of the forward contract dynamically until the inventory is depleted or a time constraint is reached. In the first setup, the storage contract (to carry inventory) is assumed to have a constant cost and a flexible duration. The risk and return characteristics of an approximate dynamic programming (ADP) solution and a forward dynamic optimization solution are compared, and the trade is contrasted with optimal spot sale and other alternative liquidation strategies. Independent of the underlying stochastic forward price model, it is proved, and verified numerically, that a partial sale strategy is not optimal. The optimally selected forward maturities are limited to the subset comprising the immediate, next, and last time steps. Under a more realistic storage contract, which assumes a stochastic cost and a fixed duration, a new ADP approach is developed. The optimal policy shows that the decision to rent a tanker is accompanied by a buy order, since the loss from an empty tanker exceeds the gain from renting it cheaply but early. Given the non-adjustable duration of the rental contract, a longer contract generates a higher value by benefiting from a tanker refill option.

Keywords: Real Options; Cash and Carry Arbitrage; Oil Storage; Forward Trading; Markov Decision Process; approximate dynamic programming; Least Squares Monte Carlo

A Comparison of Approximate Dynamic Programming and Simple Genetic Algorithm for Traffic Control in Oversaturated Conditions - Case Study of a Simple Symmetric Network

14th International IEEE Conference on Intelligent Transportation Systems (ITSC)

Authors: Medina, Juan C.; Hajbabaie, Ali; Benekohal, Rahim F. (Univ Illinois, Urbana, IL 61801, USA)

ISBN (print): 9781457721977

The performance of two algorithms for finding traffic signal timings in a small symmetric network under oversaturated conditions was analyzed: an approximate dynamic programming approach using a "post-decision" state variable (ADP) and a simple genetic algorithm (GA). Results were obtained using microscopic simulation and compared on typical measures of performance (delay, throughput, number of stops) as well as on measures of the efficiency of green time utilization and the queue occupancy of the links. The symmetry of the small network allowed a straightforward analysis of signal operation, providing some insight into the quality of the solutions. Results showed that even though the ADP solutions were very different from the GA solutions, network performance was similar for both methods: both used green time efficiently, prevented queue backups, and served all approaches according to current demand. The potential of ADP using the "post-decision" state variable is under further analysis with more challenging conditions, additional constraints, and domain knowledge incorporated into the algorithm formulation.

Keywords: Delay; Equations; Genetic algorithms; Mathematical model; Traffic control; Vehicles; approximate dynamic programming; dynamic programming; genetic algorithms; green time utilization; microscopic simulation; oversaturated conditions; post-decision state variable; queue

Chaotic system optimal tracking using data-based synchronous method with unknown dynamics and disturbances

Chinese Physics B, 2017, Vol. 26, No. 3, pp. 268-275

Authors: Song Ruizhuo (宋睿卓); Wei Qinglai (魏庆来) (School of Automation and Electrical Engineering, University of Science and Technology Beijing; The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences)

We develop an optimal tracking control method for chaotic systems with unknown dynamics and disturbances. The method allows the optimal cost function and the corresponding tracking control to be updated synchronously. An augmented system is constructed from the tracking error and the reference dynamics, and the optimal tracking control problem is then defined. Policy iteration (PI) is introduced to solve the min-max optimization problem. An off-policy adaptive dynamic programming (ADP) algorithm is then proposed to find the solution of the tracking Hamilton-Jacobi-Isaacs (HJI) equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, the control, and the disturbance, respectively. The weights of these networks form an augmented weight matrix, which is proven to be uniformly ultimately bounded (UUB). The convergence of the tracking error system is also proven. Two examples are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem.

Keywords: adaptive dynamic programming; approximate dynamic programming; chaotic system; zero-sum

Novel Adaptive Saturated Attitude Tracking Control of Rigid Spacecraft with Guaranteed Transient and Steady-State Performance

JOURNAL OF AEROSPACE ENGINEERING, 2018, Vol. 31, No. 5, p. 04018062

Authors: Yin, Zeyang; Luo, Jianjun; Wei, Caisheng (Northwestern Polytech Univ, Sch Astronaut, Natl Key Lab Aerosp Flight Dynam, Youyi West St, Xian 710072, Shaanxi, Peoples R China)

In this paper, a novel adaptive model-free attitude tracking control method is investigated for rigid spacecraft subject to external disturbances, an unknown inertia matrix, and input saturation. First, the attitude tracking system with input saturation is transformed into a Lagrangian model, and a dead-zone-based model is used to describe the saturation nonlinearity. Second, using prescribed performance control theory, a static prescribed performance attitude control scheme is presented and proven to guarantee the transient and steady-state performance (including convergence rate, overshoot, and boundedness) of the attitude tracking system. Third, to improve the performance of the static prescribed performance control scheme, a novel learning-based supplementary control scheme is presented based on approximate dynamic programming. Finally, two groups of numerical simulations illustrate the effectiveness of the proposed learning-based prescribed performance attitude control method.

Keywords: Prescribed performance; Spacecraft; approximate dynamic programming; Neural networks; Input saturation

Adaptive Dynamic Programming-Based Optimal Control Scheme for Energy Storage Systems With Solar Renewable Energy

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2017, Vol. 64, No. 7, pp. 5468-5478

Authors: Wei, Qinglai; Shi, Guang; Song, Ruizhuo; Liu, Yu (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Beijing 100049, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China; Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China)

In this paper, a novel optimal energy storage control scheme is investigated for smart grid environments with solar renewable energy. Based on the idea of adaptive dynamic programming (ADP), a self-learning algorithm is constructed to obtain the iterative control law sequence of the battery. Using data on the real-time electricity price (the electricity rate), the load demand (the load), and the solar renewable energy (the solar energy), an optimal performance index function is established that minimizes the total electricity cost while extending the battery's lifetime. A new analysis method for the iterative ADP algorithm is developed to guarantee that the iterative value function converges to the optimum under the iterative control law sequence for any time index in a period. Numerical results and comparisons are presented to illustrate the effectiveness of the developed algorithm.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; energy storage; energy storage system; optimal control; solar renewable energy

Least squares approximate policy iteration for learning bid prices in choice-based revenue management

COMPUTERS & OPERATIONS RESEARCH, 2017, Vol. 77, pp. 240-253

Author: Koch, Sebastian (Univ Augsburg, Chair Analyt & Optimizat, Univ Str 16, D-86159 Augsburg, Germany)

We consider the revenue management problem of capacity control under customer choice behavior. Solving the underlying stochastic dynamic program exactly is difficult because of its multi-dimensional state space; approximate dynamic programming (ADP) techniques are therefore widely used. The key idea of ADP is to encode the multi-dimensional state space with a small number of basis functions, which often leads to a parametric approximation of the dynamic program's value function. In general, there are two classes of ADP techniques for learning value function approximations: mathematical programming and simulation. So far, the literature on capacity control has largely focused on the first class. In this paper, we develop a least squares approximate policy iteration (API) approach, which belongs to the second class. We propose value function approximations that are linear in the parameters and estimate the parameters via linear least squares regression. Exploiting both exact and heuristic knowledge of the value function, we enforce structural constraints on the parameters to facilitate learning a good policy. We perform an extensive simulation study to investigate the performance of our approach. The results show that it obtains revenues competitive with, and often higher than, those of state-of-the-art capacity control methods in reasonable computational time. Depending on the scarcity of capacity and the point in time, revenue improvements of around 1% or more can be observed. Furthermore, the proposed approach contributes to simulation-based ADP, advancing research on numerically estimating piecewise linear value function approximations and their application in revenue management environments. (C) 2016 Elsevier Ltd. All rights reserved.

Keywords: Revenue management; Capacity control; approximate dynamic programming; approximate policy iteration
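
As a simplified illustration of how learned value-function parameters become bid prices: if the approximation is linear in remaining capacity, V(x) ≈ θ0 + Σ_i θ_i x_i, then the opportunity cost of selling product j with resource usage a_j is V(x) - V(x - a_j) = Σ_i θ_i a_ij, so θ_i acts as the bid price of resource i. The paper uses piecewise linear, time-dependent approximations and models customer choice; this sketch instead assumes fixed linear bid prices and a hypothetical offer-set rule (offer every feasible product whose fare covers its opportunity cost). The function `offer_set` and all numbers are illustrative assumptions.

```python
import numpy as np


def offer_set(revenues, usage, capacity, bid_prices):
    """Products to offer under a simple bid-price control (a sketch).

    revenues[j]: fare of product j; usage[i, j]: units of resource i used by j;
    capacity[i]: remaining units of resource i; bid_prices[i]: theta_i from an
    assumed linear VFA, so that V(x) - V(x - usage[:, j]) = bid_prices @ usage[:, j].
    """
    revenues = np.asarray(revenues, dtype=float)
    usage = np.asarray(usage, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    # opportunity cost of each product = sum of bid prices of resources it uses
    opportunity_cost = np.asarray(bid_prices, dtype=float) @ usage
    # a product is feasible only if every resource it needs is still available
    feasible = np.all(usage <= capacity[:, None], axis=0)
    return [j for j in range(len(revenues))
            if feasible[j] and revenues[j] >= opportunity_cost[j]]
```

With two flight legs and three products (leg 0 only, leg 1 only, both legs), `offer_set([100, 120, 180], [[1, 0, 1], [0, 1, 1]], [3, 0], [80, 90])` returns `[0]`: leg 1 is sold out, so only the leg-0 product remains, and its fare of 100 covers its bid price of 80.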
