In this study, we use a generalized policy iteration approximate dynamic programming (ADP) algorithm to design an optimal controller for a class of discrete-time systems with actuator saturation. An integral function is proposed to manage the saturation nonlinearity in the actuators, and the generalized policy iteration ADP algorithm is then developed to solve the optimal control problem. Compared with other algorithms, the developed ADP algorithm includes two iteration procedures. In the present control scheme, two neural networks are introduced to approximate the control law and the performance index function. Finally, numerical simulations illustrate the convergence and feasibility of the developed method.
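A common way to realize such an integral cost in the ADP literature is a nonquadratic functional built from the inverse hyperbolic tangent; the abstract does not give the paper's exact form, so the sketch below is only an illustration of how such a cost penalizes controls approaching the saturation bound. The bound U_BAR, the weight R, and the artanh integrand are all assumptions.

```python
import numpy as np

U_BAR = 1.0   # actuator bound |u| <= U_BAR (assumed)
R = 1.0       # control weight (assumed scalar for illustration)

def saturation_cost(u, n=400):
    """W(u) = 2 * int_0^u U_BAR * artanh(v / U_BAR) * R dv, trapezoid rule.
    The cost grows steeply as u approaches the saturation bound."""
    v = np.linspace(0.0, u, n)
    g = 2.0 * U_BAR * np.arctanh(np.clip(v / U_BAR, -0.999, 0.999)) * R
    return float(np.sum((g[1:] + g[:-1]) * np.diff(v)) / 2.0)

for u in (0.2, 0.6, 0.9):
    print(f"u={u:.1f}  cost={saturation_cost(u):.4f}")
```

Substituting a cost of this shape for the usual quadratic control penalty is what keeps the resulting optimal control inside the actuator limits.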
Objective: Dynamic ambulance redeployment policies introduce considerably more flexibility in improving ambulance resource allocation by capitalizing on the pronounced geospatial-temporal variations in ambulance demand patterns arising from time-of-day and day-of-week effects. This paper proposes a novel modelling framework for dynamic ambulance redeployment in Singapore, based on the approximate dynamic programming (ADP) approach and leveraging a Discrete Event Simulation (DES) model. Methods: The study was based on Singapore's national Emergency Medical Services (EMS) system. Using a dataset comprising 216,973 valid incidents over a continuous two-year study period from 1 January 2011 to 31 December 2012, a DES model of the EMS system was developed. An ADP model based on linear value function approximations was then evaluated with the DES model via the temporal difference (TD) learning family of algorithms. The objective of the ADP model is to derive approximately optimal dynamic redeployment policies based on the primary outcome of ambulance coverage. Results: Considering an 8 min response time threshold, the computational experiments showed an estimated 5% reduction in the proportion of calls that cannot be reached within the threshold (equivalent to approximately 8,000 dispatches). The study also revealed that redeployment policies restricted to the same operational division could yield more promising response time performance. Furthermore, the best policy combined redeploying ambulances whenever they are released from service with relocating ambulances that are idle at bases. Conclusion: This study demonstrated the successful application of an approximate modelling framework based on ADP that leverages a detailed DES model of Singapore's EMS system to generate approximately optimal dynamic redeployment plans. Various policies and scenarios relevant to the Singapore EMS
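A minimal sketch of the TD(0) update underlying linear value function approximation, the family of methods the abstract evaluates inside the DES model. The feature map, step size, discount factor, and reward signal below are illustrative placeholders, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 8
w = np.zeros(n_features)          # linear value-function weights
alpha, gamma = 0.05, 0.95         # step size and discount factor (assumed)

def features(state):
    """Hypothetical feature map, e.g. per-region counts of idle ambulances."""
    return state  # here the state is already an n_features vector

state = rng.random(n_features)
for step in range(1000):
    next_state = rng.random(n_features)   # stand-in for one DES transition
    reward = -np.sum(next_state[:2])      # stand-in coverage penalty
    td_error = reward + gamma * features(next_state) @ w - features(state) @ w
    w += alpha * td_error * features(state)   # TD(0) semi-gradient update
    state = next_state
print("learned weights:", np.round(w, 3))
```

In the paper's setting, the transitions would come from the DES model rather than random draws, and the learned weights would define the approximate value used to rank redeployment decisions.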
This paper establishes an optimal control scheme for unknown complex-valued systems. Policy iteration is used to obtain the solution of the Hamilton-Jacobi-Bellman equation. Off-policy learning allows the iterative performance index and iterative control to be obtained with completely unknown dynamics. Critic and action networks are used to obtain the iterative performance index and iterative control, executing policy evaluation and policy improvement, respectively. Asymptotic stability of the closed-loop system and convergence of the iterative performance index function are proven. Using Lyapunov techniques, the uniform ultimate boundedness of the weight estimation error is proven. A simulation study demonstrates the effectiveness of the proposed optimal control method.
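To make the alternating evaluation/improvement structure concrete, the sketch below runs policy iteration on a scalar real-valued linear system: policy evaluation solves a Lyapunov equation for the current gain, and policy improvement updates the gain greedily. The paper's model-free, complex-valued actor-critic version is more involved; the system (a, b), costs (q, r), and initial gain here are assumptions.

```python
# Policy iteration on x_{k+1} = a x_k + b u_k with cost q x^2 + r u^2.
a, b = 0.9, 0.5
q, r = 1.0, 1.0
k_gain = 0.0                     # initial stabilizing policy u = -k_gain * x

for i in range(20):
    # Policy evaluation: V(x) = p x^2 for the current policy solves the
    # scalar Lyapunov equation p = q + r k^2 + (a - b k)^2 p.
    ac = a - b * k_gain
    p = (q + r * k_gain**2) / (1.0 - ac**2)
    # Policy improvement: greedy gain for the evaluated cost.
    k_gain = (b * p * a) / (r + b**2 * p)

print(f"converged gain k = {k_gain:.4f}, cost coefficient p = {p:.4f}")
```

In the off-policy, model-free version, the critic network replaces the closed-form Lyapunov solve and the action network replaces the explicit gain formula, with both trained from data generated under a behavior policy.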
In this paper, a novel local value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. This paper focuses on the admissibility properties and the termination criteria of the discrete-time local value iteration ADP algorithm. In this algorithm, the iterative value functions and the iterative control laws are both updated in a given subset of the state space in each iteration, instead of over the whole state space. For the first time, admissibility properties of the iterative control laws are analyzed for the local value iteration ADP algorithm. New termination criteria are established that terminate the iterative local ADP algorithm with an admissible approximate optimal control law. Finally, simulation results are given to illustrate the performance of the developed algorithm.
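A toy rendering of the "local" idea, assuming a simple scalar system and discretized grids: each sweep performs Bellman backups only on a randomly chosen subset of the state grid rather than the whole space. The dynamics f, stage cost U, and subset size are illustrative assumptions, not the paper's examples.

```python
import numpy as np

x_grid = np.linspace(-1.0, 1.0, 101)
u_grid = np.linspace(-0.5, 0.5, 21)
V = np.zeros_like(x_grid)

def f(x, u):          # assumed nonlinear-free test dynamics
    return 0.9 * x + 0.3 * u

def U(x, u):          # assumed stage cost
    return x**2 + u**2

rng = np.random.default_rng(1)
for it in range(50):
    subset = rng.choice(len(x_grid), size=25, replace=False)  # local set
    for idx in subset:
        x = x_grid[idx]
        # Bellman backup over the control grid, interpolating V at f(x, u).
        q = [U(x, u) + np.interp(f(x, u), x_grid, V) for u in u_grid]
        V[idx] = min(q)
print("V(0) estimate:", V[len(x_grid) // 2])
```

The paper's contribution is precisely about when such partial sweeps may be stopped: the termination criteria guarantee the control law extracted from V is admissible even though most states are untouched in any given iteration.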
In this paper, we propose a heuristic for solving finite-horizon Markov decision processes. The heuristic uses the nested partitions (NP) framework to guide an iterative search for the optimal policy. NP focuses the search on certain promising subregions, flexibly determined by the sampling weight of each action branch. Within each subregion, an effective local policy optimization is developed using a sensitivity-based approach, which optimizes the sampling weights based on estimated gradient information. Numerical results show the effectiveness of the proposed heuristic. (C) 2017 Elsevier B.V. All rights reserved.
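A toy nested-partitions loop on a one-dimensional discrete design space, to show the move/backtrack mechanics: the current most promising region is split, each subregion plus the surrounding region is sampled, and the search moves to the best candidate or backtracks. The objective, noise, and sample sizes are made up, and the paper's sensitivity-based weight optimization is not reproduced here.

```python
import random
random.seed(1)

SPACE = range(64)
def perf(x):                     # unknown objective, observed with noise
    return -(x - 41)**2 + random.gauss(0, 10)

region = (0, 63)                 # current most promising region (inclusive)
for it in range(30):
    lo, hi = region
    if lo == hi:
        break
    mid = (lo + hi) // 2
    subs = [(lo, mid), (mid + 1, hi)]
    surround = [x for x in SPACE if not (lo <= x <= hi)]
    cands = subs + ([surround] if surround else [])

    def score(c):
        # Sample a few points from the candidate set and average performance.
        pts = (random.sample(range(c[0], c[1] + 1), min(5, c[1] - c[0] + 1))
               if isinstance(c, tuple) else random.sample(c, min(5, len(c))))
        return sum(perf(x) for x in pts) / len(pts)

    best = max(cands, key=score)
    region = best if isinstance(best, tuple) else (0, 63)  # move or backtrack
print("final region:", region)
```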
Forest management in the face of fire risk is a challenging problem because fire spreads across a landscape and because its occurrence is unpredictable. Accounting for the existence of stochastic events that generate spatial interactions in the context of a dynamic decision process is crucial for determining optimal management. This paper demonstrates a method for incorporating spatial information and interactions into management decisions made over time. A machine learning technique called approximate dynamic programming is applied to determine the optimal timing and location of fuel treatments and timber harvests for a fire-threatened landscape. Larger net present values can be achieved using policies that explicitly consider evolving spatial interactions created by fire spread, compared to policies that ignore the spatial dimension of the inter-temporal optimization problem.
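As a rough illustration of the spatial flavor only (not the paper's ADP machinery), the sketch below scores candidate fuel-treatment cells by an approximate value that credits protection of neighboring cells, since fire spreads between adjacent stands. The grid size, timber values, risks, and spread weight are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
timber_value = rng.uniform(1.0, 10.0, size=(5, 5))   # per-cell stand value
fire_risk = rng.uniform(0.0, 0.3, size=(5, 5))       # per-cell ignition risk
spread_w = 0.5                                        # neighbor spread weight

def treatment_score(i, j):
    """Approximate marginal value of treating cell (i, j): protects the cell
    itself plus a share of its neighbors' value at risk from spread."""
    score = fire_risk[i, j] * timber_value[i, j]
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < 5 and 0 <= nj < 5:
            score += spread_w * fire_risk[i, j] * timber_value[ni, nj]
    return score

scores = [((i, j), treatment_score(i, j)) for i in range(5) for j in range(5)]
best = sorted(scores, key=lambda s: -s[1])[:3]        # treat the top-3 cells
print("cells to treat this period:", [c for c, _ in best])
```

A full ADP treatment would replace this myopic score with a learned value function evaluated over time, which is what allows the reported gains over spatially blind policies.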
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the whole state and control spaces, instead of for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, where the convergence criterion on the learning rates of traditional Q-learning algorithms is simplified. In the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties of the deterministic Q-learning algorithm are first developed for the undiscounted case. Then, accounting for the discount factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
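A minimal sketch of the full-backup idea, assuming a simple scalar deterministic system and discretized grids: every (state, control) pair of the Q-table is updated in each iteration via Q_{i+1}(x, u) = U(x, u) + gamma * min_{u'} Q_i(f(x, u), u'), rather than one sampled pair at a time. Dynamics, cost, and grids are assumptions.

```python
import numpy as np

x_grid = np.linspace(-1, 1, 41)
u_grid = np.linspace(-1, 1, 21)
gamma = 0.95
Q = np.zeros((len(x_grid), len(u_grid)))

f = lambda x, u: 0.8 * x + 0.2 * u      # assumed deterministic dynamics
U = lambda x, u: x**2 + 0.5 * u**2      # assumed utility (stage cost)

for it in range(100):
    V = Q.min(axis=1)                   # V_i(x) = min_u Q_i(x, u)
    Q_new = np.empty_like(Q)
    for i, x in enumerate(x_grid):
        for j, u in enumerate(u_grid):
            # Full deterministic backup over the entire (x, u) grid.
            Q_new[i, j] = U(x, u) + gamma * np.interp(f(x, u), x_grid, V)
    Q = Q_new
print("min_u Q(0, u):", Q[len(x_grid) // 2].min())
```

Because the backup is deterministic and exhaustive, no per-sample learning rate is needed, which is the simplification of the traditional Q-learning convergence conditions the abstract refers to.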
Dynamic pricing for network revenue management has received considerable attention in research and practice. Based on data obtained from a major hotel, we use a large-scale numerical study to compare the performance of several heuristic approaches proposed in the literature. The heuristic approaches we consider include deterministic linear programming with resolving and three variants of dynamic programming decomposition. Dynamic programming decomposition is considered one of the strongest heuristics, is the method chosen in some recent commercial implementations, and remains a topic of research in the recent academic literature. In addition to a plain-vanilla implementation of dynamic programming decomposition, we consider two variants proposed in the recent literature. For the base scenario generated from the real data, we show that the method based on Zhang (2011) [An improved dynamic programming decomposition approach for network revenue management. Manufacturing Service Oper. Management 13(1):35-52] leads to a small but significant lift in revenue compared with all other approaches. We generate many alternative problem scenarios by varying the capacity-demand ratio and network structure and show that the performance of the different heuristics can be strongly influenced by both. Overall, our paper shows the promise of some recent proposals in the academic literature but also offers a cautionary tale on the choice of heuristic methods for practical network pricing problems.
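The single-resource dynamic program at the heart of any DP decomposition can be sketched compactly: once the network problem is decomposed, each leg solves a one-dimensional DP over its own capacity, accepting a request only when its displacement-adjusted fare exceeds the marginal value of a unit of capacity. The fares, probabilities, horizon, and capacity below are made-up inputs, not the hotel data.

```python
import numpy as np

T, C = 50, 10                        # decision periods and leg capacity
fares = [120.0, 80.0]                # displacement-adjusted fares (assumed)
probs = [0.15, 0.30]                 # per-period request probabilities (assumed)

V = np.zeros((T + 1, C + 1))         # V[t, x]: value with x units, t periods left
for t in range(1, T + 1):
    for x in range(C + 1):
        v = V[t - 1, x]              # value if nothing is sold this period
        if x > 0:
            for fare, p in zip(fares, probs):
                # Accept a class only if its fare beats the marginal value.
                v += p * max(fare + V[t - 1, x - 1] - V[t - 1, x], 0.0)
        V[t, x] = v
print("bid price at full horizon and capacity:", V[T, C] - V[T, C - 1])
```

The variants compared in the paper differ mainly in how the network revenue is allocated to the legs before this single-leg DP is solved.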
We study non-preemptive scheduling problems where heterogeneous projects stochastically arrive over time. The projects include precedence-constrained tasks that require multiple resources. Incomplete projects are held in queues. When a queue is full, an arriving project must be rejected. The goal is to choose which tasks to start in each time-slot to maximize the infinite-horizon discounted expected profit. We provide a weakly coupled Markov decision process (MDP) formulation and apply a simulation-based approximate policy iteration method. Extensive numerical results are presented. (C) 2017 Elsevier B.V. All rights reserved.
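One simple member of the simulation-based approximate policy iteration family is a one-step rollout: estimate the Q-value of each action at the current state by Monte Carlo simulation of a base policy, then act greedily. The toy single-queue dynamics below stand in for the paper's weakly coupled project-scheduling MDP; every name and parameter here is an assumption.

```python
import random
random.seed(0)

GAMMA, HORIZON, REPS = 0.95, 40, 200

def step(queue, start_task):
    """Toy dynamics: a project arrives w.p. 0.5; starting a task earns 1."""
    arrivals = 1 if random.random() < 0.5 else 0
    served = 1 if (start_task and queue > 0) else 0
    return min(queue + arrivals - served, 5), float(served)  # queue cap 5

def base_policy(queue):              # naive base policy: always try to start
    return True

def simulate(queue, first_action):
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(HORIZON):
        queue, reward = step(queue, action)
        total += discount * reward
        discount *= GAMMA
        action = base_policy(queue)
    return total

def rollout_action(queue):
    # Greedy improvement: estimate Q(queue, a) by simulation for each action.
    q = {a: sum(simulate(queue, a) for _ in range(REPS)) / REPS
         for a in (True, False)}
    return max(q, key=q.get)

print("rollout action in state queue=3:", rollout_action(3))
```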
Dynamic programming (DP) is a mathematical programming approach for optimizing a system that changes over time and is a common approach for developing intelligent systems. Expert systems that are intelligent must be able to adapt dynamically over time. An optimal DP policy identifies the optimal decision as a function of the current state of the system; hence, the decisions controlling the system can intelligently adapt to changing system states. Although DP has existed since Bellman introduced it in 1957, exact DP policies are only possible for problems of low dimension or under very limiting restrictions. Fortunately, advances in computational power have given rise to approximate DP (ADP). However, most ADP algorithms are still computationally intractable for high-dimensional problems. This paper specifically considers continuous-state DP problems in which the state variables are multicollinear. The issue of multicollinearity is currently ignored in the ADP literature, but it is well known in the statistics community that high multicollinearity leads to unstable (high-variance) parameter estimates in statistical modeling. While not all real-world DP applications involve high multicollinearity, it is not uncommon for real cases to involve observed state variables that are correlated, such as the air quality ozone pollution application studied in this research. Correlation is a common occurrence in observed data, with sources in meteorology, energy, finance, manufacturing, health care, etc. ADP algorithms for continuous-state DP achieve an approximate solution through discretization of the state space and model approximations. Typical state space discretizations involve full-dimensional grids or random sampling. The former option requires exponential growth in the number of state points as the state space dimension grows, while the latter option is typically inefficient and requires an intractable number of state points. The exception is computationally-tractable
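The statistical point is easy to demonstrate: with nearly collinear state variables, ordinary least squares tends to give unstable value-function coefficients, while a ridge penalty (one standard remedy; the paper's specific fix may differ) shrinks and stabilizes them. The synthetic data below are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly collinear second state
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)   # "observed values"

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
lam = 1.0                                   # ridge penalty (assumed)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print("OLS coefficients:  ", np.round(beta_ols, 2))
print("ridge coefficients:", np.round(beta_ridge, 2))
```

In an ADP context, the regression target y would be the sampled future value at each state point, so coefficient instability translates directly into an unstable value-function approximation across iterations.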