Search Results - Inner Mongolia University Library

A New Self-Learning Optimal Control Scheme for Discrete-Time Nonlinear Systems Using Policy Iterative Adaptive Dynamic Programming

IFAC Proceedings Volumes, 2013, Vol. 46, No. 20, pp. 580-585

Authors: Qinglai Wei, Derong Liu. Affiliation: The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

In this paper, a new self-learning method using policy iterative adaptive dynamic programming (ADP) is developed to obtain the optimal control scheme for discrete-time nonlinear systems. The iterative ADP algorithm can be initialized with an arbitrary admissible control law. This is the first time that the properties of policy iterative ADP are established for the discrete-time case. It is proved that the iterative performance index function converges non-increasingly to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. It is also proved that each of the iterative control policies stabilizes the nonlinear system. Neural networks are used to approximate the performance index function and to compute the optimal control policy, which facilitates the implementation of the iterative ADP algorithm. Finally, a simulation example is given to illustrate the performance of the presented method.

Keywords: Adaptive dynamic programming; approximate dynamic programming; Nonlinear System; Optimal Control; Policy Iteration; Neural Networks; Reinforcement Learning
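
The two properties claimed in the abstract above (a non-increasing sequence of performance index functions, and a stabilizing policy at every iteration) can be seen in a much simpler setting than the paper's. The sketch below is only an illustration under strong assumptions: a scalar linear system `x[t+1] = a*x[t] + b*u[t]` with quadratic stage cost, where both policy evaluation and policy improvement have closed forms, so no neural network is needed. The function name and all numeric values are hypothetical and are not taken from the paper.

```python
def policy_iteration_scalar_lqr(a, b, q, r, k0, iterations=20):
    """Policy iteration for x[t+1] = a*x[t] + b*u[t] with stage cost q*x^2 + r*u^2.

    Policies are linear, u = -k*x, and the cost of a policy is p*x^2.
    Returns the list of p values (one per iteration) and the final gain.
    """
    if abs(a - b * k0) >= 1.0:
        raise ValueError("initial gain k0 is not admissible (closed loop is unstable)")
    k = k0
    values = []
    for _ in range(iterations):
        closed_loop = a - b * k
        # Policy evaluation: solve p = q + r*k^2 + p*closed_loop^2 for p.
        p = (q + r * k * k) / (1.0 - closed_loop * closed_loop)
        values.append(p)
        # Policy improvement: minimise q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
    return values, k


# Illustrative numbers only; k0 = 1.0 gives a stable closed loop (a - b*k0 = 0.2).
values, gain = policy_iteration_scalar_lqr(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
```

Here `values` plays the role of the iterative performance index function and `gain` the iterative control law. In this linear-quadratic special case the limit of `values` is the solution of the scalar discrete-time Riccati equation.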

Finite-Horizon Control-Constrained Nonlinear Optimal Control Using Single Network Adaptive Critics

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2013, Vol. 24, No. 1, pp. 145-157

Authors: Heydari, Ali; Balakrishnan, Sivasubramanya N. Affiliation: Missouri Univ Sci & Technol, Dept Mech & Aerosp Engn, Rolla, MO 65401, USA

To synthesize fixed-final-time control-constrained optimal controllers for discrete-time nonlinear control-affine systems, a single neural network (NN)-based controller called the Finite-horizon Single Network Adaptive Critic is developed in this paper. Inputs to the NN are the current system states and the time-to-go, and the network outputs are the costates that are used to compute the optimal feedback control. Control constraints are handled through a nonquadratic cost function. Convergence proofs are provided for: 1) the reinforcement learning-based training method to the optimal solution; 2) the training error; and 3) the network weights. The resulting controller is shown to solve the associated time-varying Hamilton-Jacobi-Bellman equation and provide the fixed-final-time optimal solution. Performance of the new synthesis technique is demonstrated through different examples, including an attitude control problem wherein a rigid spacecraft performs a finite-time attitude maneuver subject to control bounds. The new formulation has great potential for implementation since it consists of only one NN with a single set of weights and it provides comprehensive feedback solutions online, though it is trained offline.

Keywords: approximate dynamic programming; finite-horizon optimal control; fixed-final-time optimal control; input-constraint; neural networks

On the Convergence of Simulation-based Iterative Methods for Solving Singular Linear Systems

Stochastic Systems, 2013, Vol. 3, No. 1, pp. 1-321

Authors: Mengdi Wang, Dimitri P. Bertsekas

We consider the simulation-based solution of linear systems of equations, Ax = b, of various types frequently arising in large-scale applications, where A is singular. We show that the convergence properties of iterative solution methods are frequently lost when they are implemented with simulation (e.g., using sample average approximation), as is often done in important classes of large-scale problems. We focus on special cases of algorithms for singular systems, including some arising in least squares problems and approximate dynamic programming, where convergence of the residual sequence {Ax_k − b} may be obtained, while the sequence of iterates {x_k} may diverge. For some of these special cases, under additional assumptions, we show that the iterate sequence is guaranteed to converge. For situations where the iterates diverge but the residuals converge to zero, we propose schemes for extracting from the divergent sequence another sequence that converges to a solution of Ax = b.

Keywords: Stochastic algorithm; singular system; Monte-Carlo estimation; simulation; proximal method; regularization; approximate dynamic programming
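
The phenomenon described in the abstract above (residuals converging while iterates diverge) can be reproduced on a toy problem. The sketch below is an illustration only and makes several assumptions that are not from the paper: the system is the 2x2 singular system A = diag(1, 0), b = (1, 0); the simulation error in the estimate of b is replaced by a deterministic stand-in that shrinks like 1/sqrt(k) and points along the null space of A; and the "extracted" sequence is simply the iterate with its null-space component removed, which is not necessarily one of the schemes the authors propose.

```python
def noisy_iteration(steps=10000, gamma=0.5):
    """Run x <- x - gamma * (A x - b_k) for A = diag(1, 0), b = (1, 0).

    b_k = b + (0, 1/sqrt(k)) is a deterministic stand-in for a simulation
    estimate of b whose error vanishes as k grows.
    """
    x1, x2 = 0.0, 0.0
    for k in range(1, steps + 1):
        error = 1.0 / k ** 0.5   # vanishing estimation error, null-space direction
        g1 = 1.0 * x1 - 1.0      # first component of A x - b_k
        g2 = 0.0 * x2 - error    # second component of A x - b_k
        x1 -= gamma * g1
        x2 -= gamma * g2
    return x1, x2


x1, x2 = noisy_iteration()
residual = (x1 - 1.0, 0.0)   # A x - b evaluated with the exact A and b
projected = (x1, 0.0)        # iterate with its null-space component removed
```

In this toy run the residual goes to zero and `projected` approaches the solution (1, 0), while the second coordinate of the raw iterate keeps growing with the number of steps, even though the error in `b_k` itself vanishes.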

Finite-Approximation-Error-Based Optimal Control Approach for Discrete-Time Nonlinear Systems

IEEE TRANSACTIONS ON CYBERNETICS, 2013, Vol. 43, No. 2, pp. 779-789

Authors: Liu, Derong; Wei, Qinglai. Affiliation: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. The idea is to use an iterative ADP algorithm to obtain the iterative control law that makes the iterative performance index function reach the optimum. When the iterative control law and the iterative performance index function in each iteration cannot be accurately obtained, the convergence conditions of the iterative ADP algorithm are obtained. When convergence conditions are satisfied, it is shown that the iterative performance index functions can converge to a finite neighborhood of the greatest lower bound of all performance index functions under some mild assumptions. Neural networks are used to approximate the performance index function and compute the optimal control policy, respectively, for facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.

Keywords: Adaptive dynamic programming (ADP); approximate dynamic programming; finite approximation errors; neural networks; optimal control

Performance Guarantee of a Sub-Optimal Policy for a Robotic Surveillance Application

IFAC Proceedings Volumes, 2013, Vol. 46, No. 30, pp. 283-290

Authors: Myoungkuk Park, Krishnamoorthy Kalyanam, Swaroop Darbha, P.P. Khargonekar, P.R. Chandler, M. Pachter. Affiliations: Department of Mechanical Engineering, Texas A&M University, College Station, TX 77843, USA; Infoscitex Corporation, Dayton, OH 45431, USA; Department of Electrical Engineering, University of Florida, Gainesville, FL 32525; Autonomous Control Branch, Air Force Research Laboratory, Wright-Patterson A.F.B., OH 45433; Department of Electrical Engineering, Air Force Institute of Technology, Wright-Patterson A.F.B., OH 45433

This paper focuses on the development and analysis of sub-optimal decision algorithms for a collection of robots that assist a remotely located operator in perimeter surveillance. The operator is tasked with the classification of an incursion across the perimeter. Whenever there is an incursion into the perimeter, an Unattended Ground Sensor (UGS) in the vicinity raises an alert. A robot services the alert by visiting the alert location, collecting evidence in the form of video and other imagery, and transmitting it to the operator. The accuracy of the operator's classification depends on the volume and freshness of the information gathered by the robots. There are two competing needs for a robot: it needs to spend adequate time at an alert location to collect evidence for aiding the operator, but it also needs to service the alert as soon as possible, so that the evidence collected is relevant. The control problem is to determine the optimal amount of time a robot must spend servicing an alert. The incursions are stochastic and their statistics are assumed to be known. This problem is posed as a Markov Decision Problem. However, even for two robots and five UGS locations, the number of states is of the order of millions. Approximate dynamic programming (ADP) via linear programming (LP) provides a way to approximate the value function and provide bounds on its sub-optimality. The novel feature of this paper is to present a lower bound via LP-based techniques and state partitioning and to construct a sub-optimal policy whose performance betters the lower bound. An illustrative perimeter surveillance example corroborates the results presented in this paper.

Keywords: Stochastic control; approximate dynamic programming; Linear programming; Robotic Surveillance; Perimeter Patrol

Online Partially Model-Free Solution of Two-Player Zero Sum Differential Games

IFAC Proceedings Volumes, 2013, Vol. 46, No. 32, pp. 696-701

Authors: P Praveen, Shubhendu Bhasin. Affiliation: Department of Electrical Engineering, Indian Institute of Technology Delhi, India

An online adaptive dynamic programming based iterative algorithm is proposed for a two-player zero-sum linear differential game problem arising in the control of process systems affected by disturbances. The objective in such a scenario is to obtain an optimal control policy that minimizes the specified performance index or cost function in the presence of the worst-case disturbance. Conventional algorithms for the solution of such problems require full knowledge of the system dynamics. The algorithm proposed in this paper is partially model-free and solves the two-player zero-sum linear differential game problem without knowledge of the state and control input matrices.

Keywords: Two-player zero sum differential game; Adaptive dynamic programming; approximate dynamic programming

A Data-driven Model for Large Wildfire Behaviour Prediction in Europe

Procedia Computer Science, 2013, Vol. 18, pp. 1861-1870

Authors: Dario Rodriguez-Aseretto, Daniele de Rigo, Margherita Di Leo, Ana Cortés, Jesús San-Miguel-Ayanz. Affiliations: European Commission, Joint Research Centre, Institute for Environment and Sustainability, Via E. Fermi 2749, I-21027 Ispra (VA), Italy; Politecnico di Milano, Dipartimento di Elettronica e Informazione, Via Ponzio 34/5, I-20133 Milano, Italy; Universitat Autonoma de Barcelona, Computer Architecture and Operating Systems, Campus Bellaterra, Cerdanyola 08193, Spain

The European Forest Fire Information System (EFFIS) has been established by the Joint Research Centre (JRC) and the Directorate General for Environment (DG ENV) of the European Commission (EC) in close collaboration with the Member States and neighbour countries. EFFIS is intended as a complementary system to national and regional systems in the countries, providing harmonised information required for international collaboration on forest fire prevention and fighting and in cases of trans-boundary fire events. However, one missing component in the system is a wildfire behaviour model able to cover the whole of Europe. We propose a new general conceptualisation for wildfire prediction. It relies on an array-based and semantically enhanced (Semantic Array Programming) application of the Dynamic Data Driven Application Systems (DDDAS) concept, so as to predict the spread of large fires at the European level. The proposed mathematical framework is designed to simulate, with an ensemble strategy, the wildfire dynamics under given sequences of actions for controlling the fire spread and under updated data-driven information. First results on data and software uncertainties associated with the problem are presented with a real case study in Spain.

Keywords: dynamic Data Driven Application Systems; Forest Fires; Partial Open Loop Feedback Control; approximate dynamic programming; Semantic Array programming

A UNIFIED FRAMEWORK FOR LINEAR FUNCTION APPROXIMATION OF VALUE FUNCTIONS IN STOCHASTIC CONTROL

European Signal Processing Conference

Authors: Matilde Sanchez-Fernandez, Sergio Valcarcel, Santiago Zazoy. Affiliations: Universidad Carlos III de Madrid, Signal Theory & Communications Dept.; Universidad Politecnica de Madrid, Signals Systems & Radiocommunications Dept., Av. Complutense

This paper contributes a unified formulation that merges previous analyses of the prediction of the performance (value function) of a certain sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results when compared with state-of-the-art solutions.

Keywords: approximate dynamic programming; Linear value function approximation; Mean squared Bellman Error; Mean squared projected Bellman Error; Reinforcement Learning
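
The two cost functions named in the keywords above, the mean squared Bellman error (MSBE) and the mean squared projected Bellman error (MSPBE), can be computed by hand on a tiny Markov chain. The sketch below is an illustration under assumptions that are not from the paper: a three-state chain with a uniform stationary distribution, a single feature per state, and hypothetical rewards and discount factor. It shows only the standard definitions and their minimizers, not the relationship or the adaptive algorithm the authors derive.

```python
def dot(u, v, w):
    """Weighted inner product: sum_i w[i] * u[i] * v[i]."""
    return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))


# Hypothetical three-state chain; P is doubly stochastic, so d is uniform.
P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
d = [1.0 / 3.0] * 3          # stationary distribution of P
phi = [1.0, 2.0, 3.0]        # one feature per state, V(s) ~ phi[s] * theta
r = [0.0, 0.0, 1.0]          # expected one-step reward per state
gamma = 0.9                  # discount factor

P_phi = [sum(P[i][j] * phi[j] for j in range(3)) for i in range(3)]
m = [phi[i] - gamma * P_phi[i] for i in range(3)]   # (I - gamma*P) applied to phi


def bellman_error(theta):
    """Per-state Bellman error r + gamma*P*V - V for V = phi * theta."""
    return [r[i] - m[i] * theta for i in range(3)]


def msbe(theta):
    e = bellman_error(theta)
    return dot(e, e, d)


def mspbe(theta):
    # Squared d-weighted norm of the Bellman error projected onto span(phi).
    e = bellman_error(theta)
    return dot(phi, e, d) ** 2 / dot(phi, phi, d)


theta_br = dot(m, r, d) / dot(m, m, d)       # minimizer of MSBE (Bellman residual)
theta_td = dot(phi, r, d) / dot(phi, m, d)   # minimizer of MSPBE (TD fixed point)
```

On this example the two minimizers differ: `theta_td` drives the MSPBE to zero but has a larger MSBE than `theta_br`, which is the kind of gap that motivates studying how the two cost functions relate.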

On Integral Value Iteration for Continuous-Time Linear Systems

American Control Conference

Authors: Jae Young Lee, Jin Bae Park, Yoon Ho Choi. Affiliations: Department of Electrical and Electronic Engineering, Yonsei University, Shinchon-Dong, Seodaemum-Gu, Seoul 120-749, Korea; Department of Electronic Engineering, Kyonggi University, Suwon, Kyonggi-Do 443-760, Korea

ISBN: (print) 9781479901777

This paper investigates the properties of integral value iteration (I-VI), a reinforcement learning (RL) technique for solving continuous-time (CT) optimal control problems online without using the system drift dynamics. The I-VI considered here is the one applied to CT linear quadratic regulation problems. As a result, two modes of global monotone convergence of I-VI are presented. One behaves like policy iteration (PI) and is called the PI-mode of convergence; the other is called the VI-mode of convergence. All of the other properties (positive definiteness, stability, and the relation between I-VI and integral PI) are presented within these two frameworks. Finally, numerical simulations are carried out to verify and further investigate these properties.

Keywords: value iteration; LQR; reinforcement learning; monotone convergence; approximate dynamic programming; linear quadratic regulators; learning (artificial intelligence); iterative methods; dynamic programming; integration; Converge Learning

Lagrangian relaxation and constraint generation for allocation and advanced scheduling

COMPUTERS & OPERATIONS RESEARCH, 2012, Vol. 39, No. 10, pp. 2323-2336

Authors: Gocgun, Yasin; Ghate, Archis. Affiliations: Univ Washington, Seattle, WA 98195, USA; Univ British Columbia, Sauder Sch Business, Vancouver, BC V5Z 1M9, Canada

Diverse applications in manufacturing, logistics, health care, telecommunications, and computing require that renewable resources be dynamically scheduled to handle distinct classes of job service requests arriving randomly over slotted time. These dynamic stochastic resource scheduling problems are analytically and computationally intractable even when the number of job classes is relatively small. In this paper, we formally introduce two types of problems called allocation and advanced scheduling, and formulate their Markov decision process (MDP) models. We establish that these MDPs are "weakly coupled" and exploit this structural property to develop an approximate dynamic programming method that uses Lagrangian relaxation and constraint generation to efficiently make good scheduling decisions. In fact, our method is presented for a general class of large-scale weakly coupled MDPs that we precisely define. Extensive computational experiments on hundreds of randomly generated test problems reveal that Lagrangian decisions outperform myopic decisions by a statistically significant margin. The relative benefit of Lagrangian decisions is much higher for advanced scheduling than for allocation scheduling.

Keywords: approximate dynamic programming; Resource allocation; Scheduling
