Search results - Inner Mongolia University Library

Performance Guarantee of a Sub-Optimal Policy for a Robotic Surveillance Application

IFAC Proceedings Volumes, 2013, Vol. 46, No. 30, pp. 283-290

Authors: Myoungkuk Park, Krishnamoorthy Kalyanam, Swaroop Darbha, P.P. Khargonekar, P.R. Chandler, M. Pachter. Affiliations: Department of Mechanical Engineering, Texas A&M University, College Station, TX 77843, USA; Infoscitex Corporation, Dayton, OH 45431, USA; Department of Electrical Engineering, University of Florida, Gainesville, FL 32525; Autonomous Control Branch, Air Force Research Laboratory, Wright-Patterson A.F.B., OH 45433; Department of Electrical Engineering, Air Force Institute of Technology, Wright-Patterson A.F.B., OH 45433

This paper focuses on the development and analysis of sub-optimal decision algorithms for a collection of robots that assist a remotely located operator in perimeter surveillance. The operator is tasked with classifying an incursion across the perimeter. Whenever there is an incursion into the perimeter, an Unattended Ground Sensor (UGS) in the vicinity raises an alert. A robot services the alert by visiting the alert location, collecting evidence in the form of video and other imagery, and transmitting it to the operator. The accuracy of the operator's classification depends on the volume and freshness of the information gathered by the robots. A robot therefore faces two competing needs: it must spend adequate time at an alert location to collect evidence that aids the operator, but it must also service the alert as soon as possible so that the evidence collected is still relevant. The control problem is to determine the optimal amount of time a robot should spend servicing an alert. The incursions are stochastic, and their statistics are assumed to be known. The problem is posed as a Markov Decision Problem; however, even for two robots and five UGS locations, the number of states is on the order of millions. Approximate dynamic programming (ADP) via linear programming (LP) provides a way to approximate the value function and to bound its sub-optimality. The novel feature of this paper is a lower bound obtained via LP-based techniques and state partitioning, together with a sub-optimal policy whose performance exceeds that lower bound. An illustrative perimeter surveillance example corroborates the results presented in this paper.

Keywords: Stochastic control; approximate dynamic programming; Linear programming; Robotic Surveillance; Perimeter Patrol

Online Partially Model-Free Solution of Two-Player Zero Sum Differential Games

IFAC Proceedings Volumes, 2013, Vol. 46, No. 32, pp. 696-701

Authors: P. Praveen, Shubhendu Bhasin. Affiliation: Department of Electrical Engineering, Indian Institute of Technology Delhi, India

An online iterative algorithm based on adaptive dynamic programming is proposed for a two-player zero-sum linear differential game arising in the control of process systems affected by disturbances. The objective in such a scenario is to obtain an optimal control policy that minimizes the specified performance index, or cost function, in the presence of the worst-case disturbance. Conventional algorithms for solving such problems require full knowledge of the system dynamics. The algorithm proposed in this paper is partially model-free: it solves the two-player zero-sum linear differential game without knowledge of the state and control input matrices.

Keywords: Two-player zero sum differential game; Adaptive dynamic programming; approximate dynamic programming

A Data-driven Model for Large Wildfire Behaviour Prediction in Europe

Procedia Computer Science, 2013, Vol. 18, pp. 1861-1870

Authors: Dario Rodriguez-Aseretto, Daniele de Rigo, Margherita Di Leo, Ana Cortés, Jesús San-Miguel-Ayanz. Affiliations: European Commission, Joint Research Centre, Institute for Environment and Sustainability, Via E. Fermi 2749, I-21027 Ispra (VA), Italy; Politecnico di Milano, Dipartimento di Elettronica e Informazione, Via Ponzio 34/5, I-20133 Milano, Italy; Universitat Autonoma de Barcelona, Computer Architecture and Operating Systems, Campus Bellaterra, Cerdanyola 08193, Spain

The European Forest Fire Information System (EFFIS) has been established by the Joint Research Centre (JRC) and the Directorate General for Environment (DG ENV) of the European Commission (EC) in close collaboration with the Member States and neighbouring countries. EFFIS is intended as a complementary system to the national and regional systems in these countries, providing the harmonised information required for international collaboration on forest fire prevention and fighting and in cases of trans-boundary fire events. However, one component still missing from the system is a wildfire behaviour model able to cover the whole of Europe. We propose a new general conceptualisation for wildfire prediction. It relies on an array-based and semantically enhanced (Semantic Array programming) application of the Dynamic Data Driven Application Systems (DDDAS) concept, so as to predict the spread of large fires at the European level. The proposed mathematical framework is designed to simulate, with an ensemble strategy, the wildfire dynamics under given sequences of actions for controlling the fire spread and under updated data-driven information. First results on the data and software uncertainties associated with the problem are presented for a real case study in Spain.

Keywords: Dynamic Data Driven Application Systems; Forest Fires; Partial Open Loop Feedback Control; approximate dynamic programming; Semantic Array programming

A Unified Framework for Linear Function Approximation of Value Functions in Stochastic Control

European Signal Processing Conference

Authors: Matilde Sanchez-Fernandez, Sergio Valcarcel, Santiago Zazo. Affiliations: Universidad Carlos III de Madrid, Signal Theory & Communications Dept.; Universidad Politecnica de Madrid, Signals, Systems & Radiocommunications Dept., Av. Complutense

This paper contributes a unified formulation that merges previous analyses of predicting the performance (value function) of a given sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results compared with state-of-the-art solutions.

Keywords: approximate dynamic programming; Linear value function approximation; Mean squared Bellman Error; Mean squared projected Bellman Error; Reinforcement Learning
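As background for the two cost functions named above, the following is a minimal sketch (not the authors' adaptive algorithm) of the TD fixed point for linear value function approximation: the parameter vector theta at which the mean squared projected Bellman error (MSPBE) is zero, found by solving Phi^T D (Phi - gamma P Phi) theta = Phi^T D r. It assumes a small, fully known, hypothetical Markov chain (`P`, `r`, `gamma`), a feature matrix `phi`, and a state weighting `d`; sample-based LSTD solves the same linear system with the expectations replaced by trajectory averages. The minimizer of the mean squared Bellman error (MSBE) is generally a different point.

```python
# Minimal sketch: TD fixed point (expected-value form of LSTD) for a linear
# value function V(s) ~ phi(s) . theta on a small, hypothetical Markov chain.


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def td_fixed_point(P, r, phi, d, gamma):
    """Solve Phi^T D (Phi - gamma P Phi) theta = Phi^T D r for theta.

    P: transition matrix of the evaluated policy, r: expected reward per
    state, phi: one feature row per state, d: state weights (e.g. the
    stationary distribution). At the returned theta the projected Bellman
    error is zero.
    """
    n, k = len(P), len(phi[0])
    # p_phi[s][j]: expected feature j of the next state, starting from s
    p_phi = [[sum(P[s][t] * phi[t][j] for t in range(n)) for j in range(k)]
             for s in range(n)]
    A = [[sum(d[s] * phi[s][i] * (phi[s][j] - gamma * p_phi[s][j])
              for s in range(n))
          for j in range(k)]
         for i in range(k)]
    b = [sum(d[s] * phi[s][i] * r[s] for s in range(n)) for i in range(k)]
    return solve_linear(A, b)


# Hypothetical 3-state chain; P is doubly stochastic, so uniform d is stationary.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
r = [1.0, 0.0, 2.0]
d = [1 / 3, 1 / 3, 1 / 3]
gamma = 0.9
phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # bias + state-index features

theta = td_fixed_point(P, r, phi, d, gamma)
print("theta:", theta)
```

With one-hot (tabular) features the same routine returns the exact value function (I - gamma P)^{-1} r, which is a convenient sanity check; with the two coarser features it returns the TD fixed point rather than the MSBE minimizer.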

On Integral Value Iteration for Continuous-Time Linear Systems

American Control Conference

Authors: Jae Young Lee, Jin Bae Park, Yoon Ho Choi. Affiliations: Department of Electrical and Electronic Engineering, Yonsei University, Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Korea; Department of Electronic Engineering, Kyonggi University, Suwon, Kyonggi-Do 443-760, Korea

ISBN (print): 9781479901777

This paper investigates the properties of integral value iteration (I-VI), one of the reinforcement learning (RL) techniques for solving continuous-time (CT) optimal control problems online without using the system drift dynamics. The I-VI considered here is the one applied to CT linear quadratic regulation problems. Two modes of global monotone convergence of I-VI are presented: one behaves like policy iteration (PI) and is called the PI-mode of convergence, and the other is called the VI-mode of convergence. All the other properties (positive definiteness, stability, and the relation between I-VI and integral PI) are presented within these two frameworks. Finally, numerical simulations are carried out to verify and further investigate these properties.

Keywords: value iteration; LQR; reinforcement learning; monotone convergence; approximate dynamic programming; linear quadratic regulators; learning (artificial intelligence); iterative methods; dynamic programming; integration; Converge; Learning
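The I-VI update can be illustrated in the simplest possible setting. The sketch below is a hypothetical scalar example, not the authors' code: a plant dx/dt = a x + b u with cost equal to the integral of q x^2 + r u^2, value V_k(x) = p_k x^2, and policy u = -(b/r) p_k x. Each iteration runs the current policy over an interval of length T and sets p_{k+1} x(t)^2 = (cost accumulated over [t, t+T]) + p_k x(t+T)^2. The drift `a` enters only the simulated plant, never the update, mirroring the "without using the system drift dynamics" property; the input gain `b` is still needed to form the policy. Forward-Euler simulation and restarting every interval from x(t) = 1 are simplifying assumptions.

```python
# Minimal scalar sketch of integral value iteration (I-VI) for CT LQR.
# Hypothetical plant: dx/dt = a*x + b*u, cost = integral of q*x^2 + r*u^2.
import math


def run_interval(a, b, q, r, p_k, x0, T, dt):
    """Apply u = -(b/r)*p_k*x for T seconds using forward Euler.

    Stands in for measuring the real plant; returns x(t+T) and the
    stage cost integrated over [t, t+T].
    """
    x, cost = x0, 0.0
    for _ in range(int(round(T / dt))):
        u = -(b / r) * p_k * x
        cost += (q * x * x + r * u * u) * dt
        x += (a * x + b * u) * dt
    return x, cost


def integral_value_iteration(a, b, q, r, T=0.1, dt=1e-4, iters=200, p0=0.0):
    """Iterate p_{k+1} x(t)^2 = cost over [t, t+T] + p_k x(t+T)^2."""
    p = p0
    for _ in range(iters):
        x0 = 1.0  # assumption: each interval restarts from x(t) = 1
        x_T, cost = run_interval(a, b, q, r, p, x0, T, dt)
        # The update uses only measured data (x0, x_T, cost), not the drift a.
        p = (cost + p * x_T * x_T) / (x0 * x0)
    return p


def riccati_solution(a, b, q, r):
    """Stabilizing root of the scalar ARE 2*a*p + q - (b*b/r)*p*p = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


p_hat = integral_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
print(p_hat, riccati_solution(1.0, 1.0, 1.0, 1.0))
```

For a small interval T the iterates should settle near the stabilizing Riccati solution (1 + sqrt(2) for a = b = q = r = 1), up to a small discretization error from the Euler simulation.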

Lagrangian relaxation and constraint generation for allocation and advanced scheduling

Computers & Operations Research, 2012, Vol. 39, No. 10, pp. 2323-2336

Authors: Gocgun, Yasin; Ghate, Archis. Affiliations: Univ Washington, Seattle, WA 98195, USA; Univ British Columbia, Sauder Sch Business, Vancouver, BC V5Z 1M9, Canada

Diverse applications in manufacturing, logistics, health care, telecommunications, and computing require that renewable resources be dynamically scheduled to handle distinct classes of job service requests arriving randomly over slotted time. These dynamic stochastic resource scheduling problems are analytically and computationally intractable even when the number of job classes is relatively small. In this paper, we formally introduce two types of problems, called allocation and advanced scheduling, and formulate their Markov decision process (MDP) models. We establish that these MDPs are "weakly coupled" and exploit this structural property to develop an approximate dynamic programming method that uses Lagrangian relaxation and constraint generation to make good scheduling decisions efficiently. In fact, our method is presented for a general class of large-scale weakly coupled MDPs that we define precisely. Extensive computational experiments on hundreds of randomly generated test problems reveal that Lagrangian decisions outperform myopic decisions by a statistically significant margin. The relative benefit of Lagrangian decisions is much higher for advanced scheduling than for allocation scheduling. (C) 2011 Elsevier Ltd. All rights reserved.

Keywords: approximate dynamic programming; Resource allocation; Scheduling

Metamodeling and the Critic-based approach to multi-level optimization

Neural Networks, 2012, Vol. 32, Aug. issue, pp. 179-185

Authors: Werbos, Ludmilla; Kozma, Robert; Silva-Lugo, Rodrigo; Pazienza, Giovanni E.; Werbos, Paul J. Affiliations: IntControl LLC, Memphis, TN 38152, USA; Univ Memphis, CLION, Memphis, TN 38152, USA; Natl Sci Fdn, Arlington, VA 22230, USA

Large-scale networks with hundreds of thousands of variables and constraints are becoming more and more common in logistics, communications, and distribution domains. Traditionally, the utility functions defined on such networks are optimized using some variation of linear programming, such as Mixed Integer Programming (MIP). Despite enormous progress in both hardware (multiprocessor systems and specialized processors) and software (Gurobi), we are reaching the limits of what these tools can handle in real time. Modern logistics problems, for example, call for expanding the problem both vertically (from one day up to several days) and horizontally (combining separate solution stages into an integrated model). The complexity of such integrated models calls for alternative solution methods, such as approximate dynamic programming (ADP), which provide the further increase in performance needed for daily operation. In this paper, we present the theoretical basis and related experiments for solving multistage decision problems by using the results obtained for shorter periods as building blocks for the models and the solution, via Critic-Model-Action cycles in which various types of neural networks are combined with traditional MIP models in a unified optimization system. In this system architecture, fast and simple feed-forward networks are trained to provide reasonable initializations for more complicated recurrent networks, which serve as approximators of the value function (Critic). The combination of interrelated neural networks and optimization modules allows multiple queries to the same system, providing flexibility and optimizing performance for large-scale real-life problems. A MATLAB implementation of our solution procedure on a realistic set of data and constraints shows promising results compared with the iterative MIP approach. (C) 2012 Elsevier Ltd. All rights reserved.

Keywords: approximate dynamic programming; Multi-level optimization; Metamodeling; Adaptive critic

Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach

Neurocomputing, 2012, Vol. 78, No. 1, pp. 14-22

Authors: Wang, Ding; Liu, Derong; Wei, Qinglai. Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Intelligent Control & Management Co, Beijing 100190, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA

In this paper, a finite-horizon neuro-optimal tracking control strategy for a class of discrete-time nonlinear systems is proposed. Through a system transformation, the optimal tracking problem is converted into the design of a finite-horizon optimal regulator for the tracking error dynamics. Then, with a convergence analysis in terms of the cost function and the control law, the iterative adaptive dynamic programming (ADP) algorithm based on the heuristic dynamic programming (HDP) technique is introduced to obtain the finite-horizon optimal tracking controller, which brings the cost function to within an epsilon-error bound of its optimal value. Three neural networks are used as parametric structures to implement the algorithm; they approximate the cost function, the control law, and the error dynamics, respectively. Two simulation examples are included to complement the theoretical discussion. (C) 2011 Elsevier B.V. All rights reserved.

Keywords: Adaptive critic designs; Adaptive dynamic programming; approximate dynamic programming; Finite-horizon optimal tracking control; Learning control; Neural networks; Reinforcement learning
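The HDP value iteration at the core of this abstract can be shown exactly in a linear-quadratic special case, where no neural networks are needed. The sketch below assumes hypothetical scalar error dynamics e_{k+1} = a e_k + b v_k and utility U(e, v) = q e^2 + r v^2 (the paper itself treats nonlinear systems and approximates the cost function, control law, and error dynamics with three networks). Starting from V_0 = 0, each iterate V_i(e) = p_i e^2 is the optimal cost over an i-step horizon with zero terminal cost, and the minimization in V_{i+1}(e) = min_v [U(e, v) + V_i(a e + b v)] has a closed form. Stopping once successive iterates differ by less than `eps` is a simple rule in the spirit of the epsilon-error bound, not the paper's exact criterion.

```python
# Scalar linear-quadratic sketch of the HDP value-iteration step.
# Hypothetical error dynamics: e_{k+1} = a*e_k + b*v_k, U = q*e^2 + r*v^2.


def hdp_step(p, a, b, q, r):
    """One exact HDP update p_i -> p_{i+1}, minimizing over v in closed form."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)


def greedy_gain(p, a, b, r):
    """Gain k of the minimizing control v = -k*e for the value p*e^2."""
    return a * b * p / (r + b * b * p)


def hdp_until_epsilon(a, b, q, r, eps=1e-9, max_iter=10_000):
    """Iterate from V_0 = 0 until |p_{i+1} - p_i| < eps.

    Returns (p, gain, iterations). Iterate i is the optimal cost of an
    i-step horizon, so stopping early yields a finite-horizon controller.
    """
    p = 0.0
    for i in range(1, max_iter + 1):
        p_next = hdp_step(p, a, b, q, r)
        if abs(p_next - p) < eps:
            return p_next, greedy_gain(p_next, a, b, r), i
        p = p_next
    raise RuntimeError("no convergence within max_iter")


p, k, n_iter = hdp_until_epsilon(a=1.0, b=1.0, q=1.0, r=1.0)
print(p, k, n_iter)
```

With a = b = q = r = 1 the iterates are ratios of Fibonacci numbers (1, 3/2, 8/5, 21/13, ...), converging to the golden ratio, and the resulting gain approaches its reciprocal.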

Constrained adaptive optimal control using a reinforcement learning agent

Automatica, 2012, Vol. 48, No. 10, pp. 2614-2619

Authors: Lin, Wei-Song; Zheng, Chen-Hong. Affiliations: NTUEE, Taipei 106, Taiwan; Natl Taiwan Univ, Dept Elect Engn, Taipei, Taiwan

Synthesizing optimal control strategies for nonlinear systems on an infinite horizon subject to mixed equality and inequality constraints has been a challenge for control engineers. This paper treats it as a problem of finite-time optimization in infinite-horizon control and then devises a reinforcement learning agent, termed the Adaptive Optimal Control (AOC) agent, to carry out the finite-time optimization procedures. The control is adaptive optimal in the sense that the finite-time optimization procedure is activated whenever needed to improve the control strategy or to adapt to a real-world environment. The Nonlinear Quadratic Regulator (NQR) is presented as a typical example of what the AOC agent can find. The optimality conditions and adaptation rules for the AOC agent are derived from Pontryagin's minimum principle. The requirements for convergence and stability of the AOC system are shown. (C) 2012 Elsevier Ltd. All rights reserved.

Keywords: Adaptive optimal control; Reinforcement learning; Constrained optimization; approximate dynamic programming

A least squares temporal difference actor-critic algorithm with applications to warehouse management

Naval Research Logistics, 2012, Vol. 59, No. 3-4, pp. 197-211

Authors: Estanjini, Reza Moazzez; Li, Keyong; Paschalidis, Ioannis Ch. Affiliations: Boston Univ, Dept Elect & Comp Engn, Div Syst Engn, Boston, MA 02215, USA; Boston Univ, Ctr Informat & Syst Engn, Boston, MA 02215, USA

This article develops a new approximate dynamic programming (DP) algorithm for Markov decision problems and applies it to a vehicle dispatching problem arising in warehouse management. The algorithm is of the actor-critic type and uses a least squares temporal difference learning method. It operates on a sample path of the system and optimizes the policy within a prespecified class parameterized by a parsimonious set of parameters. The method is applicable to a partially observable Markov decision process setting in which the measurements of state variables are potentially corrupted and the cost is observed only through the imperfect state observations. We show that, under reasonable assumptions, the algorithm converges to a locally optimal parameter set. We also show that the imperfect cost observations do not affect the policy and that the algorithm minimizes the true expected cost. In the warehouse application, the problem is to dispatch sensor-equipped forklifts so as to minimize operating costs involving product movement delays and forklift maintenance. We consider instances where standard DP is computationally intractable. Simulation results confirm the theoretical claims of the article and show that our algorithm converges more smoothly than earlier actor-critic algorithms while substantially outperforming heuristics used in practice. (c) 2012 Wiley Periodicals, Inc. Naval Research Logistics, 2012

Keywords: Markov decision processes; partial observability; approximate dynamic programming; actor-critic algorithms; warehouse management; vehicle routing
