ISBN: (Print) 9780982443859
Piracy on the high seas is a problem of world-wide concern. In response to this threat, the US Navy has developed a visualization tool known as the Pirate Attack Risk Surface (PARS) that integrates intelligence data, commercial shipping routes, and meteorological and oceanographic (METOC) information to predict regions where pirates may be present and where they may strike next. This paper proposes an algorithmic augmentation or add-on to PARS that allocates interdiction and surveillance assets so as to minimize the likelihood of a successful pirate attack over a fixed planning horizon. This augmentation, viewed as a tool for human planners, can be mapped closely to the decision support layer of the Battlespace on Demand (BonD) framework [32]. Our solution approach decomposes this NP-hard optimization problem into two sequential phases. In Phase I, we solve the problem of allocating only the interdiction assets, such that regions with high cumulative probability of attack over the planning horizon are maximally covered. In Phase II, we solve the surveillance problem, where the area not covered by interdiction assets is partitioned into non-overlapping search regions (e.g., rectangular boxes) and assigned to a set of surveillance assets to maximize the cumulative detection probability over the planning horizon. In order to overcome the curse of dimensionality associated with dynamic programming (DP), we propose a Gauss-Seidel algorithm coupled with a rollout strategy for the interdiction problem. For the surveillance problem, we propose a partitioning algorithm coupled with an asymmetric assignment algorithm for allocating assets to the partitioned regions. Once the surveillance assets are assigned to search regions, the search path for each asset is determined based on a specific search strategy. The proposed algorithms are illustrated using a hypothetical scenario for conducting counter-piracy operations in a given Area of Responsibility (AOR).
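A minimal sketch of the rollout idea used in the interdiction phase, on a hypothetical grid of cumulative attack probabilities (the grid size, asset count, and greedy base policy below are illustrative assumptions, not the PARS implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
attack_prob = rng.random((8, 8))     # cumulative attack probability per grid cell
n_assets = 3                         # interdiction assets, each covering one cell

def base_heuristic(prob, covered, budget):
    """Greedy base policy: cover the highest-probability cells not yet covered."""
    cells = sorted(((prob[i, j], (i, j))
                    for i in range(prob.shape[0]) for j in range(prob.shape[1])
                    if (i, j) not in covered), reverse=True)
    return covered | {c for _, c in cells[:budget]}

def residual_risk(prob, covered):
    """Total attack probability left uncovered by the interdiction assets."""
    return prob.sum() - sum(prob[c] for c in covered)

def rollout_allocation(prob, k):
    """Assign assets one at a time; each candidate cell is scored by completing
    the remaining assignments with the base heuristic (one-step rollout)."""
    covered = set()
    for step in range(k):
        candidates = [(i, j) for i in range(prob.shape[0])
                      for j in range(prob.shape[1]) if (i, j) not in covered]
        best = min(candidates, key=lambda c: residual_risk(
            prob, base_heuristic(prob, covered | {c}, k - step - 1)))
        covered.add(best)
    return covered

print(rollout_allocation(attack_prob, n_assets))
```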
This work proposes a methodology to generate risk-averse policies for Markov decision processes (MDPs). The methodology is based on modifying the one-stage reward or cost to weigh the trade-off between expected performance and downside risk, represented by the conditional value-at-risk (CVaR) at level α. The modified stage-wise utility function is used within dynamic programming to generate a set of policies representing different levels of the trade-off. The approach is demonstrated on a shortest-path optimal control problem and a project management problem modeled as a constrained MDP. To address a more complex management problem, we utilize a real-time approximate dynamic programming algorithm.
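A toy sketch of the core idea under stated assumptions: a small synthetic MDP, an empirical CVaR over sampled stage costs, and a linear blend of mean and CVaR as the modified stage cost (the blend weight `lam` and the cost model are illustrative, not the paper's exact utility function):

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical CVaR_alpha: mean of the worst (1 - alpha) fraction of costs."""
    tail = np.sort(samples)[int(np.ceil(alpha * len(samples))):]
    return tail.mean() if len(tail) else np.sort(samples)[-1]

rng = np.random.default_rng(1)
n_s, n_a, gamma, alpha, lam = 3, 2, 0.9, 0.9, 0.5
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))           # P[s, a, s']
cost_samples = rng.gamma(2.0, 1.0, size=(n_s, n_a, 500))    # stochastic stage costs

# risk-adjusted one-stage cost: (1 - lam) * mean + lam * CVaR_alpha
c = np.array([[(1 - lam) * cost_samples[s, a].mean()
               + lam * cvar(cost_samples[s, a], alpha)
               for a in range(n_a)] for s in range(n_s)])

V = np.zeros(n_s)
for _ in range(500):                                         # standard value iteration
    V = np.min(c + gamma * P @ V, axis=1)
policy = np.argmin(c + gamma * P @ V, axis=1)
print(V, policy)
```

Sweeping `lam` from 0 to 1 traces out the set of policies representing different points on the expected-performance vs. downside-risk trade-off.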
Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems.
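The bilinear program itself is beyond a short snippet, but the quantity it controls, the Bellman residual of a linear value-function architecture, can be illustrated with a fixed-policy least-squares stand-in on synthetic data (the paper instead minimizes robust and expected norms of the residual via global optimization):

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, n_feat, gamma = 50, 5, 0.95
P = rng.dirichlet(np.ones(n_s), size=n_s)      # P[s, s'] under a fixed policy
r = rng.random(n_s)                             # rewards under the fixed policy
Phi = rng.random((n_s, n_feat))                 # feature matrix, V ~ Phi @ w

# Bellman residual: Phi w - (r + gamma P Phi w) = (Phi - gamma P Phi) w - r
A = Phi - gamma * P @ Phi
w, *_ = np.linalg.lstsq(A, r, rcond=None)
residual = A @ w - r
print("max |Bellman residual|:", np.abs(residual).max())
```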
In this paper, we study the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach. The idea is to use an iterative ADP algorithm to obtain the optimal control law which makes the performance index function close to the greatest lower bound of all performance indices within an epsilon-error bound. The optimal number of control steps can also be obtained by the proposed ADP algorithms. A convergence analysis of the proposed ADP algorithms is given in terms of the performance index function and the control policy. In order to facilitate the implementation of the iterative ADP algorithms, neural networks are used for approximating the performance index function, computing the optimal control policy, and modeling the nonlinear system. Finally, two simulation examples are employed to illustrate the applicability of the proposed method.
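A minimal sketch of the iterative value-function update and the epsilon-error stopping rule, on a gridded one-dimensional system with table lookup standing in for the paper's neural-network approximators (the dynamics `f` and cost `U` below are hypothetical placeholders):

```python
import numpy as np

xs = np.linspace(-2.0, 2.0, 201)                 # discretized state space
us = np.linspace(-1.0, 1.0, 41)                  # discretized control set
f = lambda x, u: 0.8 * np.sin(x) + u             # hypothetical system dynamics
U = lambda x, u: x**2 + u**2                     # stage cost
eps = 1e-4                                       # epsilon-error bound

V = np.zeros_like(xs)                            # V_0 = 0
for i in range(500):
    # one iteration: V_{i+1}(x) = min_u [ U(x, u) + V_i(f(x, u)) ]
    Q = U(xs[:, None], us[None, :]) + np.interp(f(xs[:, None], us[None, :]), xs, V)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < eps:          # stop once within the error bound
        print("converged after", i + 1, "iterations")
        break
    V = V_new
```

The iteration count at which the stopping rule fires plays the role of the (epsilon-optimal) number of control steps discussed in the abstract.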
The evaluation of a control-Lyapunov policy, with quadratic Lyapunov function, requires the solution of a quadratic program (QP) at each time step. For small problems this QP can be solved explicitly; for larger problems an online optimization method can be used. For this reason the control-Lyapunov policy is considered a computationally intensive control law, as opposed to an "analytical" control law such as conventional linear state feedback, linear quadratic Gaussian control, or H-infinity, and is regarded as too complex or slow to be used in high-speed control applications. In this note we show that by precomputing certain quantities, the control-Lyapunov policy can be evaluated extremely efficiently. We show that when the number of inputs is on the order of the square root of the state dimension, the cost of evaluating a control-Lyapunov policy is on the same order as the cost of evaluating a simple linear state feedback policy, and less (in order) than the cost of updating a Kalman filter state estimate. To give an idea of the speeds involved, for a problem with 100 states and 10 inputs, the control-Lyapunov policy can be evaluated in around 67 μs on a 2 GHz AMD processor; the same processor requires 40 μs to carry out a Kalman filter update.
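A sketch of the precomputation idea for the unconstrained special case, where minimizing the quadratic Lyapunov bound reduces to a fixed gain and online evaluation costs a single matrix-vector product (the note's contribution is achieving comparable speed when input constraints force a genuine QP; the matrices below are random placeholders):

```python
import numpy as np

n, m = 100, 10                                   # state and input dimensions
rng = np.random.default_rng(3)
A = rng.standard_normal((n, n)) / np.sqrt(n)     # placeholder dynamics x+ = Ax + Bu
B = rng.standard_normal((n, m))
P = np.eye(n)                                    # quadratic Lyapunov function x' P x
R = np.eye(m)                                    # input cost weight

# offline precomputation: argmin_u [ u'Ru + (Ax + Bu)' P (Ax + Bu) ] = K x
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# online evaluation: as cheap as linear state feedback
x = rng.standard_normal(n)
u = K @ x
```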
In this paper, a novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays. The algorithm consists of state updating, control policy iteration, and performance index iteration; to obtain the optimal states, the states themselves are updated, and a "backward iteration" is applied to the state updating. Two neural networks are used to approximate the performance index function and to compute the optimal control policy, which facilitates the implementation of the HDP iteration algorithm. Finally, two examples demonstrate the effectiveness of the proposed HDP iteration algorithm.
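A rough sketch of the two-network structure (a critic for the performance index and an action network for the control) on a scalar system; the time delays and the paper's backward state iteration are omitted, and the dynamics, cost, and network sizes are placeholder assumptions:

```python
import torch
import torch.nn as nn

f = lambda x, u: 0.9 * torch.sin(x) + u          # hypothetical scalar dynamics
U = lambda x, u: x**2 + u**2                      # stage cost
critic = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # performance index
actor  = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # control policy
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-2)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-2)

for it in range(200):
    x = torch.rand(256, 1) * 4 - 2                # sampled states
    # action-network update: minimize U(x, u) + J(f(x, u)) over the policy
    u = actor(x)
    loss_a = (U(x, u) + critic(f(x, u))).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    # critic update: J_{i+1}(x) <- U(x, a_i(x)) + J_i(f(x, a_i(x)))
    with torch.no_grad():
        u = actor(x)
        target = U(x, u) + critic(f(x, u))
    loss_c = ((critic(x) - target) ** 2).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
```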
There are a number of sources of randomness that arise in military airlift operations. However, the cost of uncertainty can be difficult to estimate, and is easy to overestimate if we use simplistic decision rules. Using data from Canadian military airlift operations, we study the effect of uncertainty in customer demands, as well as aircraft failures, on the overall cost. The system is first analyzed using the types of myopic decision rules widely used in the research literature. The performance of the myopic policy is then compared with the results obtained using robust decisions that account for the uncertainty of future events. These are obtained by modeling the problem as a dynamic program and solving Bellman's equations using approximate dynamic programming. The experiments show that even approximate solutions to Bellman's equations produce decisions that reduce the cost of uncertainty.
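A toy illustration of the myopic-versus-ADP comparison, not the paper's airlift model: a single asset sees one request per period with a random value, and serving it makes the asset unavailable next period. The myopic rule always accepts; the ADP rule compares the request against an approximate value of the "free" post-decision state learned by simulation-based value iteration (all parameters below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
T, gamma = 10_000, 0.95
V = np.zeros(2)                       # approximate values of states {0: free, 1: busy}

def simulate(policy, V):
    state, total = 0, 0.0
    for _ in range(T):
        reward = rng.exponential(1.0)           # value of this period's request
        if state == 0 and policy(reward, V):
            total += reward
            state = 1                           # asset unavailable next period
        else:
            state = 0
    return total / T

# learn V by sampled asynchronous value iteration
for n in range(20_000):
    reward = rng.exponential(1.0)
    alpha = 1.0 / (1 + n / 100)
    V[0] = (1 - alpha) * V[0] + alpha * max(reward + gamma * V[1], gamma * V[0])
    V[1] = (1 - alpha) * V[1] + alpha * (gamma * V[0])   # busy -> free next period

myopic = lambda r, V: True                                 # ignores the future
adp    = lambda r, V: r + gamma * V[1] >= gamma * V[0]     # values the future
print("myopic avg reward:", simulate(myopic, V))
print("ADP avg reward   :", simulate(adp, V))
```

Even this crude approximation of Bellman's equation yields a threshold rule that outperforms the myopic policy, which is the qualitative point of the study.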
We consider an inventory distribution system consisting of one warehouse and multiple retailers. The retailers face random demand and are supplied by the warehouse. The warehouse replenishes its stock from an external supplier. The objective is to minimize the total expected replenishment, holding, and backlogging cost over a finite planning horizon. The problem can be formulated as a dynamic program, but this dynamic program is difficult to solve due to its high-dimensional state variable. It has been observed in the earlier literature that if the warehouse is allowed to ship negative quantities to the retailers, then the problem decomposes by the locations. One way to exploit this observation is to relax the constraints that ensure the nonnegativity of the shipments to the retailers by associating Lagrange multipliers with them, which naturally raises the question of how to choose a good set of Lagrange multipliers. In this paper, we propose efficient methods that choose a good set of Lagrange multipliers by solving linear programming approximations to the inventory distribution problem. Computational experiments indicate that the inventory replenishment policies obtained by our approach can outperform several standard benchmarks by significant margins.
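A generic sketch of the relaxation idea: penalizing the shipment nonnegativity constraints with multipliers decouples the problem into independent single-location subproblems. The quadratic stand-in subproblem and the plain subgradient loop below are illustrative assumptions; the paper chooses the multipliers via linear programming approximations of the actual inventory dynamics instead:

```python
import numpy as np

rng = np.random.default_rng(5)
n_retailers, T = 3, 5
demand = rng.normal(1.0, 1.0, size=(n_retailers, T))   # some demands are negative
lam = np.zeros((n_retailers, T))                        # multipliers on shipment >= 0

def location_subproblem(d_j, lam_j):
    """Single-retailer stand-in: min_x 0.5*||x - d_j||^2 - lam_j' x,
    with closed-form minimizer x = d_j + lam_j (shipments may go negative)."""
    x = d_j + lam_j
    return 0.5 * ((x - d_j) ** 2).sum() - lam_j @ x, x

for it in range(200):
    dual_value, shipments = 0.0, np.zeros_like(lam)
    for j in range(n_retailers):                        # decomposes by location
        val_j, x_j = location_subproblem(demand[j], lam[j])
        dual_value += val_j
        shipments[j] = x_j
    # projected subgradient step on the dual: raise lam where shipments go negative
    lam = np.maximum(lam - (1.0 / (it + 1)) * shipments, 0.0)

print("most negative shipment after relaxation:", shipments.min())
```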
We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued states and actions and complex information processes. We build on the literature that has addressed the well-known problem of multidimensional (and possibly continuous) states, and the extensive literature on model-free dynamic programming, which also assumes that the expectation in Bellman's equation cannot be computed. However, we point out complications that arise when the actions/controls are vector-valued and possibly continuous. We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures for the value function.
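A compact sketch of approximate policy iteration with a parametric (linear) value-function architecture on a small synthetic MDP; this is only a schematic of the algorithm class being reviewed, and the convergence guarantees mentioned above require further technical assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n_s, n_a, n_feat, gamma = 30, 4, 6, 0.95
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # transition kernel P[s, a, s']
r = rng.random((n_s, n_a))                          # rewards
Phi = rng.random((n_s, n_feat))                     # state features, V ~ Phi @ w

policy = np.zeros(n_s, dtype=int)
for it in range(50):
    # approximate policy evaluation: least-squares fit of the Bellman residual
    P_pi = P[np.arange(n_s), policy]                # P_pi[s, s']
    r_pi = r[np.arange(n_s), policy]
    A = Phi - gamma * P_pi @ Phi
    w, *_ = np.linalg.lstsq(A, r_pi, rcond=None)
    # greedy policy improvement using the approximate value function
    Q = r + gamma * P @ (Phi @ w)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```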
The adaptive critic heuristic has been a popular algorithm in the reinforcement learning (RL) and approximate dynamic programming (ADP) communities. It is one of the first RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decision processes (SMDPs) in which the time spent in each transition of the underlying Markov chains is itself a random variable. Unfortunately, for the average reward case, unlike the discounted reward case, the MDP does not have an easy extension to the SMDP. Examples of SMDPs can be found in the areas of supply chain management, maintenance management, and airline revenue management. In this paper, we propose an adaptive critic heuristic for the SMDP under the long-run average reward criterion. We present a convergence analysis of the algorithm which shows that under certain mild conditions, which can be ensured within a simulator, the algorithm converges to an optimal solution with probability 1. We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon. The problem has a large scale, suffering from the curse of dimensionality, and hence it is difficult to solve via classical methods of dynamic programming. The numerical results are encouraging and show that the algorithm outperforms an existing heuristic used widely in the airline industry.
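A toy sketch of an average-reward adaptive critic for a semi-Markov setting: each transition has a random sojourn time, and the critic's temporal-difference term uses reward minus the reward rate times that sojourn time. The simulator, step sizes, and update rules below are simplified assumptions, not the paper's airline model or its exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(7)
n_s, n_a = 5, 2
theta = np.zeros((n_s, n_a))        # actor preferences
h = np.zeros(n_s)                   # critic: relative value function
rho = 0.0                           # running estimate of the average reward rate

def step(s, a):
    """Simulated SMDP transition: random next state, reward, and sojourn time."""
    s2 = rng.integers(n_s)
    reward = float(s == a) + rng.normal(0, 0.1)
    tau = rng.exponential(1.0 + 0.5 * a)          # random transition time
    return s2, reward, tau

s = 0
for n in range(50_000):
    probs = np.exp(theta[s] - theta[s].max()); probs /= probs.sum()   # Gibbs actor
    a = rng.choice(n_a, p=probs)
    s2, reward, tau = step(s, a)
    delta = reward - rho * tau + h[s2] - h[s]     # semi-Markov TD error
    h[s] += 0.1 * delta                           # critic update
    theta[s, a] += 0.01 * delta                   # actor update
    rho += 0.001 * (reward - rho * tau)           # reward-rate estimate
    s = s2
```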