ISBN: (Print) 9780982443859
Piracy on the high seas is a problem of world-wide concern. In response to this threat, the US Navy has developed a visualization tool known as the Pirate Attack Risk Surface (PARS) that integrates intelligence data, commercial shipping routes, and meteorological and oceanographic (METOC) information to predict regions where pirates may be present and where they may strike next. This paper proposes an algorithmic augmentation or add-on to PARS that allocates interdiction and surveillance assets so as to minimize the likelihood of a successful pirate attack over a fixed planning horizon. This augmentation, viewed as a tool for human planners, can be mapped closely to the decision support layer of the Battlespace on Demand (BonD) framework [32]. Our solution approach decomposes this NP-hard optimization problem into two sequential phases. In Phase I, we solve the problem of allocating only the interdiction assets, such that regions with high cumulative probability of attack over the planning horizon are maximally covered. In Phase II, we solve the surveillance problem, where the area not covered by interdiction assets is partitioned into non-overlapping search regions (e.g., rectangular boxes) and assigned to a set of surveillance assets to maximize the cumulative detection probability over the planning horizon. In order to overcome the curse of dimensionality associated with dynamic programming (DP), we propose a Gauss-Seidel algorithm coupled with a rollout strategy for the interdiction problem. For the surveillance problem, we propose a partitioning algorithm coupled with an asymmetric assignment algorithm for allocating assets to the partitioned regions. Once the surveillance assets are assigned to search regions, the search path for each asset is determined based on a specific search strategy. The proposed algorithms are illustrated using a hypothetical scenario for conducting counter-piracy operations in a given Area of Responsibility (AOR).
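A minimal sketch of the rollout idea used in the interdiction phase, on a hypothetical grid of cumulative attack probabilities (the grid size, asset count, and greedy base policy below are illustrative assumptions, not the PARS implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
attack_prob = rng.random((8, 8))     # cumulative attack probability per grid cell
n_assets = 3                         # interdiction assets, each covering one cell

def base_heuristic(prob, covered, budget):
    """Greedy base policy: cover the highest-probability cells not yet covered."""
    cells = sorted(((prob[i, j], (i, j))
                    for i in range(prob.shape[0]) for j in range(prob.shape[1])
                    if (i, j) not in covered), reverse=True)
    return covered | {c for _, c in cells[:budget]}

def residual_risk(prob, covered):
    """Total attack probability left uncovered by the interdiction assets."""
    return prob.sum() - sum(prob[c] for c in covered)

def rollout_allocation(prob, k):
    """Assign assets one at a time; each candidate cell is scored by completing
    the remaining assignments with the base heuristic (one-step rollout)."""
    covered = set()
    for step in range(k):
        candidates = [(i, j) for i in range(prob.shape[0])
                      for j in range(prob.shape[1]) if (i, j) not in covered]
        best = min(candidates, key=lambda c: residual_risk(
            prob, base_heuristic(prob, covered | {c}, k - step - 1)))
        covered.add(best)
    return covered

print(rollout_allocation(attack_prob, n_assets))
```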
This work proposes a methodology to generate risk-averse policies for Markov decision processes (MDPs). The methodology is based on modifying the one-stage reward or cost to weigh the trade-off between expected performance and downside risk, represented by the conditional value-at-risk (CVaR) at level α. The modified stage-wise utility function is used within dynamic programming to generate a set of policies representing different levels of the trade-off. The approach is demonstrated on a shortest-path optimal control problem and a project management problem modeled as a constrained MDP. To address a more complex management problem, we utilize a real-time approximate dynamic programming algorithm.
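A toy sketch of the core idea under stated assumptions: a small synthetic MDP, an empirical CVaR over sampled stage costs, and a linear blend of mean and CVaR as the modified stage cost (the blend weight `lam` and the cost model are illustrative, not the paper's exact utility function):

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical CVaR_alpha: mean of the worst (1 - alpha) fraction of costs."""
    tail = np.sort(samples)[int(np.ceil(alpha * len(samples))):]
    return tail.mean() if len(tail) else np.sort(samples)[-1]

rng = np.random.default_rng(1)
n_s, n_a, gamma, alpha, lam = 3, 2, 0.9, 0.9, 0.5
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))           # P[s, a, s']
cost_samples = rng.gamma(2.0, 1.0, size=(n_s, n_a, 500))    # stochastic stage costs

# risk-adjusted one-stage cost: (1 - lam) * mean + lam * CVaR_alpha
c = np.array([[(1 - lam) * cost_samples[s, a].mean()
               + lam * cvar(cost_samples[s, a], alpha)
               for a in range(n_a)] for s in range(n_s)])

V = np.zeros(n_s)
for _ in range(500):                                         # standard value iteration
    V = np.min(c + gamma * P @ V, axis=1)
policy = np.argmin(c + gamma * P @ V, axis=1)
print(V, policy)
```

Sweeping `lam` from 0 to 1 traces out the set of policies representing different points on the expected-performance vs. downside-risk trade-off.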
Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems.
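The bilinear program itself is beyond a short snippet, but the quantity it controls, the Bellman residual of a linear value-function architecture, can be illustrated with a fixed-policy least-squares stand-in on synthetic data (the paper instead minimizes robust and expected norms of the residual via global optimization):

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, n_feat, gamma = 50, 5, 0.95
P = rng.dirichlet(np.ones(n_s), size=n_s)      # P[s, s'] under a fixed policy
r = rng.random(n_s)                             # rewards under the fixed policy
Phi = rng.random((n_s, n_feat))                 # feature matrix, V ~ Phi @ w

# Bellman residual: Phi w - (r + gamma P Phi w) = (Phi - gamma P Phi) w - r
A = Phi - gamma * P @ Phi
w, *_ = np.linalg.lstsq(A, r, rcond=None)
residual = A @ w - r
print("max |Bellman residual|:", np.abs(residual).max())
```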
In this paper, we study the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach. The idea is to use an iterative ADP algorithm to obtain the optimal control law which makes the performance index function close to the greatest lower bound of all performance indices within an epsilon-error bound. The optimal number of control steps can also be obtained by the proposed ADP algorithms. A convergence analysis of the proposed ADP algorithms is given in terms of the performance index function and the control policy. In order to facilitate the implementation of the iterative ADP algorithms, neural networks are used for approximating the performance index function, computing the optimal control policy, and modeling the nonlinear system. Finally, two simulation examples are employed to illustrate the applicability of the proposed method.
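A minimal sketch of the iterative value-function update and the epsilon-error stopping rule, on a gridded one-dimensional system with table lookup standing in for the paper's neural-network approximators (the dynamics `f` and cost `U` below are hypothetical placeholders):

```python
import numpy as np

xs = np.linspace(-2.0, 2.0, 201)                 # discretized state space
us = np.linspace(-1.0, 1.0, 41)                  # discretized control set
f = lambda x, u: 0.8 * np.sin(x) + u             # hypothetical system dynamics
U = lambda x, u: x**2 + u**2                     # stage cost
eps = 1e-4                                       # epsilon-error bound

V = np.zeros_like(xs)                            # V_0 = 0
for i in range(500):
    # one iteration: V_{i+1}(x) = min_u [ U(x, u) + V_i(f(x, u)) ]
    Q = U(xs[:, None], us[None, :]) + np.interp(f(xs[:, None], us[None, :]), xs, V)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < eps:          # stop once within the error bound
        print("converged after", i + 1, "iterations")
        break
    V = V_new
```

The iteration count at which the stopping rule fires plays the role of the (epsilon-optimal) number of control steps discussed in the abstract.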
The evaluation of a control-Lyapunov policy, with quadratic Lyapunov function, requires the solution of a quadratic program (QP) at each time step. For small problems this QP can be solved explicitly; for larger problems an online optimization method can be used. For this reason the control-Lyapunov policy is considered a computationally intensive control law, as opposed to an "analytical" control law such as conventional linear state feedback, linear quadratic Gaussian control, or H-infinity, and is regarded as too complex or slow to be used in high-speed control applications. In this note we show that by precomputing certain quantities, the control-Lyapunov policy can be evaluated extremely efficiently. We show that when the number of inputs is on the order of the square root of the state dimension, the cost of evaluating a control-Lyapunov policy is on the same order as the cost of evaluating a simple linear state feedback policy, and less (in order) than the cost of updating a Kalman filter state estimate. To give an idea of the speeds involved, for a problem with 100 states and 10 inputs, the control-Lyapunov policy can be evaluated in around 67 μs on a 2 GHz AMD processor; the same processor requires 40 μs to carry out a Kalman filter update.
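A sketch of the precomputation idea for the unconstrained special case, where minimizing the quadratic Lyapunov bound reduces to a fixed gain and online evaluation costs a single matrix-vector product (the note's contribution is achieving comparable speed when input constraints force a genuine QP; the matrices below are random placeholders):

```python
import numpy as np

n, m = 100, 10                                   # state and input dimensions
rng = np.random.default_rng(3)
A = rng.standard_normal((n, n)) / np.sqrt(n)     # placeholder dynamics x+ = Ax + Bu
B = rng.standard_normal((n, m))
P = np.eye(n)                                    # quadratic Lyapunov function x' P x
R = np.eye(m)                                    # input cost weight

# offline precomputation: argmin_u [ u'Ru + (Ax + Bu)' P (Ax + Bu) ] = K x
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# online evaluation: as cheap as linear state feedback
x = rng.standard_normal(n)
u = K @ x
```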
In this paper, a novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays. The algorithm consists of state updating, control policy iteration, and performance index iteration; to obtain the optimal states, the states themselves are updated, and a "backward iteration" is applied to the state updating. Two neural networks are used to approximate the performance index function and to compute the optimal control policy, which facilitates the implementation of the HDP iteration algorithm. Finally, two examples demonstrate the effectiveness of the proposed HDP iteration algorithm.
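A rough sketch of the two-network structure (a critic for the performance index and an action network for the control) on a scalar system; the time delays and the paper's backward state iteration are omitted, and the dynamics, cost, and network sizes are placeholder assumptions:

```python
import torch
import torch.nn as nn

f = lambda x, u: 0.9 * torch.sin(x) + u          # hypothetical scalar dynamics
U = lambda x, u: x**2 + u**2                      # stage cost
critic = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # performance index
actor  = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))  # control policy
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-2)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-2)

for it in range(200):
    x = torch.rand(256, 1) * 4 - 2                # sampled states
    # action-network update: minimize U(x, u) + J(f(x, u)) over the policy
    u = actor(x)
    loss_a = (U(x, u) + critic(f(x, u))).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    # critic update: J_{i+1}(x) <- U(x, a_i(x)) + J_i(f(x, a_i(x)))
    with torch.no_grad():
        u = actor(x)
        target = U(x, u) + critic(f(x, u))
    loss_c = ((critic(x) - target) ** 2).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
```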
There are a number of sources of randomness that arise in military airlift operations. However, the cost of uncertainty can be difficult to estimate, and is easy to overestimate if we use simplistic decision rules. Using data from Canadian military airlift operations, we study the effect of uncertainty in customer demands, as well as aircraft failures, on the overall cost. The system is first analyzed using the types of myopic decision rules widely used in the research literature. The performance of the myopic policy is then compared with the results obtained using robust decisions that account for the uncertainty of future events. These are obtained by modeling the problem as a dynamic program and solving Bellman's equations using approximate dynamic programming. The experiments show that even approximate solutions to Bellman's equations produce decisions that reduce the cost of uncertainty.
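A toy illustration of the myopic-versus-ADP comparison, not the paper's airlift model: a single asset sees one request per period with a random value, and serving it makes the asset unavailable next period. The myopic rule always accepts; the ADP rule compares the request against an approximate value of the "free" post-decision state learned by simulation-based value iteration (all parameters below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
T, gamma = 10_000, 0.95
V = np.zeros(2)                       # approximate values of states {0: free, 1: busy}

def simulate(policy, V):
    state, total = 0, 0.0
    for _ in range(T):
        reward = rng.exponential(1.0)           # value of this period's request
        if state == 0 and policy(reward, V):
            total += reward
            state = 1                           # asset unavailable next period
        else:
            state = 0
    return total / T

# learn V by sampled asynchronous value iteration
for n in range(20_000):
    reward = rng.exponential(1.0)
    alpha = 1.0 / (1 + n / 100)
    V[0] = (1 - alpha) * V[0] + alpha * max(reward + gamma * V[1], gamma * V[0])
    V[1] = (1 - alpha) * V[1] + alpha * (gamma * V[0])   # busy -> free next period

myopic = lambda r, V: True                                 # ignores the future
adp    = lambda r, V: r + gamma * V[1] >= gamma * V[0]     # values the future
print("myopic avg reward:", simulate(myopic, V))
print("ADP avg reward   :", simulate(adp, V))
```

Even this crude approximation of Bellman's equation yields a threshold rule that outperforms the myopic policy, which is the qualitative point of the study.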
We consider an inventory distribution system consisting of one warehouse and multiple retailers. The retailers face random demand and are supplied by the warehouse. The warehouse replenishes its stock from an external supplier. The objective is to minimize the total expected replenishment, holding, and backlogging cost over a finite planning horizon. The problem can be formulated as a dynamic program, but this dynamic program is difficult to solve due to its high-dimensional state variable. It has been observed in the earlier literature that if the warehouse is allowed to ship negative quantities to the retailers, then the problem decomposes by the locations. One way to exploit this observation is to relax the constraints that ensure the nonnegativity of the shipments to the retailers by associating Lagrange multipliers with them, which naturally raises the question of how to choose a good set of Lagrange multipliers. In this paper, we propose efficient methods that choose a good set of Lagrange multipliers by solving linear programming approximations to the inventory distribution problem. Computational experiments indicate that the inventory replenishment policies obtained by our approach can outperform several standard benchmarks by significant margins.
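A generic sketch of the relaxation idea: penalizing the shipment nonnegativity constraints with multipliers decouples the problem into independent single-location subproblems. The quadratic stand-in subproblem and the plain subgradient loop below are illustrative assumptions; the paper chooses the multipliers via linear programming approximations of the actual inventory dynamics instead:

```python
import numpy as np

rng = np.random.default_rng(5)
n_retailers, T = 3, 5
demand = rng.normal(1.0, 1.0, size=(n_retailers, T))   # some demands are negative
lam = np.zeros((n_retailers, T))                        # multipliers on shipment >= 0

def location_subproblem(d_j, lam_j):
    """Single-retailer stand-in: min_x 0.5*||x - d_j||^2 - lam_j' x,
    with closed-form minimizer x = d_j + lam_j (shipments may go negative)."""
    x = d_j + lam_j
    return 0.5 * ((x - d_j) ** 2).sum() - lam_j @ x, x

for it in range(200):
    dual_value, shipments = 0.0, np.zeros_like(lam)
    for j in range(n_retailers):                        # decomposes by location
        val_j, x_j = location_subproblem(demand[j], lam[j])
        dual_value += val_j
        shipments[j] = x_j
    # projected subgradient step on the dual: raise lam where shipments go negative
    lam = np.maximum(lam - (1.0 / (it + 1)) * shipments, 0.0)

print("most negative shipment after relaxation:", shipments.min())
```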
We review the literature on approximate dynamic programming, with the goal of better understanding the theory behind practical algorithms for solving dynamic programs with continuous and vector-valued states and actions and complex information processes. We build on the literature that has addressed the well-known problem of multidimensional (and possibly continuous) states, and the extensive literature on model-free dynamic programming, which also assumes that the expectation in Bellman's equation cannot be computed. However, we point out complications that arise when the actions/controls are vector-valued and possibly continuous. We then describe some recent research by the authors on approximate policy iteration algorithms that offer convergence guarantees (with technical assumptions) for both parametric and nonparametric architectures for the value function.
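A compact sketch of approximate policy iteration with a parametric (linear) value-function architecture on a small synthetic MDP; this is only a schematic of the algorithm class being reviewed, and the convergence guarantees mentioned above require further technical assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n_s, n_a, n_feat, gamma = 30, 4, 6, 0.95
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # transition kernel P[s, a, s']
r = rng.random((n_s, n_a))                          # rewards
Phi = rng.random((n_s, n_feat))                     # state features, V ~ Phi @ w

policy = np.zeros(n_s, dtype=int)
for it in range(50):
    # approximate policy evaluation: least-squares fit of the Bellman residual
    P_pi = P[np.arange(n_s), policy]                # P_pi[s, s']
    r_pi = r[np.arange(n_s), policy]
    A = Phi - gamma * P_pi @ Phi
    w, *_ = np.linalg.lstsq(A, r_pi, rcond=None)
    # greedy policy improvement using the approximate value function
    Q = r + gamma * P @ (Phi @ w)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```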
The adaptive critic heuristic has been a popular algorithm in the reinforcement learning (RL) and approximate dynamic programming (ADP) communities. It is one of the first RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decision processes (SMDPs) in which the time spent in each transition of the underlying Markov chains is itself a random variable. Unfortunately, for the average reward case, unlike the discounted reward case, the MDP does not have an easy extension to the SMDP. Examples of SMDPs can be found in the areas of supply chain management, maintenance management, and airline revenue management. In this paper, we propose an adaptive critic heuristic for the SMDP under the long-run average reward criterion. We present a convergence analysis of the algorithm which shows that under certain mild conditions, which can be ensured within a simulator, the algorithm converges to an optimal solution with probability 1. We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon. The problem has a large scale, suffering from the curse of dimensionality, and hence it is difficult to solve via classical methods of dynamic programming. The numerical results are encouraging and show that the algorithm outperforms an existing heuristic used widely in the airline industry.
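A toy sketch of an average-reward adaptive critic for a semi-Markov setting: each transition has a random sojourn time, and the critic's temporal-difference term uses reward minus the reward rate times that sojourn time. The simulator, step sizes, and update rules below are simplified assumptions, not the paper's airline model or its exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(7)
n_s, n_a = 5, 2
theta = np.zeros((n_s, n_a))        # actor preferences
h = np.zeros(n_s)                   # critic: relative value function
rho = 0.0                           # running estimate of the average reward rate

def step(s, a):
    """Simulated SMDP transition: random next state, reward, and sojourn time."""
    s2 = rng.integers(n_s)
    reward = float(s == a) + rng.normal(0, 0.1)
    tau = rng.exponential(1.0 + 0.5 * a)          # random transition time
    return s2, reward, tau

s = 0
for n in range(50_000):
    probs = np.exp(theta[s] - theta[s].max()); probs /= probs.sum()   # Gibbs actor
    a = rng.choice(n_a, p=probs)
    s2, reward, tau = step(s, a)
    delta = reward - rho * tau + h[s2] - h[s]     # semi-Markov TD error
    h[s] += 0.1 * delta                           # critic update
    theta[s, a] += 0.01 * delta                   # actor update
    rho += 0.001 * (reward - rho * tau)           # reward-rate estimate
    s = s2
```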