Value function approximation methods have been successfully used in many applications, but the prevailing techniques often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this worst-case complexity is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze the formulation as well as a simple approximate algorithm for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems.
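The idea of "minimizing specific norms of the Bellman residual" can be made concrete on a small finite MDP. The sketch below computes the Bellman residual v - Lv of a linear value function v = Φw under the Bellman optimality operator L, along with an L∞ norm and a state-weighted L1 norm. It is a minimal NumPy sketch and does not solve the bilinear program. The array layout, the function names, and the choice of these two norms as the "specific norms" are assumptions for illustration.

```python
import numpy as np

def bellman_residual(phi, w, rewards, transitions, gamma):
    """Bellman residual v - Lv of the linear value function v = phi @ w.

    phi: (S, K) feature matrix; w: (K,) weights;
    rewards: (A, S) with rewards[a, s] = r(s, a);
    transitions: (A, S, S) with transitions[a, s, s2] = P(s2 | s, a).
    """
    v = phi @ w
    q_values = rewards + gamma * (transitions @ v)  # (A, S) one-step lookahead
    return v - q_values.max(axis=0)                 # greedy Bellman operator L

def residual_norms(residual, state_weights):
    """Return (L-infinity norm, state-weighted L1 norm) of the residual."""
    abs_res = np.abs(residual)
    return float(abs_res.max()), float(state_weights @ abs_res)
```

With one action, self-loop transitions, rewards [1, 0], and γ = 0.5, the exact values are [2, 0]. The weights w = [2, 0] on identity features therefore give a zero residual, while w = [0, 0] gives the residual [-1, 0].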
A combined kinematic/torque control law is developed using a backstepping design approach for a nonholonomic mobile robot with two driving wheels mounted on the same axis, so that the robot tracks a reference trajectory. The auxiliary velocity control inputs are designed for the kinematic steering system to make the posture error asymptotically stable. Next, a computed-torque controller is designed such that the mobile robot's velocities converge to the given velocity inputs in an optimal manner, by converting the tracking control problem into a regulation problem in which the uncertainties in the mobile robot's dynamics are considered. The proposed online, forward-in-time policy iteration (PI) algorithm based on approximate dynamic programming (ADP) solves the optimal control problem with unknown internal dynamics, using a single neural network (NN) to approximate the cost function. The near-optimal control policy can then be computed directly from the cost function, which removes the action network that appears in the ordinary ADP method. The stability of the dynamical extension system is demonstrated using Lyapunov methods. Simulation results are provided to demonstrate the effectiveness of the proposed approach.
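The kinematic step can be sketched with the widely used Kanayama-type tracking law, a common choice for the auxiliary velocity inputs in backstepping designs of this kind. This is an assumption, since the abstract does not give the exact law. The gains `k1`, `k2`, `k3`, the straight-line reference, and the perfect-velocity-tracking simulation are also hypothetical. The simulation skips the torque and ADP layers entirely.

```python
import math

def posture_error(q, q_ref):
    """Reference-minus-actual pose error expressed in the robot frame."""
    x, y, th = q
    xr, yr, thr = q_ref
    dx, dy = xr - x, yr - y
    e1 = math.cos(th) * dx + math.sin(th) * dy    # along-track error
    e2 = -math.sin(th) * dx + math.cos(th) * dy   # cross-track error
    e3 = thr - th                                 # heading error
    return e1, e2, e3

def auxiliary_velocity(e, v_r, w_r, k1=2.0, k2=4.0, k3=4.0):
    """Kanayama-type auxiliary velocity inputs (hypothetical gains)."""
    e1, e2, e3 = e
    v_c = v_r * math.cos(e3) + k1 * e1
    w_c = w_r + k2 * v_r * e2 + k3 * v_r * math.sin(e3)
    return v_c, w_c

def simulate(T=10.0, dt=0.01):
    """Euler-integrate unicycle kinematics, assuming the wheels track (v_c, w_c) exactly."""
    q = (0.0, 0.5, 0.0)              # start 0.5 units to the left of the reference
    t = 0.0
    while t < T:
        q_ref = (t, 0.0, 0.0)        # straight line along x with v_r = 1, w_r = 0
        v, w = auxiliary_velocity(posture_error(q, q_ref), v_r=1.0, w_r=0.0)
        x, y, th = q
        q = (x + dt * v * math.cos(th), y + dt * v * math.sin(th), th + dt * w)
        t += dt
    return posture_error(q, (t, 0.0, 0.0))
```

`simulate()` returns the final posture error, which decays toward zero. In the design described above, the computed-torque controller is what makes the actual velocities follow `(v_c, w_c)`.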
ISBN (print): 9781424407064
In this paper, a greedy iteration scheme based on approximate dynamic programming (ADP), namely Heuristic Dynamic Programming (HDP), is used to solve for the value function of the Hamilton-Jacobi-Bellman (HJB) equation that appears in discrete-time (DT) nonlinear optimal control. Two neural networks are used: one to approximate the value function and one to approximate the optimal control action. The importance of ADP is that it allows one to solve the HJB equation for general nonlinear discrete-time systems by using a neural network to approximate the value function. The main contribution of this paper is a rigorous proof of convergence of the HDP iteration scheme for general discrete-time nonlinear systems with continuous state and action spaces. Two examples are provided. The first is a linear system, for which ADP is found to converge to the correct solution of the algebraic Riccati equation (ARE). The second considers a nonlinear control system.
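The linear example can be checked without any neural network. Take a scalar system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2 and a quadratic critic V_i(x) = p_i x^2. In that case the critic is exact, the greedy action is u = -k_i x, and each HDP step reduces to the Riccati recursion. The sketch below covers only that special case (scalar, exact critic, starting from V_0 = 0). The function names and parameters are illustrative, not the paper's code.

```python
import math

def hdp_value_iteration(a, b, q, r, p0=0.0, iterations=200):
    """HDP/value iteration with an exact quadratic critic V_i(x) = p_i * x**2."""
    p = p0
    for _ in range(iterations):
        # Greedy action u = -k x minimizes q x^2 + r u^2 + p (a x + b u)^2.
        k = a * b * p / (r + b * b * p)
        # Critic update: V_{i+1}(x) = U(x, u) + V_i(x_next).
        p = q + r * k * k + p * (a - b * k) ** 2
    k = a * b * p / (r + b * b * p)
    return p, k

def scalar_dare(a, b, q, r):
    """Positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0 (scalar ARE)."""
    c1 = r - q * b * b - a * a * r
    return (-c1 + math.sqrt(c1 * c1 + 4 * b * b * q * r)) / (2 * b * b)
```

For a = b = q = r = 1, the iterates are 1, 1.5, 1.6, and so on. They approach the golden ratio (1 + sqrt(5))/2, the positive root of p^2 - p - 1 = 0.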
ISBN (print): 9781467327275
This paper introduces a new concept called a Virtual Generator (VG). VGs are simplified representations of groups of coherent synchronous generators in a power system. They resemble commonly used power system dynamic equivalents obtained via generator aggregation techniques. Traditionally, power system dynamic equivalents are developed offline, are fixed, and are used to replace large portions of the system that are considered external to the portion being analyzed in detail. In contrast, VGs are calculated online, are not limited to representing external areas of the system being analyzed or controlled, and do not replace any portion of the power system. Instead, they allow wide-area damping controllers (WADCs) to exploit the fact that a group of coherent synchronous generators can be controlled as a single generating unit to achieve wide-area damping control objectives. The implementation of VGs is made possible by the availability of Wide-Area Measurements (WAMs) from Phasor Measurement Units (PMUs). To the authors' knowledge, this is the first time that power system equivalencing techniques have been extended to real-time WADC. Simulation studies carried out on the 68-bus New England/New York power system demonstrate that intelligent controllers developed using VGs can significantly improve the stability of a power system by effectively damping low-frequency interarea oscillations.
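One common way to collapse a coherent group into a single equivalent machine is center-of-inertia aggregation. It sums the inertia constants and takes inertia-weighted averages of the rotor angles and speeds. The sketch below assumes that construction. It also assumes the coherent groups are already known, for example from an online coherency-identification step on PMU data. The paper's actual VG equations may differ, and all names and inputs are hypothetical.

```python
def virtual_generator(inertia, angle, speed):
    """Center-of-inertia equivalent of one coherent group.

    inertia, angle, speed: per-generator inertia constants H_i,
    rotor angles delta_i (rad), and rotor speeds omega_i (p.u.).
    Returns (total inertia, VG angle, VG speed).
    """
    h_total = sum(inertia)
    delta_vg = sum(h * d for h, d in zip(inertia, angle)) / h_total
    omega_vg = sum(h * w for h, w in zip(inertia, speed)) / h_total
    return h_total, delta_vg, omega_vg

def build_virtual_generators(groups, inertia, angle, speed):
    """groups maps a VG name to the ids of its coherent generators (hypothetical input)."""
    return {
        name: virtual_generator([inertia[g] for g in ids],
                                [angle[g] for g in ids],
                                [speed[g] for g in ids])
        for name, ids in groups.items()
    }
```

A WADC could then use each VG's angle and speed as one feedback signal for the whole group rather than one per generator. The VGs are only computed, and no part of the network model is replaced.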
ISBN (print): 9781467327428
Satisficing is an efficient strategy for applying existing knowledge in a complex, constrained environment. We present a set of agent-based simulations that demonstrate a higher payoff for satisficing strategies than for exploring strategies when approximate dynamic programming methods are used to learn complex environments. In our constrained learning environment, satisficing agents outperformed exploring agents by approximately six percent in terms of the number of tasks completed.
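The distinction between the two agent types can be made concrete with textbook decision rules. A satisficing agent takes the first option whose estimated value meets an aspiration level. An exploring agent is sketched here as ε-greedy. Both rules, the fallback to the best estimate, and the function names are assumptions for illustration, because the abstract does not specify the agents' exact policies.

```python
import random

def satisficing_choice(estimates, aspiration):
    """Return the first action whose estimated value meets the aspiration level.

    If no action is good enough, fall back to the best current estimate (an assumption).
    """
    for action, value in enumerate(estimates):
        if value >= aspiration:
            return action
    return max(range(len(estimates)), key=estimates.__getitem__)

def exploring_choice(estimates, epsilon, rng):
    """Epsilon-greedy: with probability epsilon try a random action, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=estimates.__getitem__)
```

The satisficing rule stops searching as soon as an option is good enough. The exploring rule keeps spending a fraction of its decisions on untried options, which can cost completed tasks in a constrained environment.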
The adaptive dynamic programming (ADP) approach is employed to design an optimal controller for unknown discrete-time nonlinear systems with control ***, a neural network is constructed to identify the unknown dynamical system with stability ***, the iterative ADP algorithm is developed to solve the optimal control problem with convergence ***, two other neural networks are introduced to approximate the cost function and its derivative and the control law, under the framework of globalized dual heuristic programming ***, two simulation examples are included to verify the theoretical results.
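Under globalized dual heuristic programming, the critic approximates both the cost function J(x) and its derivative λ(x) = ∂J/∂x. The sketch below computes the two critic targets for a hypothetical setup: a scalar linear system with quadratic stage cost, a linear policy u = -k x, and a quadratic critic J(x) = p x^2. It ignores the control constraints and the identification network, and it is not the paper's algorithm. When p solves p = q + r k^2 + (a - b k)^2 p, both targets coincide with the critic's own outputs. Critic training drives toward that fixed point.

```python
def gdhp_targets(x, p, k, a, b, q, r):
    """Return (J target, lambda target) for the critic J(x) = p * x**2.

    Hypothetical scalar setting: x_next = a*x + b*u, U(x, u) = q*x**2 + r*u**2,
    policy u = -k*x, and lambda(x) = dJ/dx = 2*p*x.
    """
    u = -k * x
    x_next = a * x + b * u
    # Cost-function target: U(x, u) + J(x_next).
    j_target = q * x**2 + r * u**2 + p * x_next**2
    # Derivative target: total derivative of the same expression w.r.t. x.
    du_dx = -k                          # derivative of the policy
    dxnext_dx = a + b * du_dx           # total derivative of the next state
    dU_dx = 2 * q * x + 2 * r * u * du_dx
    lam_target = dU_dx + dxnext_dx * (2 * p * x_next)
    return j_target, lam_target
```

For a = b = q = r = 1 and k = 0.5, p = 5/3 solves the equation above. At x = 1 the targets therefore match J(1) = 5/3 and λ(1) = 10/3 up to rounding.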
In this paper, we propose a novel adaptive dynamic programming (ADP) scheme based on general value iteration to obtain near-optimal control for discrete-time nonlinear systems with continuous state and control ***, the selection of the initial value function is different from that in traditional value iteration, and a new method is introduced to demonstrate the convergence property and convergence speed of the value ***, the control law obtained at each iteration can stabilize the system under some *** last, three neural networks trained with the Levenberg-Marquardt algorithm are used to approximate the unknown nonlinear system, the value function, and the optimal control *** simulation example is presented to demonstrate the effectiveness of the present scheme.
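Value iteration need not start from V_0 = 0. To illustrate this, the toy below runs exact value iteration on a hypothetical integer-grid system x_{k+1} = sat(x_k + u_k) with stage cost x^2 + u^2. It runs once from V_0 = 0 and once from V_0(x) = 100 x^2. The toy replaces the paper's continuous spaces and three neural networks with a lookup table. It therefore shows only the convergence-from-either-side behaviour, not the proposed scheme.

```python
X_MIN, X_MAX = -5, 5
STATES = range(X_MIN, X_MAX + 1)   # hypothetical integer state grid
CONTROLS = range(-2, 3)            # hypothetical admissible controls

def step(x, u):
    """Saturated integrator: the next state is clipped to the grid."""
    return max(X_MIN, min(X_MAX, x + u))

def bellman_update(v):
    """One sweep: V_{i+1}(x) = min_u [x^2 + u^2 + V_i(step(x, u))]."""
    return {x: min(x * x + u * u + v[step(x, u)] for u in CONTROLS) for x in STATES}

def general_value_iteration(v0, iterations=50):
    """Iterate the Bellman update from an arbitrary initial value function v0."""
    v = dict(v0)
    for _ in range(iterations):
        v = bellman_update(v)
    return v

v_low = general_value_iteration({x: 0.0 for x in STATES})           # traditional start
v_high = general_value_iteration({x: 100.0 * x * x for x in STATES})  # larger start
```

Starting from zero, the iterates increase toward the optimum. Starting from the larger quadratic, they decrease. Both end at the same table. For example, V(3) = 15 is reached by u = -2 followed by u = -1.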
ISBN (print): 9780982443859
Piracy on the high seas is a problem of worldwide concern. In response to this threat, the US Navy has developed a visualization tool known as the Pirate Attack Risk Surface (PARS) that integrates intelligence data, commercial shipping routes, and meteorological and oceanographic (METOC) information to predict regions where pirates may be present and where they may strike next. This paper proposes an algorithmic augmentation, or add-on, to PARS that allocates interdiction and surveillance assets so as to minimize the likelihood of a successful pirate attack over a fixed planning horizon. This augmentation, viewed as a tool for human planners, can be mapped closely to the decision support layer of the Battlespace on Demand (BonD) framework [32]. Our solution approach decomposes this NP-hard optimization problem into two sequential phases. In Phase I, we solve the problem of allocating only the interdiction assets, such that regions with a high cumulative probability of attack over the planning horizon are maximally covered. In Phase II, we solve the surveillance problem: the area not covered by interdiction assets is partitioned into non-overlapping search regions (e.g., rectangular boxes), which are assigned to a set of surveillance assets to maximize the cumulative detection probability over the planning horizon. To overcome the curse of dimensionality associated with dynamic programming (DP), we propose a Gauss-Seidel algorithm coupled with a rollout strategy for the interdiction problem. For the surveillance problem, we propose a partitioning algorithm coupled with an asymmetric assignment algorithm for allocating assets to the partitioned regions. Once the surveillance assets are assigned to search regions, the search path for each asset is determined based on a specific search strategy. The proposed algorithms are illustrated using a hypothetical scenario for conducting counter-piracy operations in a given Area of Responsibility (AOR).
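Phase II ends with an asymmetric assignment problem, in which there are fewer surveillance assets than search regions. A minimal sketch is an exhaustive search over injective asset-to-region assignments. It assumes the objective is the sum of per-asset cumulative detection probabilities and that each region receives at most one asset. It is only practical for tiny instances and does not reproduce the paper's assignment algorithm. The function name and the detection matrix are hypothetical.

```python
from itertools import permutations

def assign_assets(detection):
    """Exhaustively assign each surveillance asset to a distinct search region.

    detection[i][j] is a (hypothetical) cumulative detection probability if
    asset i searches region j. Requires len(detection) <= len(detection[0]).
    Returns (best total, tuple of region indices, one per asset).
    """
    n_assets, n_regions = len(detection), len(detection[0])
    if n_assets > n_regions:
        raise ValueError("need at least as many regions as assets")
    best_total, best_regions = float("-inf"), None
    # Each ordered choice of n_assets distinct regions is one candidate assignment.
    for regions in permutations(range(n_regions), n_assets):
        total = sum(detection[i][j] for i, j in enumerate(regions))
        if total > best_total:
            best_total, best_regions = total, regions
    return best_total, best_regions
```

For realistic sizes, a Hungarian-type or auction algorithm would replace the enumeration. SciPy's `scipy.optimize.linear_sum_assignment`, for instance, accepts rectangular cost matrices.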
This article explores whether dynamically reassigning servers to parallel queues in response to queue imbalances can reduce the average waiting time in those queues. Approximate dynamic programming methods are used to determine when servers should be switched, and the performance of such dynamic allocations is compared with that of a pre-scheduled deterministic allocation. The proposed method is tested on both synthetic data and data from airport security checkpoints at Boston Logan International Airport. In situations where the uncertainty in customer arrival rates is significant, dynamically reallocating servers is found to substantially reduce waiting time. Moreover, intuitive switching strategies that are optimal for queues with homogeneous entry rates are found not to be optimal in this setting.
This work proposes a methodology to generate risk-averse policies for Markov Decision Processes (MDPs). The methodology modifies the one-stage reward or cost to weigh the trade-off between expected performance and downside risk, represented by the conditional value-at-risk (CVaR_α). The modified stage-wise utility function is used within dynamic programming to generate a set of policies representing different levels of the trade-off. The approach is demonstrated on a shortest-path optimal control problem and on a project management problem modeled as a constrained MDP. To address a more complex management problem, we use the Real-Time Approximate Dynamic Programming algorithm.
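The sketch below shows how a mean-CVaR trade-off can be computed for sampled stage costs. It assumes that CVaR_α is taken over the upper (1 - α) tail of costs and that the trade-off is a convex combination with a weight λ. Both conventions are assumptions and not necessarily the paper's. The sketch uses the Rockafellar-Uryasev representation CVaR_α(C) = min_t { t + E[(C - t)^+] / (1 - α) }. For an empirical distribution, the minimum of this expression is attained at one of the samples.

```python
def cvar(costs, alpha):
    """Empirical CVaR_alpha of equally likely costs (roughly, the mean of the worst 1 - alpha fraction).

    Evaluates the Rockafellar-Uryasev objective at every sample and keeps the minimum.
    """
    n = len(costs)
    scale = (1.0 - alpha) * n
    return min(t + sum(max(c - t, 0.0) for c in costs) / scale for t in costs)

def risk_adjusted_cost(costs, alpha, lam):
    """Convex combination (1 - lam) * E[C] + lam * CVaR_alpha(C); lam is a hypothetical weight."""
    mean = sum(costs) / len(costs)
    return (1.0 - lam) * mean + lam * cvar(costs, alpha)
```

For the costs [0, 0, 0, 10] with α = 0.75, CVaR is 10 and the mean is 2.5, so λ = 0.5 gives 6.25. Sweeping λ from 0 to 1 is one way to trace a family of policies between risk-neutral and fully CVaR-averse.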