In this paper, the optimal strategies for discrete-time linear quadratic zero-sum games, which arise in the H-infinity optimal control problem, are solved forward in time without knowledge of the system dynamics matrices. The idea is to solve for an action-dependent value function Q(x, u, w) of the zero-sum game instead of the state-dependent value function V(x), which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and action spaces are continuous, two action networks and one critic network are used and adaptively tuned forward in time using adaptive critic methods. The result is a Q-learning approximate dynamic programming (ADP) model-free approach that solves the zero-sum game forward in time. It is proven that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game, and that the algorithm is, in effect, a model-free iterative method for solving the GARE of the linear quadratic discrete-time zero-sum game. The effectiveness of the method is demonstrated by designing an H-infinity autopilot for an F-16 aircraft. (C) 2007 Elsevier Ltd. All rights reserved.
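Since the abstract identifies the algorithm with an iterative solver for the GARE, the fixed point it converges to can be sketched with a plain model-based value iteration in numpy. This is an illustration of the underlying recursion only, not the paper's model-free Q-learning implementation; the example matrices and the attenuation level gamma in the usage note are invented for the sketch.

```python
import numpy as np

def gare_iteration(A, B, E, Q, R, gamma, iters=300):
    """Value iteration on the game algebraic Riccati equation (GARE) of
    the discrete-time zero-sum LQ game
        x_{k+1} = A x_k + B u_k + E w_k,
        cost    = sum_k x'Qx + u'Ru - gamma^2 w'w,
    where u minimizes and the disturbance w maximizes.  Returns the game
    value matrix P and feedback gains (K, L) for u = -Kx, w = -Lx."""
    n, m, q = A.shape[0], B.shape[1], E.shape[1]

    def blocks(P):
        # Action-action block of the game Q-function kernel, and the
        # action-state block; the minimax solution inverts G.
        G = np.block([[R + B.T @ P @ B, B.T @ P @ E],
                      [E.T @ P @ B, E.T @ P @ E - gamma**2 * np.eye(q)]])
        S = np.vstack([B.T @ P @ A, E.T @ P @ A])
        return G, S

    P = np.zeros((n, n))
    for _ in range(iters):
        G, S = blocks(P)
        P = Q + A.T @ P @ A - S.T @ np.linalg.solve(G, S)
    G, S = blocks(P)
    KL = np.linalg.solve(G, S)
    return P, KL[:m], KL[m:]
```

At convergence, u = -Kx and w = -Lx form the Nash equilibrium feedback policies, provided gamma exceeds the attenuation bound of the system so that the disturbance block of G stays negative definite.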
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation step of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To preserve sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing approaches. One is improved convergence with a (near-)optimality guarantee, obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The other is automatic feature selection via the ALD-based kernel sparsification. The KLSPI algorithm therefore provides a general RL method with good generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI consistently achieves better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating a...
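The ALD sparsification step that gives KLSPI its automatic feature selection can be sketched in isolation: a sample joins the kernel dictionary only if its feature-space image cannot be approximated by the dictionary built so far. A minimal sketch, assuming a Gaussian kernel and an invented threshold `nu`; the paper applies this test inside KLSTD-Q rather than as a standalone pass.

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) Mercer kernel."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y))**2) / (2 * sigma**2))

def ald_dictionary(samples, kernel=rbf, nu=1e-3):
    """Approximate linear dependency (ALD) sparsification: keep a sample
    only if the squared residual of projecting its kernel feature onto
    the span of the current dictionary's features exceeds nu."""
    dic = []
    for x in samples:
        if not dic:
            dic.append(x)
            continue
        K = np.array([[kernel(a, b) for b in dic] for a in dic])
        k = np.array([kernel(a, x) for a in dic])
        # Least-squares coefficients of the projection (tiny ridge for safety).
        c = np.linalg.solve(K + 1e-10 * np.eye(len(dic)), k)
        delta = kernel(x, x) - k @ c   # squared projection residual
        if delta > nu:
            dic.append(x)              # approximately linearly independent
    return dic
```

Duplicates and near-duplicates are rejected, so the dictionary size, and with it the number of basis functions in policy evaluation, stays bounded.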
This paper presents a simulation-based approach for designing a non-linear override control scheme to improve the performance of a local linear controller. The higher-level non-linear controller monitors the dynamic state of the system and calculates an override control action whenever the system is predicted to move outside an acceptable operating regime under the local controller. The design of the non-linear override controller is based on a cost-to-go function, which is constructed by using simulation or operation data. The cost-to-go function delineates the admissible region of state space within which the local controller is effective, thereby yielding a switching rule.
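The switching rule that the cost-to-go function yields can be sketched directly; all function names below are hypothetical placeholders for the components the abstract describes.

```python
def make_override_controller(local_control, override_control, cost_to_go, threshold):
    """Switching rule sketched in the abstract: apply the local linear
    controller while the estimated cost-to-go says the state is inside
    the admissible region; otherwise apply the non-linear override."""
    def control(x):
        if cost_to_go(x) <= threshold:
            return local_control(x)      # local controller is effective here
        return override_control(x)       # predicted to leave the regime
    return control
```

The cost-to-go function itself would be fitted offline from simulation or operating data; the threshold delineates the region of state space within which the local controller is trusted.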
ISBN (Print): 9781424407064
The aim of this study is to assist a military decision maker during the decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game in which we seek strategies that guarantee the simultaneous achievement of given goals defined in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The techniques employed draw on temporal-difference methods and also use a heuristic qualification of system states to cope with algorithmic complexity.
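The stochastic-shortest-path core of the approach can be illustrated with tabular temporal-difference (Q-)learning on a toy chain; the chain, transition probabilities, and parameters below are invented and stand in for the paper's multi-valued graph model.

```python
import random

def q_learn_ssp(n, episodes=4000, alpha=0.1, eps=0.2, seed=0):
    """Q-learning for a toy stochastic shortest path on a chain of n
    states: actions 0/1 attempt to move left/right (succeeding with
    probability 0.9, else staying put), each step costs 1, and state
    n-1 is the absorbing goal.  Returns the learned cost-to-go V."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n)]   # Q[s][a]: expected cost-to-go
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            if rng.random() < eps:
                a = rng.randrange(2)                        # explore
            else:
                a = min((0, 1), key=lambda b: Q[s][b])      # greedy: least cost
            step = (1 if a == 1 else -1) if rng.random() < 0.9 else 0
            s2 = max(0, min(n - 1, s + step))
            target = 1.0 + (0.0 if s2 == n - 1 else min(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return [min(q) for q in Q]
```

The optimal cost-to-go here is (n-1-s)/0.9, since moving right succeeds nine times out of ten; the learned V approaches that, and a heuristic qualification of states, as in the paper, would prune this table for larger graphs.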
ISBN (Print): 9781424407064
Approximate dynamic programming (ADP) has been formulated and applied mainly to discrete-time systems. Expressing the ADP concept for continuous-time systems raises difficult issues related to sampling time and system model knowledge requirements. This paper presents a novel online adaptive critic (AC) scheme, based on ADP, to solve the infinite-horizon optimal control problem for continuous-time dynamical systems, thus bringing together concepts from the fields of computational intelligence and control theory. Only partial knowledge of the system model is used, as knowledge of the plant internal dynamics is not needed. The method is thus useful for determining the optimal controller for plants with partially unknown dynamics. It is shown that the proposed iterative ADP algorithm is in fact a quasi-Newton method for solving the underlying algebraic Riccati equation (ARE) of the optimal control problem. An initial gain that determines a stabilizing control policy is not required. In control-theoretic terms, this paper develops a direct adaptive control algorithm that obtains the optimal control solution without knowing the system A matrix.
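The underlying iteration can be sketched in its classical, model-based form: Kleinman's policy iteration on the ARE, which is a Newton-type method. Unlike the paper's scheme, the sketch below uses the full model including A, and it assumes a stable plant so that the zero initial gain is stabilizing; the paper performs the same iteration online from measured data and removes both requirements. The example system is invented.

```python
import numpy as np

def kleinman_care(A, B, Q, R, K0, iters=25):
    """Policy iteration (Kleinman's method) for the continuous-time ARE
        A'P + PA + Q - P B R^{-1} B' P = 0.
    Each step evaluates the current gain K by solving a Lyapunov
    equation (via Kronecker-product vectorization), then improves the
    gain.  K0 must be stabilizing in this model-based form."""
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        # Solve Ac'P + P Ac = -(Q + K'RK) for P.
        M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
        rhs = -(Q + K.T @ R @ K).reshape(n * n)
        P = np.linalg.solve(M, rhs).reshape(n, n)
        K = np.linalg.solve(R, B.T @ P)   # policy improvement
    return P, K
```

Each policy-evaluation step only needs trajectory data in the paper's online version, which is what eliminates the dependence on the internal dynamics A.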
In this paper, we focus on a particular version of the dynamic service network design (DSND) problem, namely the case of a single terminal that dispatches services to a number of customers and other terminals. We present a time-dependent, stochastic formulation that aims to optimize the problem over a given planning horizon, and propose a solution approach based on dynamic programming principles. We also present a static, single-period formulation of the single-node problem that appears as a subproblem when addressing the time-dependent version and general service network design cases. Despite its apparent simplicity, it is still a network design problem, and exact solution methods are not sufficiently fast. We therefore propose two tabu search meta-heuristics based on the ejection-chain concept. We also introduce a learning mechanism that takes advantage of experience gathered in repeated executions. Experiments with problem instances derived from real cases indicate that the proposed solution methods are efficient and yield good solutions. (c) 2004 Elsevier B.V. All rights reserved.
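The tabu-search mechanics, a recency-based tabu list plus an aspiration criterion, can be sketched on a toy fixed-charge design problem. The ejection-chain neighborhoods of the paper are problem-specific and are replaced here by simple single-flip moves over binary open/close decisions; the skeleton and its parameters are illustrative only.

```python
import random

def tabu_search(cost, n, tenure=2, iters=100, seed=0):
    """Generic tabu search over binary design vectors with a single-flip
    neighborhood.  Recently flipped indices are tabu for `tenure`
    iterations unless the move improves on the best solution found
    (aspiration criterion); the best-admissible move is always taken,
    even if it worsens the current solution, to escape local optima."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]
    best, best_c = x[:], cost(x)
    tabu = {}                      # flipped index -> last iteration it is tabu
    for it in range(iters):
        cand = None
        for i in range(n):
            y = x[:]
            y[i] ^= 1
            c = cost(y)
            if tabu.get(i, -1) >= it and c >= best_c:
                continue           # tabu, and aspiration does not apply
            if cand is None or c < cand[1]:
                cand = (i, c, y)
        if cand is None:
            continue               # every move tabu this iteration
        i, c, x = cand
        tabu[i] = it + tenure
        if c < best_c:
            best, best_c = x[:], c
    return best, best_c
```

In the paper's setting a move is an ejection chain reassigning demands between services rather than a single bit flip, and the learning mechanism would bias the search using statistics from earlier runs.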
Even though dynamic programming offers an optimal control solution in state feedback form, the method is overwhelmed by computational and storage requirements. Approximate dynamic programming implemented with an Adaptive Critic (AC) neural network structure has evolved as a powerful alternative that obviates the excessive computation and storage requirements in solving optimal control problems. In this paper, an improvement to the AC architecture, called the Single Network Adaptive Critic (SNAC), is presented. This approach is applicable to a wide class of nonlinear systems for which the optimal control (stationarity) equation can be explicitly expressed in terms of the state and costate variables. The name reflects the fact that the approach eliminates one of the neural networks (namely the action network) of a typical dual-network AC setup. As a consequence, the SNAC architecture offers three potential advantages: a simpler architecture, a lower computational load, and elimination of the approximation error associated with the eliminated network. To demonstrate these benefits and the control synthesis technique using SNAC, two problems are solved with both the AC and SNAC approaches and their computational performances are compared. One of them is a real-life micro-electro-mechanical system (MEMS) problem, which demonstrates that the SNAC technique is applicable to complex engineering systems. (c) 2006 Elsevier Ltd. All rights reserved.
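For a linear-quadratic problem the SNAC idea admits a very small sketch: a single linear critic maps x_k to the costate lambda_{k+1} = W x_k, and the control is recovered from the stationarity condition u_k = -R^{-1} B' lambda_{k+1}, so no action network is needed. The relaxation factor below is an invented stand-in for the critic's training rate, and the whole sketch is an LQ specialization, not the paper's neural implementation.

```python
import numpy as np

def snac_lq(A, B, Q, R, eta=0.5, iters=300):
    """Single-network adaptive critic specialized to discrete-time LQ:
    the critic lambda_{k+1} = W x_k is updated toward the costate
    equation lambda_{k+1} = Q x_{k+1} + A' lambda_{k+2}, with the
    control u_k = -R^{-1} B' W x_k eliminating the action network."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    W = np.zeros((n, n))
    for _ in range(iters):
        Acl = A - B @ Rinv @ B.T @ W      # closed loop under current critic
        target = (Q + A.T @ W) @ Acl      # costate-equation target for W
        W = W + eta * (target - W)        # relaxed update (critic training rate)
    return W
```

At the fixed point W equals P A_cl, where P solves the discrete-time Riccati equation, so the recovered feedback u = -R^{-1} B' W x matches the optimal LQ controller; the relaxation is needed because the undamped substitution can oscillate.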
In this paper, we present a stochastic model for the dynamic fleet management problem with random travel times. Our approach decomposes the problem into time-staged subproblems by formulating it as a dynamic program and uses approximations of the value function. In order to deal with random travel times, the state variable of our dynamic program includes all individual decisions over a relevant portion of the history. We show how to approximate the value function in a tractable manner under this new high-dimensional state variable. Under our approximation scheme, the subproblem for each time period decomposes with respect to locations, making our model very appealing for large-scale applications. Numerical work shows that the proposed approach provides high-quality solutions and performs significantly better than standard benchmark methods. (c) 2005 Elsevier B.V. All rights reserved.
ISBN (Print): 9781424404926
The increasing complexity of the modern power grid highlights the need for advanced modeling and control techniques for effective control of excitation and turbine systems. The crucial factors affecting modern power systems today are voltage control and system stabilization during small and large disturbances. Simulation studies and real-time laboratory experiments are described, and the results show successful control of the power system excitation and turbine systems with adaptive and optimal neurocontrol approaches. The performance of the neurocontrollers is compared with conventional PI controllers for damping under different operating conditions, for both small and large disturbances.
In the present paper, a call admission control scheme that can learn from the network environment and user behavior is developed for code division multiple access (CDMA) cellular networks that handle both voice and data services. The idea is built upon a novel learning control architecture with only a single module, instead of the two or three modules of typical adaptive critic designs (ACDs). The use of the adaptive critic approach for call admission control in wireless cellular networks is new. The call admission controller can learn in real time as well as in offline environments, and it improves its performance as it gains experience. Another important contribution of the present work is the choice of utility function, which makes the learning process much more efficient than existing learning control methods. The performance of the algorithm is demonstrated through computer simulation and compared with existing algorithms.
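The single-module learning loop can be illustrated with a tabular toy: the state is the number of ongoing calls, the action accepts or rejects a new arrival. The reward values, traffic probabilities, and capacity below are invented, and the paper's critic operates on a much richer CDMA state description with voice and data classes.

```python
import random

def learn_cac(capacity=5, steps=20000, alpha=0.1, gamma=0.9,
              p_arrive=0.6, p_depart=0.3, seed=1):
    """Toy self-learning call admission control.  Admitting a call earns
    revenue (+1); admitting into a full cell causes an outage penalty
    (-5); rejecting earns nothing.  Q-learning over the load should
    therefore learn to accept when there is headroom and reject at
    capacity."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(capacity + 1)]   # Q[load][reject, accept]
    load = 0
    for _ in range(steps):
        if rng.random() < p_arrive:                  # a new call arrives
            if rng.random() < 0.1:
                a = rng.randrange(2)                 # explore
            else:
                a = 1 if Q[load][1] >= Q[load][0] else 0
            if a == 1 and load < capacity:
                r, nxt = 1.0, load + 1               # admitted: revenue
            elif a == 1:
                r, nxt = -5.0, load                  # full cell: outage penalty
            else:
                r, nxt = 0.0, load                   # rejected
            Q[load][a] += alpha * (r + gamma * max(Q[nxt]) - Q[load][a])
            load = nxt
        if load > 0 and rng.random() < p_depart:     # a call completes
            load -= 1
    return Q
```

The decision epochs are the arrivals, as in admission control generally; the paper replaces this table with a single critic network and a utility function chosen to speed up exactly this kind of learning.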