ISBN (print): 9781889335384
Motion controllers capable of incremental learning and optimization can automatically tune their parameters to pursue optimal control. By implementing reinforcement learning and approximate dynamic programming, an adaptive critic motion controller is shown to be able to achieve this objective. The control policy and the adaptive critic are implemented by sparse radial basis function networks. The policy and critic updating rules are derived. The ability and performance of the adaptive critic motion controller are demonstrated by the control of a rotary inverted pendulum system.
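A minimal sketch of the kind of RBF-based actor-critic update such a controller could use is given below; the feature construction, state dimension, learning rates, and update rules are illustrative assumptions, not the rules derived in the paper.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Sparse radial basis function feature vector for state x."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    phi[phi < 1e-3] = 0.0            # keep the representation sparse
    return phi

centers = np.random.uniform(-1, 1, size=(50, 2))   # RBF centers over an assumed 2-D state space
w_critic = np.zeros(50)                             # critic weights
w_actor = np.zeros(50)                              # policy weights
alpha_c, alpha_a, gamma = 0.1, 0.01, 0.98           # assumed step sizes and discount factor

def actor_critic_step(x, x_next, cost):
    """One temporal-difference-driven update of critic and policy weights."""
    global w_critic, w_actor
    phi, phi_next = rbf_features(x, centers, 0.3), rbf_features(x_next, centers, 0.3)
    delta = cost + gamma * (w_critic @ phi_next) - (w_critic @ phi)   # TD error
    w_critic += alpha_c * delta * phi                                  # critic update
    w_actor -= alpha_a * delta * phi                                   # policy update driven by the critic
    return w_actor @ phi_next                                          # next (scalar) control action
```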
This paper reviews dynamic programming (DP), surveys approximate solution methods for it, and considers their applicability to process control problems. Reinforcement Learning (RL) and Neuro-dynamic programming (NDP), which can be viewed as approximate DP techniques, are already established techniques for solving difficult multi-stage decision problems in the fields of operations research, computer science, and robotics. Owing to the significant disparity in problem formulations and objectives, however, the algorithms and techniques available from these fields are not directly applicable to process control problems, and reformulations based on an accurate understanding of these techniques are needed. We categorize the currently available approximate solution techniques for dynamic programming and identify those most suitable for process control problems. Several open issues are also identified and discussed.
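For reference, the exact DP recursion that these approximate methods try to scale is tabular value iteration; the toy random MDP below is purely illustrative.

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s'] transition probabilities
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # stage rewards R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * P @ V            # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)            # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=1)            # greedy policy w.r.t. the converged value function
```

RL and NDP replace the explicit model (P, R) and the exact table V with sampled transitions and function approximators, which is exactly where the reformulation issues discussed above arise.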
A novel adaptive-critic-based neural network (NN) controller in discrete time is designed to deliver a desired tracking performance for a class of nonlinear systems in the presence of actuator constraints. The constraints of the actuator are treated in the controller design as a saturation nonlinearity. The adaptive critic NN controller architecture based on state feedback includes two NNs: the critic NN is used to approximate the "strategic" utility function, whereas the action NN is employed to minimize both the strategic utility function and the unknown nonlinear dynamic estimation errors. The critic and action NN weight updates are derived by minimizing certain quadratic performance indexes. Using the Lyapunov approach and with novel weight updates, the uniform ultimate boundedness of the closed-loop tracking error and weight estimates is shown in the presence of NN approximation errors and bounded unknown disturbances. The proposed NN controller works in the presence of multiple nonlinearities, unlike other schemes that normally approximate only one nonlinearity. Moreover, the adaptive critic NN controller does not require an explicit offline training phase, and the NN weights can be initialized at zero or randomly. Simulation results justify the theoretical analysis.
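A rough sketch of this two-network structure, with the actuator constraint handled as a saturation nonlinearity, might look as follows; the layer sizes, utility signal, learning rate, and update signs are assumptions rather than the paper's derived weight-update laws.

```python
import numpy as np

rng = np.random.default_rng(1)
W_a, W_c = rng.standard_normal((10, 4)), rng.standard_normal((10, 4))  # fixed hidden layers
v_a, v_c = np.zeros(10), np.zeros(10)     # output weights may start at zero (no offline training phase)
u_max, alpha = 2.0, 0.05                  # actuator limit and assumed learning rate

def controller_step(x, tracking_error):
    """One update of the saturated action NN and the strategic-utility critic NN."""
    global v_a, v_c
    z_a, z_c = np.tanh(W_a @ x), np.tanh(W_c @ x)
    u = u_max * np.tanh((v_a @ z_a) / u_max)     # control passed through the saturation nonlinearity
    utility = tracking_error ** 2                # assumed strategic utility signal
    e_c = v_c @ z_c - utility                    # critic error from a quadratic performance index
    v_c -= alpha * e_c * z_c                     # critic weight update
    v_a -= alpha * (v_c @ z_c) * z_a             # action weight update driven by the critic
    return u
```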
In many resource-constrained project scheduling problems (RCPSP), the set of candidate projects is not fixed a priori but evolves with time. For example, while an initial set of projects is being performed according to a certain decision policy, a new promising project can emerge. To make an appropriate resource allocation decision for such a problem, project cancellation and resource idling decisions should complement the conventional scheduling decisions. In this study, the problem of stochastic RCPSP (sRCPSP) with dynamic project arrivals is addressed with the added flexibility of project cancellation and resource idling. To solve the problem, a Q-learning-based approach is adopted. To use the approach, the problem is formulated as a Markov Decision Process with appropriate definitions of states, including an information state, and action variables. The Q-learning approach enables us to derive empirical state transition rules from simulation data, so that analytical calculation of potentially exorbitantly complicated state transition rules can be circumvented. To maximize the advantage of using the empirically learned state transition rules, special types of actions, including project cancellation and resource idling, which are difficult to incorporate into heuristics, were added randomly in the simulation. The random actions are filtered during the Q-value iteration and properly utilized in online decision making to maximize the total expected reward. Copyright (C) 2007 John Wiley & Sons, Ltd.
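A toy sketch of such a simulation-driven Q-learning scheme is shown below; the action set, state encoding, reward signal, and exploration scheme are illustrative assumptions.

```python
import random
from collections import defaultdict

# Transitions come from a simulator rather than an analytical model, and the "special"
# actions (cancel a project, idle a resource) are injected at random during data collection.
ACTIONS = ["schedule_next", "cancel_project", "idle_resource"]
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.2

def choose_action(state):
    """Epsilon-greedy exploration that also injects cancellation/idling actions at random."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One Q-learning backup from a simulated (state, action, reward, next_state) sample."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

The filtering step described above then corresponds to keeping the randomly injected cancellation/idling actions only where they end up with the highest Q-value during online decision making.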
We devise an algorithm for solving the infinite-dimensional linear programs that arise from general deterministic semi-Markov decision processes on Borel spaces. The algorithm constructs a sequence of approximate primal-dual solutions that converge to an optimal one. The innovative idea is to approximate the dual solution with continuous piecewise linear ridge functions that naturally represent functions defined on a high-dimensional domain as linear combinations of functions defined on only a single dimension. This approximation gives rise to a primal/dual pair of semi-infinite programs, for which we show strong duality. In addition, we prove various properties of the underlying ridge functions.
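In assumed notation, the ridge-function approximation of the dual solution can be written as follows, with u denoting the dual variable.

```latex
% Sketch of the ridge-function representation: a function on a high-dimensional
% domain is expressed through one-dimensional pieces.
\[
  u(x) \;\approx\; \sum_{j=1}^{J} g_j\!\left(a_j^{\top} x\right),
  \qquad x \in \mathbb{R}^n,\; a_j \in \mathbb{R}^n,
\]
% where each g_j is a continuous piecewise linear function of the scalar a_j^T x.
% Restricting the dual to this class turns the infinite-dimensional LP into a
% primal/dual pair of semi-infinite programs over the directions a_j and the
% breakpoints and slopes of the g_j.
```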
A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections to the discrete case are made, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. Practical benefits are that this framework fits in well with plant descriptions given by differential equations and that any standard integration routine with adaptive step size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence for a general plant and critic. Also, a fast critic update for concurrent actor-critic training is introduced that immediately applies the adjustments of critic parameters induced by actor updates, keeping the Bellman optimality correct to a first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until substantial error builds up in the Bellman optimality or temporal difference equation, at which point a traditional critic training phase is performed, after which another interval of concurrent actor-critic training may resume.
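In assumed notation, the second-order actor update could be sketched as follows, with plant dynamics f, cost rate r, critic V(x; w), and actor u = pi(x; theta).

```latex
% Sketch of the Newton-type actor adaptation; the notation is assumed, not the paper's.
% Along a trajectory of \dot{x} = f(x, u), the Hamiltonian seen by the actor is
\[
  H(\theta) \;=\; r\bigl(x, \pi(x;\theta)\bigr)
            \;+\; \nabla_x V(x; w)^{\top}\, f\bigl(x, \pi(x;\theta)\bigr),
\]
% and the actor parameters are driven toward a stationary point of H by a Newton step
\[
  \theta \;\leftarrow\; \theta \;-\;
  \bigl(\nabla_{\theta}^{2} H\bigr)^{-1} \nabla_{\theta} H .
\]
% The "fast critic update" then adjusts w to first order so that the Bellman (HJB)
% residual remains approximately zero after the actor change.
```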
In this paper, the optimal strategies for discrete-time linear system quadratic zero-sum games related to the H-infinity optimal control problem are solved in forward time without knowing the system dynamical matrices. The idea is to solve for an action-dependent value function Q(x, u, w) of the zero-sum game instead of solving for the state-dependent value function V(x), which satisfies a corresponding game algebraic Riccati equation (GARE). Since the state and action spaces are continuous, two action networks and one critic network are used that are adaptively tuned in forward time using adaptive critic methods. The result is a Q-learning approximate dynamic programming (ADP) model-free approach that solves the zero-sum game forward in time. It is shown that the critic converges to the game value function and the action networks converge to the Nash equilibrium of the game. Proofs of convergence of the algorithm are given. It is proven that the algorithm is, in effect, a model-free iterative algorithm for solving the GARE of the linear quadratic discrete-time zero-sum game. The effectiveness of this method is shown by performing an H-infinity control autopilot design for an F-16 aircraft. (C) 2007 Elsevier Ltd. All rights reserved.
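Under the linear quadratic structure of the game, the action-dependent value function and the resulting model-free policies can be sketched as follows; the notation is assumed rather than taken verbatim from the paper.

```latex
% With z_k = [x_k^T, u_k^T, w_k^T]^T, the game Q-function is quadratic in z_k:
\[
  Q(x_k, u_k, w_k) \;=\; z_k^{\top} H z_k,
  \qquad
  H = \begin{bmatrix} H_{xx} & H_{xu} & H_{xw} \\
                      H_{ux} & H_{uu} & H_{uw} \\
                      H_{wx} & H_{wu} & H_{ww} \end{bmatrix}.
\]
% Once H is identified from measured data, the Nash feedback policies follow from the
% stationarity conditions \partial Q/\partial u = 0 and \partial Q/\partial w = 0, e.g.
\[
  u_k = -\bigl(H_{uu} - H_{uw} H_{ww}^{-1} H_{wu}\bigr)^{-1}
         \bigl(H_{ux} - H_{uw} H_{ww}^{-1} H_{wx}\bigr)\, x_k,
\]
% so the system matrices are never needed explicitly.
```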
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to previous work on approximate RL methods, KLSPI makes two advances that eliminate the main difficulties of existing results. One is the better convergence and (near) optimality guarantee obtained by using the KLSTD-Q algorithm for policy evaluation with high precision. The other is the automatic feature selection provided by the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and convergence guarantees for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating a
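A minimal sketch of an ALD-style sparsification test of the kind used in KLSPI is shown below; the Gaussian kernel, regularization term, and threshold nu are illustrative assumptions.

```python
import numpy as np

def kernel(x, y, sigma=1.0):
    """Gaussian (Mercer) kernel, used here only for illustration."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2))

def ald_test(dictionary, x_new, nu=1e-3):
    """Return True if x_new is approximately linearly independent of the dictionary
    in feature space and should therefore be added to it."""
    if not dictionary:
        return True
    K = np.array([[kernel(a, b) for b in dictionary] for a in dictionary])
    k = np.array([kernel(a, x_new) for a in dictionary])
    c = np.linalg.solve(K + 1e-9 * np.eye(len(dictionary)), k)   # best linear reconstruction
    delta = kernel(x_new, x_new) - k @ c                          # ALD residual
    return delta > nu
```

Keeping only the samples that pass this test is what keeps the KLSTD-Q solution sparse while acting as automatic feature selection.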
This paper presents a simulation-based approach for designing a non-linear override control scheme to improve the performance of a local linear controller. The higher-level non-linear controller monitors the dynamic state of the system and calculates an override control action whenever the system is predicted to move outside an acceptable operating regime under the local controller. The design of the non-linear override controller is based on a cost-to-go function, which is constructed by using simulation or operation data. The cost-to-go function delineates the admissible region of state space within which the local controller is effective, thereby yielding a switching rule.
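The switching rule implied by this construction can be sketched as follows, assuming a one-step state predictor, a learned cost-to-go approximator, and a threshold marking the boundary of the admissible region; all of these names are illustrative.

```python
import numpy as np

def control(x, cost_to_go, predict_next, local_gain_K, override_action, threshold):
    """Apply the local linear controller unless the predicted state leaves the
    admissible region delineated by the cost-to-go function."""
    u_local = -local_gain_K @ x              # local linear controller
    x_pred = predict_next(x, u_local)        # simulate one step ahead under the local controller
    if cost_to_go(x_pred) > threshold:       # predicted to leave the acceptable operating regime
        return override_action(x)            # non-linear override control takes over
    return u_local
```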
ISBN (print): 9781424407064
The aim of this study is to assist a military decision maker during his decision-making process when applying tactics on the battlefield. To that end, we model the conflict as a game on which we seek strategies that guarantee the achievement of goals defined simultaneously in terms of attrition and tracking. The model relies on multi-valued graphs and leads us to solve a stochastic shortest path problem. The employed techniques draw on temporal-difference methods but also use a heuristic qualification of system states to cope with algorithmic complexity issues.
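A toy sketch of a temporal-difference update for such a stochastic shortest path formulation, with a heuristic relevance filter standing in for the state qualification, might look as follows; all names and the cost structure are assumptions.

```python
from collections import defaultdict

J = defaultdict(float)   # learned cost-to-go over graph nodes
alpha = 0.1

def td_update(state, next_state, stage_cost, is_relevant):
    """One TD(0) backup of the shortest-path cost-to-go along a simulated trajectory."""
    if not is_relevant(state):               # heuristic qualification: skip irrelevant states
        return
    target = stage_cost + J[next_state]      # undiscounted stochastic shortest path target
    J[state] += alpha * (target - J[state])
```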