Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem. (C) 2010 Elsevier Ltd. All rights reserved.
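For concreteness, here is a minimal sketch of the synchronous fuzzy Q-iteration update described above. The 1-D deterministic system f, reward rho, triangular membership functions, and three-action discretization are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical 1-D deterministic system and reward, only for illustration.
def f(x, u):
    return np.clip(x + 0.1 * u, -1.0, 1.0)

def rho(x, u):
    return -(x ** 2 + 0.01 * u ** 2)

centers = np.linspace(-1.0, 1.0, 11)   # cores of the fuzzy partition
actions = np.array([-1.0, 0.0, 1.0])   # discretized action space
gamma = 0.95

def mu(x):
    """Triangular membership degrees over the partition; they sum to 1."""
    w = np.maximum(0.0, 1.0 - np.abs(x - centers) / (centers[1] - centers[0]))
    return w / w.sum()

theta = np.zeros((len(centers), len(actions)))
for _ in range(200):                    # synchronous fuzzy Q-iteration
    new = np.empty_like(theta)
    for i, xi in enumerate(centers):
        for j, uj in enumerate(actions):
            # Interpolate Q at the next state with the memberships,
            # then maximize over the discrete actions.
            q_next = mu(f(xi, uj)) @ theta
            new[i, j] = rho(xi, uj) + gamma * q_next.max()
    theta = new

policy = actions[np.argmax(theta, axis=1)]   # greedy action at each core
```

The asynchronous variant the abstract mentions would overwrite theta[i, j] in place instead of building `new` each sweep, reusing fresh parameters immediately.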
ISBN:
(Print) 9781509042340
Spacecraft on-board autonomy is an important topic in currently developed and future space missions. In this study, we present a robust approach to obtaining the optimal policy of autonomous space systems modeled via a Markov Decision Process (MDP), with respect to the values assigned to its transition probability matrix. After addressing the curse of dimensionality in solving the formulated MDP problem via approximate dynamic programming, we use an Apriori-based Association Classifier to infer a specific optimal policy. Finally, we assess the effectiveness of this optimal policy in fulfilling the spacecraft autonomy requirements.
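A toy sketch of the pipeline this abstract describes: solve a small MDP from its transition probability matrix, then mine "state-feature itemset => action" rules from the resulting policy. The state space, bit features, and the naive support/confidence rule miner below are stand-ins for the paper's ADP solver and Apriori-based Association Classifier:

```python
import numpy as np
from itertools import combinations

# Toy MDP (hypothetical sizes): P[a] is the transition matrix for action a.
n_states, n_actions, gamma = 8, 3, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(-1, 1, size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):                       # plain value iteration
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)

# Naive stand-in for the Apriori-based classifier: keep rules
# "feature itemset => action" whose confidence clears a threshold.
features = [{f"bit{k}" for k in range(3) if (s >> k) & 1} for s in range(n_states)]
rules = {}
for size in (1, 2):
    for items in combinations(["bit0", "bit1", "bit2"], size):
        covered = [s for s in range(n_states) if set(items) <= features[s]]
        if covered:
            acts, counts = np.unique(policy[covered], return_counts=True)
            if counts.max() / counts.sum() >= 0.8:
                rules[items] = int(acts[counts.argmax()])
```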
In this paper, relations between model predictive control and reinforcement learning are studied for discrete-time linear time-invariant systems with state and input constraints and a quadratic value function. The principles of model predictive control and reinforcement learning are reviewed in a tutorial manner. From model predictive control theory it is inferred that the optimal value function is piecewise quadratic on polyhedra and that the optimal policy is piecewise affine on polyhedra. Various ideas for exploiting this knowledge of the structure and properties of the optimal value function and the optimal policy in reinforcement learning theory and practice are presented. These ideas can be used to derive stability and feasibility criteria and to accelerate the learning process, which can facilitate reinforcement learning for systems with high order, fast dynamics, and strict safety requirements. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
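The structural result is easy to make concrete: an explicit-MPC-style piecewise affine policy is a list of polyhedral regions, each carrying an affine gain. A minimal sketch, with a hypothetical single region standing in for a real explicit solution:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Region:
    H: np.ndarray   # polyhedron {x : Hx <= h}
    h: np.ndarray
    K: np.ndarray   # affine law u = Kx + k valid on this region
    k: np.ndarray

def pwa_policy(x, regions):
    """Evaluate a piecewise affine policy of the kind explicit MPC produces."""
    for r in regions:
        if np.all(r.H @ x <= r.h + 1e-9):
            return r.K @ x + r.k
    raise ValueError("x outside the feasible set")

# Example: one unconstrained region with an LQR-like gain (made up).
regions = [Region(H=np.zeros((1, 2)), h=np.zeros(1),
                  K=np.array([[-0.5, -1.2]]), k=np.zeros(1))]
u = pwa_policy(np.array([0.3, -0.1]), regions)
```

Restricting a learned policy or value approximator to this family is one way the abstract's structural knowledge can constrain reinforcement learning.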
ISBN:
(Print) 9781538611074
In this paper, four action-dependent heuristic dynamic programming (ADHDP) control methods are presented for nonlinear multi-input multi-output (MIMO) systems with different characteristics, based on the topology principle. These four methods are the action-network extension method, the sub-network method, the cascaded action-network method, and the combined method. The derivation procedures and computing formulas of these methods are also given. The action-network extension method is mainly used when the multiple output variables have the same order of magnitude and a naturally coupled relationship. The sub-network method can be applied in nearly all cases and can handle multiple output variables with different orders of magnitude. The cascaded action-network method is utilized when the multiple input variables have explicit cascaded relationships. The combined method can be used to control certain more complex systems. Together, these four methods cover almost all design requirements of nonlinear MIMO control systems, and designers can select among the methods and formulas according to these results to achieve a better control effect.
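As an illustration of the sub-network method's topology (the paper's formulas are not reproduced; the sizes and weights below are placeholders), each control output gets its own small action network, so outputs with very different orders of magnitude do not share parameters:

```python
import numpy as np

class SubNetworkActor:
    """One independent action sub-network per control output: a sketch of
    the sub-network topology; the learning rules are not reproduced here."""
    def __init__(self, n_state, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.nets = [(rng.standard_normal((n_hidden, n_state)) * 0.1,
                      rng.standard_normal(n_hidden) * 0.1)
                     for _ in range(n_outputs)]

    def __call__(self, x):
        # Each output u_m has its own hidden layer, so differently scaled
        # outputs do not compete for the same weights.
        return np.array([w2 @ np.tanh(w1 @ x) for (w1, w2) in self.nets])

actor = SubNetworkActor(n_state=4, n_hidden=8, n_outputs=2)
u = actor(np.zeros(4))   # two control outputs from two separate sub-networks
```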
In this paper, a problem of active fault diagnosis for jump Markov nonlinear systems with non-Gaussian noises is considered. The imperfect state information formulation is transformed using sufficient statistics to a ...
ISBN:
(Print) 9781509028733
In this paper, a novel adaptive learning technique is proposed to solve a stochastic zero-sum Nash game with partially unknown nonlinear systems, for which the lengths of the time intervals that the system spends in each mode are independent random variables with exponential distributions; that is, the environment and the cost matrices depend on the outcome of a Markov chain. We first formulate the problem using an optimal stopping process and then provide a verification theorem for stopping zero-sum games. A structure of two actor approximators and one critic approximator is used to approximate the saddle-point policies and the optimal cost, respectively. Effective tuning laws are proposed to solve the stochastic Nash game problem while also guaranteeing closed-loop stability, with rigorous Lyapunov-based stability proofs. Finally, a numerical example is used to illustrate the effectiveness of the proposed approach.
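A schematic of the 2-actor/1-critic parameterization, with placeholder basis functions and a plain Bellman-residual gradient step standing in for the paper's Lyapunov-based tuning laws, which are not reproduced here:

```python
import numpy as np

gamma = 0.99
phi = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])  # critic basis
sig = lambda x: np.tanh(x)                                     # actor basis

wc = np.zeros(3)        # critic weights:    V(x) ~ wc @ phi(x)
Wu = np.zeros((1, 2))   # minimizing player: u = Wu @ sig(x)
Wd = np.zeros((1, 2))   # maximizing player: d = Wd @ sig(x)

def tune(x, x_next, cost, lr=1e-2):
    """One schematic tuning step driven by the Bellman residual."""
    global wc, Wu, Wd
    e = cost + gamma * (wc @ phi(x_next)) - wc @ phi(x)
    wc -= lr * e * (gamma * phi(x_next) - phi(x))   # critic: descend e**2
    # The two actors adapt in opposite directions of the same residual,
    # reflecting the saddle-point (min-max) structure of the game.
    Wu -= lr * e * sig(x)[np.newaxis, :]
    Wd += lr * e * sig(x)[np.newaxis, :]
```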
ISBN:
(Print) 9781509054626
In this paper, a novel discrete-time iterative zero-sum adaptive dynamic programming (ADP) algorithm is developed for solving the optimal control problems of nonlinear systems. Two iteration processes, the lower and upper iterations, are employed to solve the lower and upper value functions, respectively. Arbitrary positive semidefinite functions can be used to initialize the upper and lower iterations of the iterative zero-sum ADP algorithm. It is proven that the upper and lower value functions converge to the optimal performance index function whenever that function exists; no separate existence criterion is required. Simulation examples are given to illustrate the effective performance of the present method.
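A sketch of the two iteration processes on a coarse 1-D grid. The dynamics, payoff, and action/disturbance sets are illustrative assumptions; the min-max vs. max-min structure of the upper and lower iterations is as described in the abstract:

```python
import numpy as np

xs = np.linspace(-1, 1, 21)       # state grid
us = np.linspace(-1, 1, 5)        # control set
ws = np.linspace(-0.2, 0.2, 5)    # disturbance set
gamma = 0.9
f = lambda x, u, w: np.clip(0.9 * x + 0.2 * u + w, -1, 1)
U = lambda x, u, w: x ** 2 + u ** 2 - 5 * w ** 2

V_up = np.zeros_like(xs)          # any positive semidefinite start works
V_lo = np.zeros_like(xs)
for _ in range(100):
    # Upper iteration: controller commits first (min over u of max over w).
    V_up = np.array([min(max(U(x, u, w) + gamma * np.interp(f(x, u, w), xs, V_up)
                             for w in ws) for u in us) for x in xs])
    # Lower iteration: disturbance commits first (max over w of min over u).
    V_lo = np.array([max(min(U(x, u, w) + gamma * np.interp(f(x, u, w), xs, V_lo)
                             for u in us) for w in ws) for x in xs])
# If the game has a value, the two sequences approach it from above and below.
```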
ISBN:
(Print) 9781509046584
A neural-network-based adaptive critic control method is established for continuous-time input-affine uncertain nonlinear systems to achieve disturbance attenuation. The present problem can be formulated as a two-player zero-sum differential game, and the adaptive critic mechanism is employed to solve the minimax optimization problem. A neural network identifier is developed to reconstruct the unknown dynamical system. The optimal control law and the worst-case disturbance law are designed by introducing and training a critic neural network. The effectiveness of the present self-learning control method is also illustrated by a simulation experiment.
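A sketch of how the control and worst-case disturbance laws typically follow from a trained critic in this input-affine zero-sum setting; the quadratic critic basis and weighting matrices below are assumptions, not the paper's design:

```python
import numpy as np

# Dynamics assumed input-affine: xdot = f(x) + g(x)u + k(x)d,
# with cost x'Qx + u'Ru - gam2 * d'd (gam2 a chosen attenuation level).
R, gam2 = np.eye(1), 4.0
phi_grad = lambda x: np.array([[2 * x[0], 0.0],
                               [x[1],     x[0]],
                               [0.0,      2 * x[1]]])  # Jacobian of [x0^2, x0*x1, x1^2]

def laws(x, wc, g, k):
    """Control and worst-case disturbance derived from the critic gradient."""
    dV = phi_grad(x).T @ wc                       # gradient of V(x) ~ wc @ phi(x)
    u = -0.5 * np.linalg.solve(R, g(x).T @ dV)    # minimizing control law
    d = k(x).T @ dV / (2.0 * gam2)                # maximizing disturbance law
    return u, d
```

In the paper's scheme, g and k would come from the neural network identifier rather than being known in closed form.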
ISBN:
(Print) 9783319590813; 9783319590806
Adaptive dynamic programming (ADP) is an active research topic. This paper concerns a new local policy iteration ADP algorithm, designed for discrete-time nonlinear systems and used to solve infinite-horizon optimal control problems. The characteristic of the new local policy iteration ADP algorithm is that it updates the iterative control law and value function within one subset of the state space. The detailed iteration process of the local policy iteration is then presented. A simulation example is given to show the good performance of the newly developed algorithm.
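A minimal sketch of such a "local" sweep: the value function and control law are updated only on a chosen subset of a state grid. The dynamics, cost, and subset rule are hypothetical:

```python
import numpy as np

xs = np.linspace(-1, 1, 41)
us = np.linspace(-1, 1, 9)
gamma = 0.95
f = lambda x, u: np.clip(0.9 * x + 0.1 * u, -1, 1)
U = lambda x, u: x ** 2 + u ** 2

V = xs ** 2                  # an admissible initial value function
pi = np.zeros_like(xs)       # current control law on the grid

def sweep(subset):
    """Update the control law and value only at the given grid indices."""
    global V, pi
    for i in subset:
        x = xs[i]
        qs = [U(x, u) + gamma * np.interp(f(x, u), xs, V) for u in us]
        j = int(np.argmin(qs))
        pi[i], V[i] = us[j], qs[j]   # local improvement and update

sweep(range(0, len(xs), 2))  # e.g. update half the grid in this pass
```

States outside the subset keep their previous law and value, which is what distinguishes the local scheme from a full synchronous sweep.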
ISBN:
(Print) 9781538611074
With the development of marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine, and management science, many complex systems face problems of optimization and control. Approximate dynamic programming addresses the curse of dimensionality of dynamic programming and is a new kind of approximate optimization method that has emerged in recent years. Based on an analysis of the optimization system, this paper proposes a nonlinear multi-input multi-output, online-learning, data-driven approximate dynamic programming structure and its learning algorithm. The method is realized in three aspects: 1) the critic function of the multi-dimensional-input critic module is approximated with a data-driven k-nearest-neighbor method; 2) the multi-output policy iteration of the actor module is computed with exponential convergence; 3) the critic and actor modules are learned synchronously to achieve online optimization and control. The optimal control of the longitudinal motion of a thermal underwater glider is used to show the effect of the proposed method. This work can lay a foundation for the theory and application of nonlinear, data-driven, multi-input multi-output approximate dynamic programming, and it addresses common needs in the optimization, control, and artificial intelligence of many scientific and engineering fields, such as energy conservation, emission reduction, decision support, and operational management.
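A sketch of aspect 1) and the greedy actor step it enables; the sample set and k are assumptions, and the glider model itself is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(1)
samples_x = rng.uniform(-1, 1, size=(500, 2))   # stored state samples
samples_v = np.zeros(len(samples_x))            # their critic values (refined online)
k, gamma = 5, 0.95

def critic(x):
    """V-hat(x): average the stored values of the k nearest samples."""
    d = np.linalg.norm(samples_x - x, axis=1)
    return samples_v[np.argpartition(d, k)[:k]].mean()

def actor(x, f, U, actions):
    """Greedy one-step policy improvement against the k-NN critic."""
    return min(actions, key=lambda u: U(x, u) + gamma * critic(f(x, u)))
```

The k-NN critic needs no parametric model of the value function, which is the data-driven property the abstract emphasizes; updating `samples_v` and calling `actor` in the same loop corresponds to aspect 3)'s synchronous learning.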