ISBN (print): 9781509054626
In this paper, a novel discrete-time iterative zero-sum adaptive dynamic programming (ADP) algorithm is developed for solving the optimal control problems of nonlinear systems. Two iteration processes, the lower and upper iterations, are employed to solve the lower and upper value functions, respectively. Arbitrary positive semi-definite functions are acceptable for initializing the upper and lower iterations of the iterative zero-sum ADP algorithm. It is proven that the upper and lower value functions converge to the optimal performance index function if the optimal performance index function exists, and no existence criterion for the optimal performance index function is required. Simulation examples are given to illustrate the effectiveness of the present method.
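As an illustration of the two-iteration idea described above, the following minimal Python sketch runs upper and lower value iterations on a toy discretized scalar system. The dynamics f, utility U, and all grids are hypothetical stand-ins, not the paper's example.

```python
# Illustrative sketch of upper/lower value iteration for a zero-sum problem
# on a discretized scalar system; dynamics, cost, and grids are assumptions.
import numpy as np

xs = np.linspace(-1.0, 1.0, 41)          # state grid
us = np.linspace(-0.5, 0.5, 11)          # control (minimizer) grid
ws = np.linspace(-0.2, 0.2, 5)           # disturbance (maximizer) grid

def f(x, u, w):                          # assumed nonlinear dynamics
    return 0.8 * np.sin(x) + u + w

def U(x, u, w):                          # stage utility of the zero-sum game
    return x**2 + u**2 - 5.0 * w**2

def interp(V, x):                        # value lookup with state clipping
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, V)

V_up = np.zeros_like(xs)                 # arbitrary positive semi-definite
V_lo = np.zeros_like(xs)                 # initializations are allowed
for _ in range(200):
    V_up_new, V_lo_new = np.empty_like(xs), np.empty_like(xs)
    for i, x in enumerate(xs):
        Q = np.array([[U(x, u, w) + interp(V_up, f(x, u, w)) for w in ws] for u in us])
        V_up_new[i] = Q.max(axis=1).min()        # upper value: min_u max_w
        Q = np.array([[U(x, u, w) + interp(V_lo, f(x, u, w)) for w in ws] for u in us])
        V_lo_new[i] = Q.min(axis=0).max()        # lower value: max_w min_u
    V_up, V_lo = V_up_new, V_lo_new
# When the optimal performance index exists, the two iterates approach it.
```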
ISBN (print): 9781509046584
A neural-network-based adaptive critic control method is established for continuous-time input-affine uncertain nonlinear systems to achieve disturbance attenuation. The present problem can be formulated as a two-player zero-sum differential game, and the adaptive critic mechanism is employed to solve the minimax optimization problem. A neural network identifier is developed to reconstruct the unknown dynamical system. The optimal control law and the worst-case disturbance law are designed by introducing and training a critic neural network. The effectiveness of the present self-learning control method is also illustrated by a simulation experiment.
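The adaptive critic mechanism can be sketched as follows: a critic with a small basis is trained on the Hamiltonian residual of the zero-sum game, and the control and worst-case disturbance laws are read off its gradient. This is a simplified scalar illustration with assumed dynamics, basis functions, and gains, not the paper's identifier/critic design.

```python
# Minimal adaptive-critic sketch for a scalar zero-sum differential game;
# all dynamics, gains, and basis functions below are assumptions.
import numpy as np

gamma = 5.0                                  # disturbance attenuation level
def fx(x): return -x + 0.5 * np.sin(x)       # assumed drift dynamics
def gx(x): return 1.0                        # control input gain
def kx(x): return 0.5                        # disturbance input gain

phi  = lambda x: np.array([x**2, x**4])      # critic basis functions
dphi = lambda x: np.array([2*x, 4*x**3])     # their gradients

W = np.zeros(2)                              # critic weights
x, dt, lr = 1.0, 0.01, 0.5
for _ in range(20000):
    dV = dphi(x) @ W
    u = -0.5 * gx(x) * dV                    # approx. optimal control law
    w = (0.5 / gamma**2) * kx(x) * dV        # approx. worst-case disturbance
    xdot = fx(x) + gx(x) * u + kx(x) * w
    # Hamiltonian residual used as the critic training error
    e = x**2 + u**2 - gamma**2 * w**2 + dV * xdot
    # Approximate gradient step on e^2 (dependence of u, w on W neglected)
    W -= lr * dt * e * dphi(x) * xdot
    x = float(np.clip(x + dt * xdot, -2, 2))  # simulate the closed loop
```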
ISBN (print): 9783319590813; 9783319590806
Adaptive dynamic programming (ADP) is currently an active research topic. This paper concerns a new local policy iteration ADP algorithm, designed for discrete-time nonlinear systems to solve infinite-horizon optimal control problems. The characteristic of the new local policy iteration ADP algorithm is that it updates the iterative control law and value function within one subset of the state space. The detailed iteration process of the local policy iteration is then presented. A simulation example is given to show the good performance of the newly developed algorithm.
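A minimal sketch of the locality idea: each iteration improves the control law and updates the value function only on one subset of a gridded state space. The dynamics, cost, and subset schedule below are illustrative assumptions, not the paper's algorithm.

```python
# Local-update sketch: per iteration, only one block of the state grid
# gets a policy improvement and value update; all numbers are assumptions.
import numpy as np

xs = np.linspace(-1, 1, 40)                  # state grid
us = np.linspace(-1, 1, 21)                  # control grid
f = lambda x, u: 0.9 * x + 0.1 * u           # assumed discrete-time dynamics
U = lambda x, u: x**2 + u**2                 # stage cost

V = np.zeros_like(xs)                        # value function table
pi = np.zeros_like(xs)                       # control law table

blocks = np.array_split(np.arange(xs.size), 4)   # partition of the state space
for it in range(100):
    idx = blocks[it % len(blocks)]           # the subset updated this iteration
    for i in idx:
        x = xs[i]
        q = U(x, us) + np.interp(np.clip(f(x, us), -1, 1), xs, V)
        j = int(np.argmin(q))                # local policy improvement
        pi[i], V[i] = us[j], q[j]            # local value update
```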
ISBN (print): 9781538611074
With the development of marine science, aeronautics and astronautics, energy, the chemical industry, biomedicine, and management science, many complex systems face problems of optimization and control. Approximate dynamic programming overcomes the curse of dimensionality of dynamic programming and is a kind of approximate optimization method that has emerged in recent years. Based on an analysis of the optimization system, this paper proposes a nonlinear, multi-input multi-output, online-learning, data-driven approximate dynamic programming structure and its learning algorithm. The method is achieved through the following three aspects: 1) the critic function of the multi-dimensional-input critic module is approximated with a data-driven k-nearest-neighbor method; 2) the multi-output policy iteration of the actor module is computed with exponential convergence performance; 3) the critic and actor modules are learned synchronously to achieve online optimization and control. The optimal control of the longitudinal motion of a thermal underwater glider is used to show the effect of the proposed method. This work lays a foundation for the theory and application of nonlinear data-driven multi-input multi-output approximate dynamic programming, which is needed in the optimization, control, and artificial intelligence problems of many scientific and engineering fields, such as energy conservation, emission reduction, decision support, and operational management.
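Item 1) above, the data-driven k-nearest-neighbor critic, might look roughly like the following sketch. The KNNCritic class, its inverse-distance weighting, and the toy cost-to-go data are assumptions for illustration only.

```python
# Hypothetical k-NN critic: the value at a query state is estimated by
# inverse-distance-weighted regression over stored (state, cost-to-go) samples.
import numpy as np

class KNNCritic:
    def __init__(self, k=5):
        self.k, self.X, self.J = k, [], []   # samples and their cost-to-go

    def add(self, x, j):                     # store one observed sample
        self.X.append(np.asarray(x, float))
        self.J.append(float(j))

    def value(self, x):                      # k-NN estimate of the critic
        X = np.stack(self.X)
        d = np.linalg.norm(X - np.asarray(x, float), axis=1)
        nn = np.argsort(d)[:self.k]
        w = 1.0 / (d[nn] + 1e-8)             # inverse-distance weighting
        return float(np.dot(w, np.asarray(self.J)[nn]) / w.sum())

critic = KNNCritic(k=3)
rng = np.random.default_rng(0)
for _ in range(200):                         # multi-dimensional input samples
    x = rng.uniform(-1, 1, size=2)
    critic.add(x, x @ x)                     # toy cost-to-go: ||x||^2
print(critic.value([0.3, -0.4]))             # close to 0.25
```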
ISBN (print): 9781509054626
ADP is an effective optimization method. However, its optimality depends on the network structure and training algorithm. After a detailed analysis of ADP, this paper adopts RBF neural networks to realize the critic and action networks. The LSM (least-squares method) is introduced as the training algorithm, and a novel basis function is defined, which achieves global optimization and online control. The validity is verified by finding the global optimal point among local minima.
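A minimal sketch of the combination described above, fitting an RBF critic by least squares. The Gaussian basis, centers, and quadratic target values are illustrative assumptions, not the paper's novel basis function.

```python
# Least-squares fit of an RBF network to toy value-function targets;
# basis choice and data are assumptions for illustration.
import numpy as np

centers = np.linspace(-1, 1, 9)              # RBF centers on the state range
sigma = 0.25                                 # common RBF width

def Phi(x):                                  # Gaussian radial basis matrix
    x = np.atleast_1d(x)
    return np.exp(-((x[:, None] - centers[None, :])**2) / (2 * sigma**2))

xs = np.linspace(-1, 1, 200)
targets = xs**2                              # toy critic targets
W, *_ = np.linalg.lstsq(Phi(xs), targets, rcond=None)   # one-shot LS training

V = lambda x: Phi(x) @ W                     # trained critic output
print(float(V(0.5)))                         # close to 0.25
```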
In this paper, the problem of active fault diagnosis for jump Markov nonlinear systems with non-Gaussian noises is considered. The imperfect-state-information formulation is transformed, using sufficient statistics, into a dynamic optimization problem that can be solved by approximate dynamic programming. The sufficient statistics are produced using the Bayesian recursive relations and a particle filter algorithm. A special structure of the approximate Bellman function is chosen to reduce the complexity caused by the high dimension of the statistics obtained from the particle filter. The proposed active fault detector design is compared with an extended-Kalman-filter-based design in a simulation example. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
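The sufficient statistic produced by the particle filter can be illustrated with a bootstrap-filter sketch like the one below, where the belief over the Markov mode is the statistic handed to the optimization stage. The jump dynamics, noise laws, and measurement sequence are toy assumptions, not the paper's model.

```python
# Bootstrap particle filter over a jump Markov system with non-Gaussian
# (Laplace) process noise; all model ingredients are assumptions.
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.95, 0.05], [0.10, 0.90]])   # mode transition matrix
def f(x, m, u):                              # mode-dependent dynamics
    return (0.9 if m == 0 else 0.5) * x + u + rng.laplace(0, 0.05)
def lik(y, x):                               # Gaussian measurement likelihood
    return np.exp(-0.5 * ((y - x) / 0.1)**2)

N = 500
xp = rng.normal(0, 1, N)                     # particle states
mp = rng.integers(0, 2, N)                   # particle modes
for y, u in [(0.2, 0.0), (0.1, -0.1), (0.4, 0.0)]:   # toy measurements/inputs
    mp = np.array([rng.choice(2, p=P[m]) for m in mp])    # propagate modes
    xp = np.array([f(x, m, u) for x, m in zip(xp, mp)])   # propagate states
    w = lik(y, xp)
    w /= w.sum()
    idx = rng.choice(N, N, p=w)              # resample by weight
    xp, mp = xp[idx], mp[idx]
    belief = np.bincount(mp, minlength=2) / N   # statistic fed to the ADP stage
print(belief)                                # posterior mode probabilities
```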
ISBN (print): 9781538627266
This paper is concerned with a novel generalized policy iteration (GPI) algorithm in which approximation errors are explicitly considered. The properties of the stable GPI algorithm with approximation errors are analyzed. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function. Finally, numerical examples and comparisons are presented.
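A rough sketch of a GPI scheme with a bounded evaluation error term might look as follows. The system, cost, sweep count, and error magnitude are all illustrative assumptions, not the paper's setting.

```python
# GPI sketch: a finite number of policy-evaluation sweeps (the GPI knob)
# before each improvement, with a bounded noise term standing in for the
# approximation error; every number here is an assumption.
import numpy as np

xs = np.linspace(-1, 1, 40)
us = np.linspace(-1, 1, 21)
f = lambda x, u: 0.9 * x + 0.1 * u           # assumed dynamics
U = lambda x, u: x**2 + u**2                 # stage cost
interp = lambda V, x: np.interp(np.clip(x, -1, 1), xs, V)

V = np.zeros_like(xs)
pi = np.zeros_like(xs)
eps = 1e-3                                   # modeled approximation-error bound
rng = np.random.default_rng(3)
for _ in range(50):
    for _ in range(3):                       # partial evaluation sweeps
        V = np.array([U(x, p) + interp(V, f(x, p)) for x, p in zip(xs, pi)])
        V += eps * rng.uniform(-1, 1, V.shape)   # inexact evaluation
    for i, x in enumerate(xs):               # policy improvement step
        q = U(x, us) + interp(V, f(x, us))
        pi[i] = us[int(np.argmin(q))]
# With bounded eps, V settles into a finite neighborhood of the optimum.
```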
In this paper, relations between model predictive control and reinforcement learning are studied for discrete-time linear time-invariant systems with state and input constraints and a quadratic value function. The prin...
We assess the potential of the approximate dynamic programming (ADP) approach for process control, especially as a method to complement the model predictive control (MPC) approach. In the artificial intelligence (AI) and operations research (OR) communities, ADP has recently seen significant activity as an effective method for solving Markov decision processes (MDPs), which represent a type of multi-stage decision problem under uncertainty. Process control problems are similar to MDPs, with the key difference being the continuous state and action spaces as opposed to discrete ones. In addition, unlike in other popular ADP application areas like robotics or games, in process control applications the first and foremost concern should be the safety and economics of the ongoing operation rather than efficient learning. We explore different options within the ADP design, such as pre-decision-state vs. post-decision-state value functions, parametric vs. nonparametric value function approximators, batch-mode vs. continuous-mode learning, and exploration vs. robustness. We argue that ADP holds great potential, especially for obtaining effective control policies for stochastic constrained nonlinear or linear systems and continually improving them towards optimality. (C) 2010 Elsevier Ltd. All rights reserved.
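The pre-decision vs. post-decision distinction mentioned above can be illustrated as follows: with a post-decision value function the expectation over the disturbance is absorbed into the value, so the decision step needs no sampling. The dynamics, costs, and quadratic approximators below are toy assumptions.

```python
# Contrast of pre-decision value V(x) and post-decision value V_post(x_u),
# where x_u is the state after the decision but before the random noise;
# all model pieces are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
us = np.linspace(-1, 1, 21)
step = lambda x_u: 0.9 * x_u + rng.normal(0, 0.05)   # exogenous noise only

def act_pre(x, V):
    # Pre-decision: must average over the disturbance inside the argmin.
    q = [x**2 + u**2
         + np.mean([V(step(x + u)) for _ in range(50)]) for u in us]
    return us[int(np.argmin(q))]

def act_post(x, V_post):
    # Post-decision: the expectation is baked into V_post, so the argmin
    # is deterministic and cheap -- the main appeal of this formulation.
    q = [x**2 + u**2 + V_post(x + u) for u in us]
    return us[int(np.argmin(q))]

V_pre = lambda x: x**2 / (1 - 0.81)          # assumed quadratic approximators
V_post = lambda s: s**2 / (1 - 0.81)
print(act_pre(0.5, V_pre), act_post(0.5, V_post))
```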