In this paper, an optimal tracking control scheme is proposed for a class of unknown discrete-time nonlinear systems using an iterative adaptive dynamic programming (ADP) algorithm. First, to obtain the dynamics of the system, an identifier is constructed from a three-layer feedforward neural network (NN). Second, a feedforward neuro-controller is designed to obtain the desired control input of the system. Third, via a system transformation, the original tracking problem is converted into a regulation problem with respect to the state tracking error. Then, the iterative ADP algorithm based on heuristic dynamic programming is introduced to solve the regulation problem, and its convergence is analyzed. In this scheme, feedforward NNs are used as parametric structures to facilitate the implementation of the iterative algorithm. Finally, simulation results are presented to demonstrate the effectiveness of the proposed scheme. (C) 2013 Elsevier B.V. All rights reserved.
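The value-iteration idea behind heuristic dynamic programming can be illustrated on a much simpler problem than the one treated in the paper. The sketch below is an assumption-laden toy, not the authors' scheme: it takes a known scalar linear error system e(k+1) = a·e(k) + b·u(k) with stage cost q·e² + r·u², so that each iterate V_i(e) = p_i·e² is described by a single number and no neural network is needed. The function names and parameters are hypothetical.

```python
def hdp_value_iteration(a, b, q, r, iters=200):
    """Iterate V_{i+1}(e) = min_u [q*e^2 + r*u^2 + V_i(a*e + b*u)] with V_0 = 0.

    For this scalar linear-quadratic toy problem V_i(e) = p_i * e^2, so the
    update reduces to a recursion on the coefficient p_i.
    """
    p = 0.0                      # V_0 = 0, the usual HDP initialisation
    history = [p]
    for _ in range(iters):
        # Closed-form minimisation over u gives the Riccati-type update.
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        history.append(p)
    return history


def greedy_gain(a, b, r, p):
    """Gain k of the greedy control u = -k * e for the value function p * e^2."""
    return a * b * p / (r + b * b * p)


if __name__ == "__main__":
    hist = hdp_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
    print("first iterates:", hist[:4])
    print("final coefficient:", hist[-1])
    print("greedy gain:", greedy_gain(1.0, 1.0, 1.0, hist[-1]))
```

With a = b = q = r = 1 the fixed point satisfies p² - p - 1 = 0, so the iterates increase from 0 towards (1 + √5)/2. In the paper's setting the dynamics are unknown and nonlinear, which is why an identifier and NN approximators replace the closed-form update used here.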
In this paper, a type of fuzzy system structure is applied to the heuristic dynamic programming (HDP) algorithm to solve nonlinear discrete-time Hamilton-Jacobi-Bellman (DT-HJB) problems. The fuzzy system adopted here is a 0-order T-S (Takagi-Sugeno) fuzzy system using triangle membership functions (MFs). The convergence of HDP and the approximability of the multivariate 0-order T-S fuzzy system are analyzed in this paper. It is derived that the cost function and control policy of HDP can be iterated to the DT-HJB solution and the optimal policy. The multivariate 0-order T-S fuzzy system using triangle MFs is proven to be a universal approximator, which guarantees the convergence of the Fuzzy-HDP mechanism. Some simulations are implemented to observe the performance of the proposed method on both a mathematical problem and a practical one. It is concluded that Fuzzy-HDP outperforms traditional optimal control in more complex systems. (C) 2014 Elsevier B.V. All rights reserved.
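To make the approximation claim concrete, here is a minimal one-dimensional sketch of a 0-order T-S fuzzy system with triangle MFs. It is an illustration under assumptions, not the paper's multivariate construction: the rule centres are placed on a uniform grid over [0, 1], each rule's constant consequent is set to the target function's value at its centre, and all function names are hypothetical.

```python
def triangle_mf(x, center, width):
    """Triangular membership: 1 at the centre, falling linearly to 0 at +/- width."""
    return max(0.0, 1.0 - abs(x - center) / width)


def ts0_fuzzy(x, centers, consequents):
    """0-order T-S output: membership-weighted average of constant consequents."""
    width = centers[1] - centers[0]          # uniform grid spacing (assumed)
    weights = [triangle_mf(x, c, width) for c in centers]
    total = sum(weights)                     # positive for any x inside the grid
    return sum(w * y for w, y in zip(weights, consequents)) / total


def max_error(f, n_rules, n_test=1001):
    """Largest absolute approximation error of f on [0, 1] over a test grid."""
    centers = [i / (n_rules - 1) for i in range(n_rules)]
    consequents = [f(c) for c in centers]    # one constant per rule
    xs = [j / (n_test - 1) for j in range(n_test)]
    return max(abs(ts0_fuzzy(x, centers, consequents) - f(x)) for x in xs)


if __name__ == "__main__":
    square = lambda x: x * x
    for n in (5, 11, 21):
        print(n, "rules -> max error", max_error(square, n))
```

In one dimension this system reduces to piecewise-linear interpolation, so the error shrinks as rules are added. That is the intuition behind a universal-approximation result, although the paper's proof covers the multivariate case.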
In this study, a novel online adaptive dynamic programming (ADP)-based algorithm is developed for solving the optimal control problem of affine non-linear continuous-time systems with unknown internal dynamics. The present algorithm employs an observer-critic architecture to approximate the Hamilton-Jacobi-Bellman equation. Two neural networks (NNs) are used in this architecture: an NN state observer is constructed to estimate the unknown system dynamics, and a critic NN is designed to derive the optimal control, in place of the typical action-critic dual networks employed in traditional ADP algorithms. Based on the developed architecture, the observer NN and the critic NN are tuned simultaneously. Meanwhile, unlike existing tuning laws for the critic, the newly developed critic update rule not only ensures convergence of the critic to the optimal control but also guarantees stability of the closed-loop system. No initial stabilising control is required, and by using recorded and instantaneous data simultaneously for the adaptation of the critic, the restrictive persistence of excitation condition is relaxed. In addition, Lyapunov's direct method is utilised to demonstrate the uniform ultimate boundedness of the weights of the observer NN and the critic NN. Finally, an example is provided to verify the effectiveness of the present approach.
In order to implement model-based recognition of human motion intention, dynamics modeling and identification of a lower limb rehabilitation robot named iLeg are investigated. Due to the relatively strong motion constraints, the traditional identification methods are insufficient for iLeg in three respects: 1) the coupling factors among joints are not considered in the traditional joint friction models, which makes the structural error and the torque estimation errors relatively large; 2) because of the small and complicated feasible region caused by the motion constraints, the traditional initialization strategy for finding valid initial solutions of the optimization problem for the exciting trajectories becomes very inefficient; and 3) the condition number of the observation matrix, calculated from the preliminary dynamic model and the associated optimized exciting trajectory, is too large for the identification, yet further reduction of the condition number has not been considered in the literature. Therefore, corresponding contributions are presented to overcome these limitations. First, the coupling factors among joints are included in the joint friction model by using the Palmgren empirical formulation and a polynomial fitting method. Then, an indirect generation strategy is designed, by which valid initial solutions of the optimization problem can be found efficiently. Moreover, a recursive optimization method, based on the optimization of the dynamic model and the exciting trajectories, is proposed to further reduce the condition number. Finally, the performance of the proposed methods is demonstrated by several experiments.
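The role of the condition number can be illustrated with a toy example that is not iLeg's model: a hypothetical one-joint system with torque tau = I·q̈ + f_v·q̇, excited by q(t) = A·sin(ωt) over t in [0, 2π). Each sample contributes one row [q̈, q̇] to the observation matrix, and the 2-norm condition number is computed here in closed form from the 2×2 Gram matrix. All names and the model itself are assumptions made for illustration.

```python
import math


def regressor(amplitude, omega, n_samples=200):
    """Rows [qdd, qd] of the observation matrix for q(t) = A*sin(omega*t)."""
    rows = []
    for k in range(n_samples):
        t = 2.0 * math.pi * k / n_samples
        qd = amplitude * omega * math.cos(omega * t)
        qdd = -amplitude * omega ** 2 * math.sin(omega * t)
        rows.append((qdd, qd))
    return rows


def condition_number(rows):
    """2-norm condition number of a two-column matrix via its Gram matrix.

    Assumes the columns are linearly independent; a rank-deficient matrix
    would make the smallest eigenvalue zero.
    """
    g11 = sum(r[0] * r[0] for r in rows)
    g12 = sum(r[0] * r[1] for r in rows)
    g22 = sum(r[1] * r[1] for r in rows)
    mean = 0.5 * (g11 + g22)
    radius = math.hypot(0.5 * (g11 - g22), g12)
    # Eigenvalues of the Gram matrix are mean +/- radius; singular values
    # of the observation matrix are their square roots.
    return math.sqrt((mean + radius) / (mean - radius))


if __name__ == "__main__":
    for omega in (1.0, 2.0, 5.0):
        print("omega =", omega, "cond =", condition_number(regressor(1.0, omega)))
```

In this toy case the two columns scale differently with ω, so the choice of excitation directly changes how well the two parameters can be separated. A large condition number amplifies measurement noise in the least-squares estimate, which is why reducing it matters for identification.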
In this paper, the adaptive dynamic programming (ADP) approach is utilized to design a neural-network-based optimal controller for a class of unknown discrete-time nonlinear systems with a quadratic cost function. To begin with, a neural network identifier is constructed to learn the unknown dynamic system, with a stability proof. Then, the iterative ADP algorithm is developed to handle the nonlinear optimal control problem, with a convergence analysis. Moreover, the single network dual heuristic dynamic programming (SN-DHP) technique, which eliminates the need for an action network, is introduced to implement the iterative ADP algorithm. Finally, two simulation examples are included to illustrate the effectiveness of the present approach. (C) 2013 Elsevier B.V. All rights reserved.
This paper develops an online algorithm based on policy iteration for optimal control with infinite horizon cost for continuous-time nonlinear systems. In the present method, a discounted value function is employed, which is considered to be a more general case for optimal control problems. Meanwhile, without knowledge of the internal system dynamics, the algorithm can converge uniformly online to the optimal control, which is the solution of the modified Hamilton-Jacobi-Bellman equation. By means of two neural networks, the algorithm is able to find suitable approximations of both the optimal control and the optimal cost. The uniform convergence to the optimal control is shown, guaranteeing the stability of the nonlinear system. A simulation example is provided to illustrate the effectiveness and applicability of the present approach.
In this study, a novel numerical adaptive learning control scheme based on an adaptive dynamic programming (ADP) algorithm is developed to solve numerical optimal control problems for infinite horizon discrete-time non-linear systems. With the numerical controller, the domain of definition is constrained to a discrete set, so approximation errors always exist between the numerical controls and the exact ones. A convergence analysis of the numerical iterative ADP algorithm is developed to show that, under some mild assumptions, the numerical iterative controls make the iterative performance index functions converge to the greatest lower bound of all performance indices within a finite error bound. The stability properties of the system under the numerical iterative controls are proved, which allows the present iterative ADP algorithm to be implemented both on-line and off-line. Finally, two simulation examples are given to illustrate the performance of the present method.
This paper addresses the novel design of an underwater manipulator with a lightweight multilink structure and its free-floating autonomous operation. The concept design reduces the coupling between the manipulator and the vehicle efficiently, even in the case where the vehicle weight in air is not significantly greater than the manipulator weight. The specific implementation of the mechanical structure is elaborated. Moreover, a closed-loop control system based on binocular vision is proposed for underwater manipulation. In the end, experimental results demonstrate that the conceived underwater manipulator can accomplish the autonomous operation quickly.
In this paper, a self-learning control scheme is proposed for the infinite horizon optimal control of affine nonlinear systems based on the action dependent heuristic dynamic programming algorithm. The policy iteration technique is introduced to derive the optimal control policy, with feasibility and convergence analysis. It is shown that the "greedy" control action for each state exists uniquely, that the control policy learned after each policy iteration is admissible, and that the optimal control policy can be obtained. Two three-layer perceptron neural networks are employed to implement the scheme. The critic network is trained by a novel rule to conform to the Bellman equation, and the action network is trained to yield a better control policy. The two training processes alternate until the optimal control policy is achieved. Two simulation examples are provided to validate the effectiveness of the approach.
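The alternation between policy evaluation and policy improvement can be sketched on a toy problem. The example below is an illustration under assumptions, not the paper's action-dependent scheme: it uses a known scalar linear system x(k+1) = a·x(k) + b·u(k) with stage cost q·x² + r·u², so the evaluation step has a closed form and stands in for the critic network, while the improvement step stands in for the action network. Function names and the initial gain are hypothetical.

```python
def evaluate_policy(a, b, q, r, k):
    """Cost coefficient p of the policy u = -k*x, i.e. V(x) = p * x^2.

    Solves the Bellman equation p = q + r*k^2 + p*(a - b*k)^2 for p.
    """
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        raise ValueError("gain is not admissible: closed loop is not stable")
    return (q + r * k * k) / (1.0 - a_cl * a_cl)


def improve_policy(a, b, r, p):
    """Greedy gain that minimises r*u^2 + p*(a*x + b*u)^2 over u = -k*x."""
    return a * b * p / (r + b * b * p)


def policy_iteration(a, b, q, r, k0, iters=20):
    """Alternate evaluation and improvement, starting from an admissible gain k0."""
    k = k0
    costs = []
    for _ in range(iters):
        p = evaluate_policy(a, b, q, r, k)   # critic-like step
        costs.append(p)
        k = improve_policy(a, b, r, p)       # actor-like step
    return k, costs


if __name__ == "__main__":
    gain, costs = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=0.5)
    print("costs of successive policies:", costs[:4])
    print("final gain:", gain)
```

Unlike value iteration started from zero, this procedure needs an admissible initial policy, and the cost of each successive policy does not increase. The greedy minimiser is unique here because the quantity being minimised is strictly convex in u, which mirrors the uniqueness property stated in the abstract.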
In this study, an online adaptive optimal control scheme is developed for solving the infinite-horizon optimal control problem of uncertain non-linear continuous-time systems whose control policy is subject to saturation constraints. A novel identifier-critic architecture is presented to approximate the Hamilton-Jacobi-Bellman equation using two neural networks (NNs): an identifier NN is used to estimate the uncertain system dynamics, and a critic NN is utilised to derive the optimal control, in place of the typical action-critic dual networks employed in reinforcement learning. Based on the developed architecture, the identifier NN and the critic NN are tuned simultaneously. Meanwhile, unlike policy iteration, for which an initial stabilising control is indispensable, the scheme imposes no special requirement on the initial control. Moreover, by using Lyapunov's direct method, the weights of the identifier NN and the critic NN are guaranteed to be uniformly ultimately bounded, while the closed-loop system is kept stable. Finally, an example is provided to demonstrate the effectiveness of the present approach.