In this paper, a reinforcement-learning-based direct adaptive control is developed to deliver a desired tracking performance for a class of discrete-time (DT) nonlinear systems with unknown bounded disturbances. We investigate multi-input-multi-output unknown nonaffine nonlinear DT systems and employ two neural networks (NNs). By using the implicit function theorem, an action NN is used to generate the control signal, and it is also designed to cancel the nonlinearity of the unknown DT system so that feedback-linearization methods can be utilized. On the other hand, a critic NN is applied to estimate the cost function, which satisfies the recursive equations derived from heuristic dynamic programming. The weights of both the action NN and the critic NN are updated directly online instead of through offline training. By utilizing Lyapunov's direct method, the closed-loop tracking errors and the NN weight estimates are demonstrated to be uniformly ultimately bounded. Two numerical examples are provided to show the effectiveness of the proposed approach. (C) 2014 Elsevier Ltd. All rights reserved.
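The flavor of the online actor-critic tuning described above can be sketched for a toy case. The snippet below assumes a scalar linear plant x' = a*x + b*u with quadratic cost, a quadratic critic J(x) = w*x^2, and a linear actor u = -v*x; the paper's nonaffine MIMO setting, disturbances, and NN structures are deliberately simplified away, so this is only an illustration of the critic/actor update principle, not the authors' method.

```python
# Hedged sketch: online actor-critic (HDP-style) updates on a toy
# scalar plant; a, b, gamma, learning rates are made-up values.
a, b, gamma = 0.9, 1.0, 0.95
w, v = 0.0, 0.0              # critic J(x) = w*x^2, actor u = -v*x
lr_c, lr_a = 0.05, 0.02

for i in range(5000):
    x = [-1.0, -0.5, 0.5, 1.0][i % 4]   # cycle states for excitation
    u = -v * x
    xn = a * x + b * u
    cost = x * x + u * u
    td = cost + gamma * w * xn * xn - w * x * x   # Bellman (TD) residual
    w += lr_c * td * x * x                        # critic: shrink residual
    # actor: descend d(cost + gamma*J(x'))/dv
    v += lr_a * 2.0 * x * x * (gamma * w * b * (a - b * v) - v)

# Bellman residual at x = 1 after training (should be near zero)
residual = (1 + v * v) + gamma * w * (a - b * v) ** 2 - w
```

With these rates the critic and actor weights settle jointly, and the closed-loop pole a - b*v stays inside the unit circle throughout.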
In this paper, a novel iterative adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. When the iterative control law and the iterative performance index function cannot be obtained accurately in each iteration, it is shown that the iterative controls can still make the performance index function converge to within a finite error bound of the optimal performance index function. Stability properties are presented to show that the system can be stabilized under the iterative control law, which makes the iterative ADP algorithm feasible for both online and offline implementation. Two neural networks are used to approximate the iterative performance index function and to compute the iterative control policy, respectively. Finally, two simulation examples are given to illustrate the performance of the present method.
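The exact iteration that the ADP algorithm approximates with neural networks can be written down for a tabular toy problem. The sketch below assumes known dynamics x' = 0.8x + 0.5u snapped to a state grid; both the dynamics and the grid are illustrative stand-ins, not the paper's setting.

```python
# Toy value iteration: V_{i+1}(x) = min_u [ U(x,u) + V_i(f(x,u)) ]
# on a snapped grid, assuming U(x,u) = x^2 + u^2.
def snap(x, grid):
    return min(grid, key=lambda g: abs(g - x))

grid = [i / 10 - 1.0 for i in range(21)]          # states in [-1, 1]
acts = [i / 10 - 1.0 for i in range(21)]          # controls in [-1, 1]
V = {x: 0.0 for x in grid}                        # traditional V_0 = 0

diff = float("inf")
for _ in range(100):
    Vn = {}
    for x in grid:
        Vn[x] = min(x * x + u * u + V[snap(0.8 * x + 0.5 * u, grid)]
                    for u in acts)
    diff = max(abs(Vn[x] - V[x]) for x in grid)
    V = Vn
    if diff < 1e-9:
        break
```

Because the origin is a reachable zero-cost absorbing state, the iteration converges in a handful of sweeps; the paper's contribution is bounding the error when each sweep is only computed approximately.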
This brief presents the real-time dynamic Dubins-Helix (RDDH) method for trajectory smoothing, which consists of Dubins-Helix trajectory generation and pitch-angle smoothing. The generated 3-D trajectory is called the RDDH trajectory. On one hand, the projection of the 3-D trajectory onto the horizontal plane is partially generated by a Dubins path planner so that the curvature-radius constraint is satisfied. On the other hand, a helix curve is constructed to satisfy the pitch-angle constraint, even when the initial and final poses are close. Furthermore, by analyzing the relationship between the parameters and the effectiveness of the RDDH trajectory, a smoothing algorithm is designed to obtain appropriate parameters for a shorter and smoother trajectory. Finally, numerical results show that the proposed method can generate an effective trajectory under diverse initial conditions and achieve real-time computation.
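One ingredient of such a construction, a helix segment whose flight-path (pitch) angle is constant, can be sketched as follows; the radius and pitch values are illustrative, not taken from the brief.

```python
# Helix with constant pitch angle: climb rate tied to horizontal arc
# length so that dz / d(horizontal arc) = tan(pitch) everywhere.
import math

R, pitch = 30.0, math.radians(10.0)     # assumed turn radius and pitch bound
pts = []
for k in range(101):
    t = k * 0.05                        # turn angle along the helix
    pts.append((R * math.cos(t), R * math.sin(t), R * t * math.tan(pitch)))

# flight-path angle recovered from two consecutive samples
(x0, y0, z0), (x1, y1, z1) = pts[0], pts[1]
horiz = math.hypot(x1 - x0, y1 - y0)
gamma = math.atan2(z1 - z0, horiz)      # should match the pitch bound
```

Stacking turns of this helix on top of a Dubins circle is what lets a path gain altitude without violating the pitch-angle constraint when the start and goal are close in the horizontal plane.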
Author: Wang, Fei-Yue
Chinese Academy of Sciences, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Beijing 100190, People's Republic of China
Welcome to the new issue of the IEEE Transactions on Computational Social Systems (TCSS). I am grateful to report that, as of April 9, 2020, the CiteScore of TCSS has reached 5.26, a new high. Many thanks to all of you for your great effort and support.
This paper proposes a bio-inspired robot with undulatory fins and summarizes its control methods. First, three basic motions (forward/backward swimming, diving/rising, and turning) are implemented and evaluated experimentally. Next, a hybrid control that combines active disturbance rejection control with a fuzzy strategy is presented to achieve closed-loop depth and course control, based on the evaluation of the three basic motions. Finally, waypoint tracking with a line-of-sight guidance system based on a finite-state machine is presented for this bio-inspired robot. The results of swimming experiments are provided to illustrate the validity of the proposed methods.
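The waypoint-tracking layer can be sketched with an idealized planar vehicle: a two-state machine (TRACK / DONE) switches waypoints inside an acceptance radius, and line-of-sight guidance sets the heading. The waypoint list, radius, and kinematics are assumptions for illustration, far simpler than the undulatory-fin robot.

```python
# Line-of-sight waypoint tracking with a minimal finite-state machine.
import math

waypoints = [(5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
accept_radius, speed, dt = 0.3, 1.0, 0.1
x, y, wp_idx, state = 0.0, 0.0, 0, "TRACK"

for _ in range(1000):
    if state == "DONE":
        break
    wx, wy = waypoints[wp_idx]
    if math.hypot(wx - x, wy - y) < accept_radius:
        wp_idx += 1                       # inside radius: next waypoint
        if wp_idx == len(waypoints):
            state = "DONE"
            continue
        wx, wy = waypoints[wp_idx]
    heading = math.atan2(wy - y, wx - x)  # line-of-sight bearing
    x += speed * dt * math.cos(heading)
    y += speed * dt * math.sin(heading)
```

On the real robot, the commanded heading would be passed to the course controller rather than applied instantaneously.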
Reinforcement learning offers a promising way to achieve self-learning control of an unknown system, but it involves the issues of policy evaluation and exploration, especially in continuous state domains. In this study, these issues are addressed from the perspective of probability. The action value function is modeled as the latent variable of a Gaussian process, while the reward is treated as the observed variable. An online approach is then proposed to update the action value function by Bayesian inference. Taking advantage of the proposed framework, prior knowledge can be incorporated into the action value function, and thus an efficient exploration strategy is presented. Finally, the Bayesian-state-action-reward-state-action algorithm is tested on several benchmark problems, and empirical results show its effectiveness.
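The core idea of treating the value as latent and the reward as observed can be shown in a deliberately stripped-down form: a single state-action value with a Gaussian prior and Gaussian reward noise admits a closed-form conjugate update. The prior, noise level, and reward samples below are made up, and the Gaussian-process machinery over continuous states is not reproduced.

```python
# Conjugate normal-normal Bayesian update of one action value:
# prior N(mu, 1/prec), rewards observed with noise N(q, 1/noise_prec).
mu, prec = 0.0, 1.0 / 4.0        # prior N(0, 2^2) on the action value
noise_prec = 1.0 / 1.0           # reward noise variance 1
rewards = [1.2, 0.8, 1.1, 0.9, 1.0]

for r in rewards:
    prec_new = prec + noise_prec
    mu = (prec * mu + noise_prec * r) / prec_new   # posterior mean
    prec = prec_new                                # posterior precision

var = 1.0 / prec                 # shrinks with every observation
```

The shrinking posterior variance is exactly what an exploration strategy can exploit: actions whose value is still uncertain are worth trying.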
A novel supervised Actor-Critic (SAC) approach to the adaptive cruise control (ACC) problem is proposed in this paper. The key elements required by the SAC algorithm, namely the Actor and the Critic, are each approximated by feed-forward neural networks. The output of the Actor, together with the state, is input to the Critic to approximate the performance index function. A Lyapunov stability analysis is presented to prove the uniformly ultimately bounded property of the estimation errors of the neural networks. Moreover, we use a supervisory controller to pre-train the Actor to a basic control policy, which improves the training convergence and success rate. We apply this method to learn an approximate optimal control policy for the ACC problem. Experimental results in several driving scenarios demonstrate that the SAC algorithm performs well, so it is feasible and effective for the ACC problem.
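The supervised pre-training step alone can be sketched as a least-mean-squares fit of a linear actor to a supervisor policy. The supervisor gains, the feature set (gap error and relative speed), and the learning rate are hypothetical; the Critic and the subsequent reinforcement fine-tuning are omitted.

```python
# LMS pre-training of a linear actor toward a hand-tuned supervisor.
def supervisor(gap_err, rel_speed):
    return 0.5 * gap_err + 0.8 * rel_speed     # hypothetical base policy

w = [0.0, 0.0, 0.0]                            # actor: w0 + w1*g + w2*s
lr = 0.05
samples = [(g / 5.0, s / 5.0) for g in range(-5, 6) for s in range(-5, 6)]

for _ in range(400):
    for g, s in samples:
        pred = w[0] + w[1] * g + w[2] * s
        err = supervisor(g, s) - pred          # imitation error
        w = [w[0] + lr * err, w[1] + lr * err * g, w[2] + lr * err * s]
```

Starting the Actor at this imitation of the supervisor, rather than at random weights, is what improves the convergence and success rate of the later RL stage.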
In this paper, we establish a neural-network-based decentralized control law to stabilize a class of continuous-time nonlinear interconnected large-scale systems using an online model-free integral policy iteration (PI) algorithm. The model-free PI approach can solve the decentralized control problem for interconnected systems with unknown dynamics. The stabilizing decentralized control law is derived from the optimal control policies of the isolated subsystems. The online model-free integral PI algorithm is developed to solve the optimal control problems for the isolated subsystems with unknown system dynamics. We use a neural-network-based actor-critic technique with a least-squares implementation to obtain the optimal control policies. Two simulation examples are given to verify the applicability of the decentralized control law. (C) 2015 Elsevier B.V. All rights reserved.
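The evaluate/improve cycle that integral PI carries out model-free can be written down in its model-based, scalar discrete-time analogue: evaluate the current gain by solving a Lyapunov equation, then improve the gain. The system values a, b, q, r are arbitrary test numbers, and this is only the PI principle, not the paper's continuous-time, model-free construction.

```python
# Scalar policy iteration for the LQR problem (model-based analogue).
a, b, q, r = 0.9, 1.0, 1.0, 1.0
k = 0.0                                   # initial stabilizing gain, |a-bk| < 1

for _ in range(20):
    # policy evaluation: P = q + r*k^2 + (a - b*k)^2 * P  (Lyapunov eq.)
    P = (q + r * k * k) / (1.0 - (a - b * k) ** 2)
    # policy improvement
    k = b * P * a / (r + b * b * P)

# residual of the discrete algebraic Riccati equation at convergence
riccati_residual = P - (q + a * a * P - (a * b * P) ** 2 / (r + b * b * P))
```

Each improved gain remains stabilizing, which mirrors the role of the isolated-subsystem optimal policies in building the decentralized control law.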
Residential energy scheduling is a topic of wide current interest against the background of worldwide energy saving and environmental protection. To this end, a new residential energy scheduling algorithm is developed for energy management, based on action-dependent heuristic dynamic programming. The algorithm operates under residential real-time pricing with two adjacent housing units that can exchange energy, which reduces the overall cost and enhances renewable energy efficiency over long-term operation. It is designed to obtain the optimal control policy governing the directions and amounts of electricity flow. The algorithm's architecture is constructed mainly from neural networks, whose layer connections encode the learned characteristics. To stay close to real situations, many constraints, such as the maximum charging/discharging power of batteries, are taken into account. An absent-energy penalty cost is developed, for the first time, as part of the performance index function. When the environment changes, the residential energy scheduling algorithm gains new features and keeps adapting during real-time operation. Simulation results show that the developed algorithm is beneficial to energy conservation. (C) 2015 Elsevier Ltd. All rights reserved.
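The kind of decision the scheduler makes each hour can be illustrated with a crude threshold rule for a single battery under real-time pricing. All numbers (prices, demand, capacity, power limit) are made up, unit efficiency is assumed, and the ADP-learned policy of the paper would replace this hand-written rule.

```python
# Toy battery schedule: buy and charge in cheap hours, discharge in
# expensive ones, respecting capacity and charging-power constraints.
prices = [1.0, 1.0, 5.0, 5.0, 1.0, 5.0]   # real-time price per kWh
demand = [1.0] * 6                        # household load per hour (kWh)
cap, pmax, soc = 2.0, 1.0, 0.0            # capacity, power limit, state
threshold = sum(prices) / len(prices)     # buy below, discharge above

cost = 0.0
for p, d in zip(prices, demand):
    if p < threshold:                     # cheap hour: serve load + charge
        charge = min(pmax, cap - soc)
        soc += charge
        cost += p * (d + charge)
    else:                                 # expensive hour: discharge first
        discharge = min(pmax, soc, d)
        soc -= discharge
        cost += p * (d - discharge)

baseline = sum(p * d for p, d in zip(prices, demand))  # no-battery cost
```

Even this rule beats the no-battery baseline on the toy price series; the learned policy additionally handles inter-household exchange and the absent-energy penalty.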
Author: Li, H.; Liu, D.
Chinese Academy of Sciences, State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Beijing 100190, People's Republic of China
In this study, the authors propose a novel adaptive dynamic programming scheme based on general value iteration (VI) to obtain near-optimal control for discrete-time affine non-linear systems with continuous state and control spaces. First, the selection of the initial value function differs from that of traditional VI, and a new method is introduced to analyze the convergence property and convergence speed of the value function. Then, it is shown that the control law obtained at each iteration can stabilise the system under some conditions. Finally, an error-bound-based condition is derived that accounts for the approximation errors of the neural networks, so the error between the optimal and approximated value functions can also be estimated. To facilitate the implementation of the iterative scheme, three neural networks trained with the Levenberg-Marquardt algorithm are used to approximate the unknown system, the value function and the control law. Two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.
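The paper's point that VI need not start from the zero value function can be demonstrated on a tabular toy problem: below, iteration starts from a large quadratic initial value function and still converges. The dynamics x' = 0.8x + 0.5u on a snapped grid are an assumed substitute for the neural-network implementation.

```python
# General VI with a nonzero initial value function V_0(x) = 100*x^2.
def snap(x, grid):
    return min(grid, key=lambda g: abs(g - x))

grid = [i / 10 - 1.0 for i in range(21)]
acts = grid
V = {x: 100.0 * x * x for x in grid}      # non-traditional initialization

diff = float("inf")
for _ in range(200):
    Vn = {x: min(x * x + u * u + V[snap(0.8 * x + 0.5 * u, grid)]
                 for u in acts) for x in grid}
    diff = max(abs(Vn[x] - V[x]) for x in grid)
    V = Vn
    if diff < 1e-9:
        break
```

Starting from an overestimate makes the iterates approach the optimum from above, which is the behavior the paper's convergence analysis characterizes.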