In this paper, a novel iterative adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. When the iterative control law and the iterative performance index function cannot be obtained accurately in each iteration, it is shown that the iterative controls still make the performance index function converge to within a finite error bound of the optimal performance index function. Stability properties are presented to show that the system can be stabilized under the iterative control law, which makes the present iterative ADP algorithm feasible for both on-line and off-line implementation. Two neural networks are used to approximate the iterative performance index function and to compute the iterative control policy, respectively, thereby implementing the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
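The effect of bounded approximation errors on the iterates can be illustrated with a scalar stand-in for the paper's setting: a hypothetical linear system x_{k+1} = 0.9x + u with stage cost x^2 + u^2, for which the value-function recursion reduces to a scalar fixed-point update. The constant error `eps` is a crude model of neural-network approximation error; the system, cost, and error model here are illustrative assumptions, not the paper's general nonlinear case.

```python
def approximate_vi(eps, a=0.9, iters=300):
    # Scalar value iteration for x_{k+1} = a*x + u, cost x^2 + u^2,
    # with V_i(x) = p_i * x^2.  Minimizing over u gives the update
    # p_{i+1} = 1 + a^2 * p_i / (1 + p_i); `eps` injects a constant
    # per-iteration approximation error (a crude neural-net stand-in).
    p = 0.0
    for _ in range(iters):
        p = 1.0 + a**2 * p / (1.0 + p) + eps
    return p
```

With eps = 0 the iterates converge to the exact fixed point; with a small eps they converge only to a neighborhood of it, mirroring the finite error bound established in the paper.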
This brief presents the real-time dynamic Dubins-Helix (RDDH) method for trajectory smoothing, which consists of Dubins-Helix trajectory generation and pitch angle smoothing. The generated 3-D trajectory is called the RDDH trajectory. On the one hand, the projection of the 3-D trajectory onto the horizontal plane is partially generated by a Dubins path planner so that the curvature radius constraint is satisfied. On the other hand, the helix curve is constructed to satisfy the pitch angle constraint, even when the initial and final poses are close. Furthermore, by analyzing the relationship between the parameters and the effectiveness of the RDDH trajectory, a smoothing algorithm is designed to obtain appropriate parameters for a shorter and smoother trajectory. Finally, numerical results show that the proposed method can generate an effective trajectory under diverse initial conditions and achieve real-time computation.
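The role of the helix can be sketched with a back-of-the-envelope helper: if climbing the required height at the maximum pitch angle demands more arc length than the horizontal Dubins path provides, extra helix circles at the minimum turning radius are inserted. This is a deliberate simplification, not the RDDH algorithm itself; all parameter names are assumptions.

```python
import math

def helix_turns(height, horizontal_len, r_min, gamma_max):
    # Arc length required to climb `height` without exceeding the
    # maximum pitch (flight-path) angle gamma_max.
    needed = abs(height) / math.tan(gamma_max)
    extra = needed - horizontal_len
    if extra <= 0:
        return 0  # the horizontal Dubins path is already long enough
    # Each full helix circle at the minimum turning radius adds 2*pi*r.
    return math.ceil(extra / (2 * math.pi * r_min))
```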
This paper proposes a bio-inspired robot with undulatory fins and summarizes its control methods. First, three basic motions, forward/backward swimming, diving/rising motion, and turning, are implemented and evaluated by experiments. Next, a hybrid control that combines active disturbance rejection control with a fuzzy strategy is presented to achieve closed-loop depth and course control according to the evaluation of the three basic motions. Finally, waypoint tracking with a line-of-sight guidance system based on a finite-state machine for this bio-inspired robot is presented. The results of swimming experiments are provided to illustrate the validity of the proposed methods.
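Line-of-sight guidance of the kind used for the waypoint tracking can be sketched as the classic lookahead-based course command; this is the generic textbook form, not necessarily the exact law implemented on this robot.

```python
import math

def los_course(pos, wp_prev, wp_next, lookahead):
    # Course of the current path segment.
    dx, dy = wp_next[0] - wp_prev[0], wp_next[1] - wp_prev[1]
    path_course = math.atan2(dy, dx)
    # Cross-track error: lateral offset from the segment.
    ex, ey = pos[0] - wp_prev[0], pos[1] - wp_prev[1]
    cross = -ex * math.sin(path_course) + ey * math.cos(path_course)
    # Steer toward a point `lookahead` ahead on the path.
    return path_course + math.atan2(-cross, lookahead)
```

A finite-state machine around such a law typically switches to the next segment when the vehicle enters a circle of acceptance around wp_next.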
Author:
Wang, Fei-Yue, Chinese Acad Sci
State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China
Welcome to the new issue of the IEEE Transactions on Computational Social Systems (TCSS). I am pleased to report that, as of April 9, 2020, the CiteScore of TCSS has reached 5.26, a new high. Many thanks to all of you for your great efforts and support.
Reinforcement learning offers a promising way for self-learning control of an unknown system, but it involves the issues of policy evaluation and exploration, especially in continuous state domains. In this study, these issues are addressed from a probabilistic perspective. The approach models the action value function as the latent variable of a Gaussian process and the reward as the observed variable. An online approach is then proposed to update the action value function by Bayesian inference. Taking advantage of the proposed framework, prior knowledge can be incorporated into the action value function, and thus an efficient exploration strategy is presented. Finally, the Bayesian state-action-reward-state-action algorithm is tested on several benchmark problems, and the empirical results show its effectiveness.
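The core Bayesian-inference step can be illustrated in a deliberately simplified scalar form: a conjugate Gaussian update of a single action-value belief given a noisy TD target. The study's Gaussian-process machinery generalizes this to correlated beliefs over a continuous state space; the names and noise model below are assumptions.

```python
def bayes_q_update(mean, var, target, obs_var):
    # Conjugate Gaussian update of a scalar belief Q ~ N(mean, var)
    # given a noisy TD target observed with variance obs_var.
    gain = var / (var + obs_var)           # Kalman-style gain
    return mean + gain * (target - mean), (1.0 - gain) * var
```

The posterior variance shrinks with every observation, which is what makes uncertainty-directed exploration (e.g. preferring high-variance actions) possible in this framework.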
A novel supervised Actor-Critic (SAC) approach for the adaptive cruise control (ACC) problem is proposed in this paper. The key elements required by the SAC algorithm, namely the Actor and the Critic, are each approximated by feed-forward neural networks. The output of the Actor together with the state is input to the Critic to approximate the performance index function. A Lyapunov stability analysis is presented to prove the uniformly ultimately bounded property of the estimation errors of the neural networks. Moreover, we use a supervisory controller to pre-train the Actor to achieve a basic control policy, which improves the training convergence and success rate. We apply this method to learn an approximately optimal control policy for the ACC problem. Experimental results in several driving scenarios demonstrate that the SAC algorithm performs well, so it is feasible and effective for the ACC problem.
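The supervisory pre-training idea, fitting the Actor to a supervisor's actions before actor-critic fine-tuning, can be sketched for a hypothetical one-dimensional linear policy. The paper uses feed-forward neural networks; this scalar least-mean-squares version only illustrates the warm-start step.

```python
def pretrain_actor(states, supervisor, lr=0.1, epochs=200):
    # Least-mean-squares fit of a linear policy u = w * x to the
    # supervisory controller's actions -- the warm start that the
    # actor-critic loop then refines.
    w = 0.0
    for _ in range(epochs):
        for x in states:
            w += lr * (supervisor(x) - w * x) * x
    return w
```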
In this paper, we establish a neural-network-based decentralized control law to stabilize a class of continuous-time nonlinear interconnected large-scale systems using an online model-free integral policy iteration (PI) algorithm. The model-free PI approach can solve the decentralized control problem for an interconnected system with unknown dynamics. The stabilizing decentralized control law is derived from the optimal control policies of the isolated subsystems. The online model-free integral PI algorithm is developed to solve the optimal control problems for the isolated subsystems with unknown system dynamics. We use a neural-network-based actor-critic technique with a least-squares implementation to obtain the optimal control policies. Two simulation examples are given to verify the applicability of the decentralized control law. (C) 2015 Elsevier B.V. All rights reserved.
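The integral Bellman identity that integral PI exploits, V(x(t)) - V(x(t+dt)) equals the running cost accumulated over [t, t+dt], can be shown on a scalar linear subsystem where the policy evaluation step becomes a least-squares fit of V(x) = p*x^2. The system, policy, and coefficients below are stand-in assumptions; in the paper the trajectory data supply these quantities, so no model is needed.

```python
import numpy as np

def integral_policy_eval(xs, dt=0.1, q=1.0, r=1.0, k=0.5, a=-1.0):
    # Evaluate V(x) = p * x^2 for the policy u = -k*x on xdot = a*x + u
    # from the integral Bellman identity
    #   V(x(t)) - V(x(t+dt)) = integral of (q*x^2 + r*u^2) over [t, t+dt],
    # solved for p by least squares over several start states.
    A, b = [], []
    cl = a - k                                    # closed-loop pole
    for x0 in xs:
        x1 = x0 * np.exp(cl * dt)                 # state dt seconds later
        cost = (q + r * k**2) * x0**2 * (np.exp(2 * cl * dt) - 1) / (2 * cl)
        A.append(x0**2 - x1**2)
        b.append(cost)
    return np.linalg.lstsq(np.array(A)[:, None], np.array(b), rcond=None)[0][0]
```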
Authors:
Li, H.; Liu, D., Chinese Acad Sci
State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China
In this study, the authors propose a novel adaptive dynamic programming scheme based on general value iteration (VI) to obtain near-optimal control for discrete-time affine non-linear systems with continuous state and control spaces. First, the selection of the initial value function differs from that of traditional VI, and a new method is introduced to establish the convergence property and convergence speed of the value function. Then, the control law obtained at each iteration is shown to stabilise the system under some conditions. Finally, an error-bound-based condition is derived that accounts for the approximation errors of the neural networks, from which the error between the optimal and approximated value functions can be estimated. To facilitate the implementation of the iterative scheme, three neural networks trained with the Levenberg-Marquardt algorithm are used to approximate the unknown system, the value function and the control law. Two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.
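The "general VI" point that the initial value function need not be zero can be checked on a scalar example: starting the Bellman recursion from different quadratic guesses V_0(x) = p0*x^2 yields the same fixed point. The linear system x_{k+1} = 0.9x + u with cost x^2 + u^2 is an illustrative assumption.

```python
def general_vi(p0, a=0.9, iters=300):
    # Scalar Bellman recursion for x_{k+1} = a*x + u, cost x^2 + u^2,
    # started from an arbitrary initial quadratic V_0(x) = p0 * x^2
    # rather than the traditional V_0 = 0.
    p = p0
    for _ in range(iters):
        p = 1.0 + a**2 * p / (1.0 + p)
    return p
```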
Residential energy scheduling is a topical issue against the worldwide background of energy saving and environmental protection. To this end, a new residential energy scheduling algorithm based on action-dependent heuristic dynamic programming is developed for energy management. The algorithm operates under residential real-time pricing with two adjacent housing units that can exchange energy, which reduces the overall cost and enhances renewable energy efficiency over long-term operation. It is designed to obtain the optimal control policy for managing the directions and amounts of electricity flow. The algorithm's architecture is mainly constructed from neural networks, with the learned characteristics encoded in the connections between layers. To stay close to real situations, many constraints, such as the maximum charging/discharging power of the batteries, are taken into account. An absent-energy penalty cost is introduced for the first time as part of the performance index function. When the environment changes, the residential energy scheduling algorithm acquires new features and keeps adapting in real-time operation. Simulation results show that the developed algorithm is beneficial to energy conservation. (C) 2015 Elsevier Ltd. All rights reserved.
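The battery constraints mentioned above amount to clipping a requested power against the power limit and the remaining (or available) state of charge. A minimal sketch, with hypothetical units and limits:

```python
def clip_battery(power_req, soc, capacity, p_max, dt=1.0):
    # Enforce the maximum charging/discharging power and keep the
    # state of charge within [0, capacity] over one time step dt.
    p = max(-p_max, min(p_max, power_req))
    if p > 0:                               # charging
        p = min(p, (capacity - soc) / dt)
    else:                                   # discharging
        p = max(p, -soc / dt)
    return p
```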
This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law, which optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear systems. Neural networks are used to approximate the performance index function and to compute the optimal control law, respectively, facilitating the implementation of the iterative ADP algorithm, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
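The nonincreasing convergence of policy iteration is easy to observe on a scalar stand-in (x_{k+1} = 0.9x + u, cost x^2 + u^2, with a stabilizing initial gain as PI requires): exact evaluation of V(x) = p*x^2 alternates with greedy improvement, and p decreases monotonically to the HJB fixed point. The scalar setting is an assumption for illustration only.

```python
def policy_iteration(k0=0.8, a=0.9, iters=30):
    # Scalar PI for x_{k+1} = a*x + u, cost x^2 + u^2, policy u = -k*x.
    # Requires a stabilizing initial gain, i.e. |a - k0| < 1.
    k = k0
    for _ in range(iters):
        cl = a - k                          # closed-loop gain
        p = (1.0 + k**2) / (1.0 - cl**2)    # exact policy evaluation
        k = p * a / (1.0 + p)               # greedy policy improvement
    return k, p
```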