Adaptive dynamic programming (ADP), an important branch of reinforcement learning, is a powerful tool for solving various optimal control problems. However, the cooperative game problem for discrete-time multiplayer systems with control constraints has rarely been investigated in this field. To address this issue, a novel policy iteration (PI) algorithm is proposed based on the ADP technique, and its convergence analysis is also studied in this brief paper. For the proposed PI algorithm, an online neural network (NN) implementation scheme with a multiple-network structure is presented. In the online NN-based learning algorithm, a critic network, constrained actor networks, and unconstrained actor networks are employed to approximate the value function and the constrained and unconstrained control policies, respectively, and the NN weight updating laws are designed based on the gradient descent method. Finally, a numerical simulation example is presented to show the effectiveness of the proposed approach. (C) 2019 Elsevier B.V. All rights reserved.
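As a rough illustration of the gradient-descent weight updates mentioned above, the following sketch trains a single-layer critic and a tanh-saturated (constrained) actor on a toy scalar system; the plant, feature functions, learning rates, and discount factor are hypothetical choices, not taken from the paper.

```python
import numpy as np

# Toy discrete-time plant x_{k+1} = f(x) + g(x)u (hypothetical, for illustration only).
def f(x): return 0.9 * x + 0.1 * x**3
def g(x): return 1.0

u_max = 0.5      # control constraint |u| <= u_max
gamma = 0.95     # discount factor

# Single-layer approximators: critic V(x) ~ wc @ phi(x), constrained actor u(x) = u_max*tanh(wa @ psi(x)).
def phi(x): return np.array([x**2, x**4])
def psi(x): return np.array([x, x**3])

wc, wa = np.zeros(2), np.zeros(2)
lr_c, lr_a = 0.05, 0.01
rng = np.random.default_rng(0)

for _ in range(5000):
    x = rng.uniform(-1.0, 1.0)
    u = u_max * np.tanh(wa @ psi(x))                 # constrained control from the actor
    x_next = f(x) + g(x) * u
    # Critic: gradient descent on the squared Bellman residual e = r + gamma*V(x') - V(x).
    e = x**2 + u**2 + gamma * wc @ phi(x_next) - wc @ phi(x)
    wc -= lr_c * e * (gamma * phi(x_next) - phi(x))
    # Actor: gradient descent on r + gamma*V(x') with respect to the actor weights.
    dVdx_next = wc @ np.array([2 * x_next, 4 * x_next**3])
    dq_du = 2 * u + gamma * dVdx_next * g(x)
    du_dwa = u_max * (1 - np.tanh(wa @ psi(x))**2) * psi(x)
    wa -= lr_a * dq_du * du_dwa
```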
Considering the leader-following consensus problem for nonlinear multi-agent systems with bounded input disturbances under a fixed topology, a novel distributed robust protocol is designed to guarantee that all followers synchronize to the leader by investigating the gain of the Nash solution. The robustness restrictions are given through Lyapunov analysis. To get the Nash solution, critic neural networks are trained based on an adaptive dynamic programming algorithm in an online and forward-in-time manner to solve the coupled Hamilton-Jacobi equations. An additional term is added to the neural network weight tuning law to avoid the requirement for an initial admissible control law.
In this paper, a stable value iteration (SVI) algorithm is developed to solve the discrete-time two-player zero-sum game (TP-ZSG) for nonlinear systems based on adaptive dynamic programming (ADP). In the SVI algorithm, both the optimality and the stability of the nonlinear system are considered, with proofs given. First, an iterative ADP algorithm is presented to obtain the approximate optimal solutions by solving the Hamilton-Jacobi-Isaacs (HJI) equation. Second, a range of the discount factor is derived which guarantees that the HJI equation serves as a Lyapunov equation. Moreover, we prove that if the iteration number reaches a given threshold, then the iterative control inputs make the closed-loop system asymptotically stable. Third, in order to improve the practicability of the developed stability condition, a simple criterion is established based on Lyapunov stability theory. Neural networks (NNs) are used to approximate the system states, the value function, and the control and disturbance inputs. Finally, simulation results are given to illustrate the performance of the developed optimal control method. (C) 2019 Elsevier B.V. All rights reserved.
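For the linear-quadratic special case, the value-iteration recursion behind such a zero-sum formulation can be written down explicitly. The sketch below iterates the quadratic value-function kernel P for a hypothetical plant x_{k+1} = Ax + Bu + Dw with discounted cost x'Qx + u'Ru - beta^2 w'w; it is only a stand-in for the nonlinear, NN-based algorithm of the paper, and all matrices and scalars are illustrative.

```python
import numpy as np

# Hypothetical linear plant and weights; the quadratic value function is V_i(x) = x' P_i x.
A = np.array([[0.98, 0.10], [0.00, 0.90]])
B = np.array([[0.0], [0.1]])
D = np.array([[0.05], [0.0]])
Q, R = np.eye(2), np.array([[1.0]])
beta, gamma = 5.0, 0.98          # attenuation level and discount factor (illustrative values)

P = np.zeros((2, 2))             # value iteration may start from the zero value function
for _ in range(1000):
    # Joint stationarity over (u, w): minimize over the control u, maximize over the disturbance w.
    S = np.block([[R + gamma * B.T @ P @ B, gamma * B.T @ P @ D],
                  [gamma * D.T @ P @ B, -beta**2 * np.eye(1) + gamma * D.T @ P @ D]])
    L = gamma * np.vstack([B.T @ P @ A, D.T @ P @ A])
    P_next = Q + gamma * A.T @ P @ A - L.T @ np.linalg.solve(S, L)
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

# Recover the saddle-point gains from the converged kernel.
S = np.block([[R + gamma * B.T @ P @ B, gamma * B.T @ P @ D],
              [gamma * D.T @ P @ B, -beta**2 * np.eye(1) + gamma * D.T @ P @ D]])
L = gamma * np.vstack([B.T @ P @ A, D.T @ P @ A])
K = np.linalg.solve(S, L)        # u = -K[:1] @ x (control), w = -K[1:] @ x (worst-case disturbance)
print(P, K, sep="\n")
```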
This paper studies the connected cruise control problem for a platoon of human-operated and autonomous vehicles. The autonomous vehicles can receive motion data, i.e., headway and velocity information, from other vehicles via wireless vehicle-to-vehicle communication. The use of wireless communication for information exchange between vehicles inevitably introduces input delay into the platooning system. Meanwhile, unpredictable behaviors of the leading vehicle constitute an exogenous disturbance for the system. An adaptive optimal control problem with input delay and disturbance is formulated, and a novel data-driven control solution is proposed such that each vehicle in the platoon can maintain a safe distance and the desired velocity. By combining an adaptive dynamic programming technique with sampled-data system theory, a data-driven adaptive optimal control approach is proposed for the autonomous vehicles using policy-iteration learning strategies, without accurate knowledge of the dynamics of the human drivers and vehicles. The efficacy of the proposed controller is substantiated by rigorous analysis and validated by simulation results in different scenarios.
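The policy-iteration core that such a scheme builds on can be sketched for a simple linear-quadratic case. The model-based version below (Hewer's iteration) uses known, hypothetical car-following error dynamics purely for illustration, whereas the paper's approach learns the same quantities from sampled data without a model.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical discretized car-following error dynamics (spacing error, velocity error).
A = np.array([[0.95, 0.10], [0.00, 0.90]])
B = np.array([[0.0], [0.1]])
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])

K = np.zeros((1, 2))          # initial admissible (stabilizing) policy
for _ in range(30):
    Acl = A - B @ K
    # Policy evaluation: P solves the Lyapunov equation P = Acl' P Acl + Q + K' R K.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement.
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(K_new - K)) < 1e-9:
        K = K_new
        break
    K = K_new
print("converged feedback gain K =", K)
```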
Ascent trajectory tracking in the longitudinal plane is a class of nonaffine, noncascade nonlinear control problems with a single input and multiple outputs, which is difficult to handle directly with nonlinear methods. Adaptive dynamic programming has the advantages of precise control and adaptability for general nonlinear control problems, and the time-varying quadratic adaptive dynamic programming algorithm proposed in this paper improves the convergence and computation speed of the traditional adaptive dynamic programming algorithm. To implement the algorithm effectively, the independent variables of the ascent model are substituted in the launching coordinate frame, and the model is then treated as a discrete nominal trajectory tracking problem. In addition, the heuristic dynamic programming structure is used to train the processed model, so that only the time-varying weight in the designed evaluation network needs to be updated. Simulation shows that the proposed algorithm can update the control variable online after predicting the cost function offline with the dynamic equations, which is faster than the general adaptive dynamic programming algorithm. Moreover, compared with the linear quadratic regulator algorithm, the proposed algorithm can effectively and accurately track the nominal trajectory under parameter uncertainty.
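A minimal picture of the "time-varying quadratic weight" idea is the finite-horizon backward recursion below, which computes quadratic cost-to-go weights P_k along a nominal trajectory so that, online, only the current weight is needed to form the tracking correction. The linearized error dynamics, cost weights, and horizon are hypothetical placeholders rather than the paper's ascent model.

```python
import numpy as np

N = 200                                           # horizon length (illustrative)
Q, R, Qf = np.diag([10.0, 1.0]), np.array([[1.0]]), np.diag([50.0, 5.0])

def A_k(k):   # slowly time-varying linearization along the nominal trajectory (hypothetical)
    return np.array([[1.0, 0.01], [-0.02 * (1 + k / N), 0.98]])

def B_k(k):
    return np.array([[0.0], [0.01 * (1 + 0.5 * k / N)]])

P = [None] * (N + 1)
K = [None] * N
P[N] = Qf
for k in range(N - 1, -1, -1):                    # backward recursion for the time-varying weights
    A, B = A_k(k), B_k(k)
    K[k] = np.linalg.solve(R + B.T @ P[k + 1] @ B, B.T @ P[k + 1] @ A)
    Acl = A - B @ K[k]
    P[k] = Q + K[k].T @ R @ K[k] + Acl.T @ P[k + 1] @ Acl

# Online, the tracking correction at step k would be u_k = u_nominal_k - K[k] @ (x_k - x_nominal_k).
```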
In this paper, we propose a novel noncausal control framework to address the energy maximization problem of wave energy converters (WECs) subject to constraints. The energy maximization problem of WECs is a constrained optimal control problem. The proposed control framework converts this problem into a reference trajectory tracking problem through the Fourier pseudo-spectral method (FPSM) and utilizes the online tracking adaptive dynamic programming (OTADP) algorithm to realize real-time trajectory tracking for practical use in the ocean environment. With the wave prediction technique, the optimal trajectory is generated online through a receding horizon (RH) implementation. A critic neural network (NN) is applied to approximate the optimal cost value function and calculate the error-tracking control by solving the associated Hamilton-Jacobi-Bellman (HJB) equation. The proposed WEC control framework improves computational efficiency and makes online control feasible in practice. Simulation results show the effects of the receding horizon implementation of FPSM with different window lengths and window functions, while verifying the performance of tracking control and energy absorption of WECs in two different sea conditions.
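To give a flavour of the receding-horizon Fourier pseudo-spectral step discussed above, the sketch below fits a windowed, truncated Fourier representation to a synthetic wave signal over a sliding window and evaluates it at the window centre. The signal, window length, window function, and number of retained harmonics are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

dt = 0.1
t = np.arange(0.0, 200.0, dt)
wave = np.sin(0.6 * t) + 0.4 * np.sin(1.1 * t + 0.3)   # synthetic wave excitation (hypothetical)

W, half = 256, 128        # receding window length in samples
n_harm = 12               # number of Fourier harmonics retained by the pseudo-spectral fit
win = np.hanning(W)       # window function applied before the fit

recon = np.zeros_like(wave)
for k in range(half, len(t) - half):
    seg = wave[k - half:k + half]            # past samples plus predicted future samples
    c = np.fft.rfft(seg * win)
    c[n_harm + 1:] = 0.0                     # keep only the leading harmonics
    seg_hat = np.fft.irfft(c, n=W)
    recon[k] = seg_hat[half] / win[half]     # evaluate at the window centre, where the window is ~1

err = np.max(np.abs(recon[half:-half] - wave[half:-half]))
print("max reconstruction error at the window centre:", err)
```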
This paper is concerned with the design of distributed optimal coordination control for nonlinear multi-agent systems (NMASs) based on an event-triggered adaptive dynamic programming (ETADP) method. The method is first introduced to design the distributed coordination controllers for NMASs, which not only avoids the transmission of redundant data compared with the traditional time-triggered adaptive dynamic programming (TTADP) strategy but also minimizes the performance function of each agent. The event-triggered conditions are proposed based on the Lyapunov functional method and are derived by guaranteeing the stability of the NMASs. Then a new adaptive policy iteration algorithm is presented to obtain the online solutions of the Hamilton-Jacobi-Bellman (HJB) equations. In order to implement the proposed ETADP method, fuzzy-hyperbolic-model-based critic neural networks (NNs) are utilized to approximate the value functions and help calculate the control policies. In the critic NNs, the weight estimates are updated at the event-triggered instants, leading to aperiodic weight tuning laws so that the computational cost is reduced. It is proved that the weight estimation errors and the local neighborhood coordination errors are uniformly ultimately bounded (UUB). Finally, two simulation examples are provided to show the effectiveness of the proposed ETADP method. (C) 2019 ISA. Published by Elsevier Ltd. All rights reserved.
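The sketch below isolates the event-triggering mechanism itself on a toy linear system: the transmitted state, the control input, and (in the full method) the critic weights are refreshed only when a measurement-gap threshold is crossed, rather than at every sampling instant. The system, the fixed feedback gain standing in for the learned policy, and the threshold are hypothetical.

```python
import numpy as np

A = np.array([[0.98, 0.05], [0.00, 0.95]])
B = np.array([[0.0], [0.1]])
K = np.array([[0.4, 1.2]])        # stabilizing gain used as a stand-in for the learned policy

x = np.array([1.0, -0.5])
x_event = x.copy()                # state sampled at the last event instant
u = -K @ x_event
events, steps = 0, 400
thresh = 0.05                     # triggering threshold on the measurement gap

for _ in range(steps):
    gap = np.linalg.norm(x - x_event)
    if gap > thresh:              # event: transmit the state, recompute the control (and, in the
        x_event = x.copy()        # full ETADP scheme, update the critic NN weights)
        u = -K @ x_event
        events += 1
    x = A @ x + (B @ u).ravel()   # the control is held constant between events

print(f"{events} events over {steps} steps "
      f"({100 * events / steps:.1f}% of a time-triggered scheme's updates)")
```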
This study explores a new robust consensus control strategy for uncertain multiagent systems and provides an event-based solution to adaptive dynamic programming (ADP)-based optimal control. Rather than the control function, the feedback system, established symmetrically to the physical system, allows the optimal consensus control issue to be handled by the optimal control protocol of an augmented affine system. The feedback system focuses on an auxiliary variable formed in light of the optimality principle and a virtual control input built on a critic neural network (NN). Analysis reveals that the auxiliary variable helps reduce the influence of uncertainty on control performance, while the proposed approach is implemented with fewer communication resources since the critic NN is updated only as events occur. Finally, evidence from simulation findings validates the theoretical results.
This paper is concerned with a class of discrete-time linear nonzero-sum games with a partially observable system state. As is known, the optimal control policy for nonzero-sum games relies on full state measurement, which is hard to fulfil in a partially observable environment. Moreover, to achieve optimal control, one needs to know the accurate system model. To overcome these deficiencies, this paper develops a data-driven adaptive dynamic programming method via Q-learning using measurable input/output data without any system knowledge. First, a representation of the unmeasurable inner system state is built using historical input/output data. Then, based on the representation state, a Q-function-based policy iteration approach with convergence analysis is introduced to approximate the optimal control policy iteratively. A neural network (NN)-based actor-critic framework is applied to implement the developed data-driven approach. Finally, two simulation examples are provided to demonstrate the effectiveness of the developed approach.
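The first step above, building a state representation from past inputs and outputs, can be illustrated numerically: for an observable system the state x_k is an exact linear function of the last N inputs and outputs, so the map can be identified from data by least squares. The system matrices and history length below are hypothetical and serve only to generate data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
N = 2                                     # history length (at least the observability index)

def simulate(T):
    x = rng.standard_normal(2)
    X, Z = [], []
    u_hist, y_hist = [], []
    for k in range(T):
        u = rng.standard_normal(1)
        y = C @ x
        u_hist.append(u); y_hist.append(y)
        if k >= N:                        # stack the last N inputs and outputs as the representation z_k
            Z.append(np.concatenate(u_hist[k - N:k] + y_hist[k - N:k]))
            X.append(x)
        x = A @ x + (B @ u).ravel()
    return np.array(X), np.array(Z)

X, Z = simulate(400)
M, *_ = np.linalg.lstsq(Z, X, rcond=None)        # fit x_k ~ M.T @ z_k by least squares
print("max reconstruction error:", np.max(np.abs(Z @ M - X)))
```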
In this study, a new approach based on adaptive dynamic programming (ADP) is proposed to control single-phase uninterruptible power supply inverters. The control scheme uses a single function approximator, called the critic, to evaluate the optimal cost and determine the optimal switching. After offline training of the critic, which is a function of the system states and the elapsed time, the resulting optimal weights are used in online control to obtain a smooth AC output voltage in a feedback form. Simulations show the desirable performance of this controller with linear and non-linear loads and its relative robustness to parameter uncertainty and disturbances. Furthermore, the proposed controller is extended so that the inverter is suitable for single-phase variable-frequency drives. Finally, as one of the few studies in the field of ADP, the proposed controllers are implemented on a physical prototype to show their performance in practice.
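As a rough sketch of the online stage only, the code below performs a one-step lookahead over a discrete set of inverter switch levels and picks the level minimizing the stage cost plus a critic estimate of the cost-to-go. The Euler-discretized LC-filter model, the parameters, and the hand-picked quadratic "critic" standing in for the offline-trained approximator are all illustrative assumptions, not the paper's design.

```python
import numpy as np

dt, L, Cf, Rload = 1e-4, 2e-3, 50e-6, 10.0
Vdc = 200.0
levels = np.array([-Vdc, 0.0, Vdc])       # available bridge output voltages (two-level illustration)

# Euler-discretized LC filter with resistive load; state x = [inductor current, capacitor voltage].
A = np.array([[1.0, -dt / L], [dt / Cf, 1.0 - dt / (Rload * Cf)]])
B = np.array([dt / L, 0.0])

def v_ref(t):                              # desired 50 Hz output voltage
    return 120.0 * np.sin(2 * np.pi * 50.0 * t)

def critic(x, t):                          # hypothetical "trained" critic: quadratic in the voltage error
    return 5.0 * (x[1] - v_ref(t)) ** 2

x = np.zeros(2)
for k in range(2000):
    t = k * dt
    costs = []
    for v in levels:                       # one-step lookahead over the discrete switch levels
        x_next = A @ x + B * v
        stage = (x_next[1] - v_ref(t + dt)) ** 2 + 1e-4 * v**2
        costs.append(stage + critic(x_next, t + dt))
    x = A @ x + B * levels[int(np.argmin(costs))]
```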