An event-triggered adaptive dynamic programming (ADP) algorithm is developed in this article to solve the tracking control problem for partially unknown constrained uncertain systems. First, an augmented system is constructed, and the solution of the optimal tracking control problem of the uncertain system is transformed into an optimal regulation of the nominal augmented system with a discounted value function. Integral reinforcement learning is employed to avoid the requirement of the augmented drift dynamics. Second, the event-triggered ADP is adopted for its implementation, where the learning of neural network weights not only relaxes the need for an initial admissible control but also executes only when the predefined execution rule is violated. Third, the tracking error and the weight estimation error are proven to be uniformly ultimately bounded, and the existence of a lower bound on the inter-execution times is analyzed. Finally, simulation results demonstrate the effectiveness of the presented event-triggered ADP method.
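As a rough, hypothetical sketch (not the paper's exact rule), an execution rule of this kind typically compares the gap between the state sampled at the last trigger and the current state against a state-dependent threshold; the names `should_trigger`, `alpha`, `eps`, and `actor` below are illustrative assumptions:

```python
import numpy as np

# Hypothetical event-triggering check: the controller is updated only when
# the predefined execution rule is violated.  alpha and eps are assumed
# design constants, not values from the paper.
def should_trigger(x, x_last, alpha=0.5, eps=1e-3):
    gap = np.linalg.norm(x - x_last)             # event-gap error
    threshold = alpha * np.linalg.norm(x) + eps  # state-dependent bound
    return gap > threshold

# Sketch of use inside a simulation loop:
# if should_trigger(x, x_k):
#     x_k = x.copy()          # sample the state at the triggering instant
#     u = actor(x_k)          # recompute the control from the actor network
```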
ISBN (print): 9781479945528
This paper presents a novel on-policy Q-learning approach for finding the optimal control policy online for continuous-time linear time-invariant (LTI) systems with completely unknown dynamics. The proposed method estimates the unknown parameters of the optimal control policy based on a fixed-point equation involving the Q-function. Gradient-based update laws, derived from minimizing the Bellman error, are used to adapt the parameters online under a persistence of excitation condition. A novel asymptotically convergent state-derivative estimator is presented to ensure that the proposed result is independent of knowledge of the system dynamics. Simulation results are presented to validate the theoretical development.
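A minimal sketch of such a gradient-based update, assuming a Q-function that is linear in the parameters, Q = theta' phi, and a normalized step (all variable names are illustrative, not the paper's notation):

```python
import numpy as np

# Hypothetical normalized-gradient step that decreases the squared Bellman
# residual for a linearly parameterized Q-function.
def gradient_update(theta, phi, target, lr=1.0):
    e = float(theta @ phi - target)   # Bellman residual at this data point
    # Normalization bounds the step size; persistence of excitation in phi
    # is what guarantees convergence of the parameter estimate.
    return theta - lr * e * phi / (1.0 + phi @ phi) ** 2
```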
ISBN (print): 9781424407064
Since the 1960s, I have proposed that we could understand and replicate the highest level of intelligence seen in the brain by building ever more capable and general systems for adaptive dynamic programming (ADP) - like "reinforcement learning" but based on approximating the Bellman equation and allowing the controller to know its utility function. Growing empirical evidence on the brain supports this approach. Adaptive critic systems now meet tough engineering challenges and provide a kind of first-generation model of the brain. Lewis, Prokhorov, and I have early second-generation work. Mammal brains possess three core capabilities - creativity/imagination and ways to manage spatial and temporal complexity - even beyond the second generation. This paper reviews previous progress and describes new tools and approaches to overcome the spatial complexity gap.
ISBN (print): 9781479945528
In this paper, a novel finite-horizon optimal control scheme is introduced for linear continuous-time systems by using adaptive dynamic programming (ADP). First, a new time-varying Q-function parameterization and its estimator are introduced. Subsequently, the Q-function estimator is tuned online by using both the Bellman equation in integral form and the terminal cost. Eventually, a near-optimal control gain is obtained from the Q-function estimator. All the closed-loop signals are shown to be bounded by using Lyapunov stability analysis, where the bounds are functions of the initial conditions and the final time, while the estimated control signal converges close to the optimal value. The simulation results illustrate the effectiveness of the proposed scheme.
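As an illustrative sketch under assumed notation (the exact parameterization is in the paper), tuning such an estimator amounts to driving two residuals toward zero: an integral-form Bellman residual over each interval and a terminal-cost residual at the final time:

```python
import numpy as np

def finite_horizon_residuals(theta, phi_t, phi_next, integral_cost,
                             phi_final, terminal_cost):
    """Residuals for a time-varying Q-function estimate theta (assumed
    linear in the parameters).

    e_bellman : integral-form Bellman residual over one interval,
                theta'(phi(t) - phi(t+T)) minus the measured running cost
    e_terminal: boundary residual enforcing the terminal cost at t_f
    """
    e_bellman = theta @ (phi_t - phi_next) - integral_cost
    e_terminal = theta @ phi_final - terminal_cost
    return e_bellman, e_terminal

def tune(theta, residual, grad, lr=0.1):
    # Normalized gradient step on the squared residual, as is standard
    # in ADP parameter tuning.
    return theta - lr * residual * grad / (1.0 + grad @ grad)
```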
ISBN (print): 9781424427611
Recent developments in multiagent reinforcement learning mostly concentrate on normal form games or restrictive hierarchical form games. In this paper, we use the well-known Q-learning in extensive form games in which agents have a fixed priority in action selection. We also introduce a new concept called associative Q-values, which not only can be used in action selection, leading to a subgame perfect equilibrium, but also can be used in the update rule, which is proved to be convergent. The associative Q-value is the expected utility of an agent in a game situation, which is an estimate of the value of the subgame perfect equilibrium point.
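A hypothetical sketch of how fixed-priority selection can resolve one stage by backward induction, with each agent maximizing its own Q-value while assuming all lower-priority agents best-respond (the data layout `Q[i][joint_action]` is an assumption, not the paper's):

```python
def fixed_priority_selection(Q, agent_order, action_sets):
    """Pick a joint action by backward induction under a fixed priority.

    Q[i] maps a full joint-action tuple to agent i's Q-value; agents in
    agent_order choose in turn, each maximizing its value assuming all
    later agents do likewise, yielding the subgame perfect choice.
    """
    def solve(prefix, remaining):
        if not remaining:
            return prefix
        i = remaining[0]
        best = max(action_sets[i],
                   key=lambda a: Q[i][solve(prefix + (a,), remaining[1:])])
        return solve(prefix + (best,), remaining[1:])
    return solve((), list(agent_order))

# Two agents, two actions each; agent 0 moves first.
Q = {0: {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 2.0},
     1: {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 3.0}}
print(fixed_priority_selection(Q, [0, 1], {0: [0, 1], 1: [0, 1]}))  # (1, 1)
```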
ISBN (print): 9781479945528
We present two nonparametric approaches to Kullback-Leibler (KL) control, or the linearly solvable Markov decision problem (LMDP), based on Gaussian processes (GPs) and the Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient online operations. Theoretically, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities, for the infinite-time-horizon case. Algorithmically, we give explicit optimal control policies in nonparametric form and propose online update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.
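For context, a minimal tabular sketch of what makes the LMDP "linearly solvable": the desirability z = exp(-v) satisfies a linear (eigenvalue) equation in z, solvable by power iteration. This is the generic average-cost construction, not the paper's GP/Nyström method:

```python
import numpy as np

def solve_lmdp(P, q, iters=500, tol=1e-10):
    """Average-cost LMDP via power iteration on the linear operator G.

    P : (n, n) passive (uncontrolled) transition matrix, rows sum to 1
    q : (n,) state cost
    The desirability z = exp(-v) is the principal eigenvector of
    G = diag(exp(-q)) @ P, and u*(x'|x) is proportional to P[x, x'] z[x'].
    """
    G = np.exp(-q)[:, None] * P           # the Bellman equation is linear in z
    z = np.ones(len(q))
    for _ in range(iters):
        z_new = G @ z
        z_new /= np.linalg.norm(z_new)    # eigenvalue normalization
        if np.max(np.abs(z_new - z)) < tol:
            break
        z = z_new
    U = P * z[None, :]                    # reweight passive dynamics by z
    return z, U / U.sum(axis=1, keepdims=True)
```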
This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law that optimizes the iterative performance index function. The main contribution of this paper is to analyze the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems for the first time. It is shown that the iterative performance index function converges nonincreasingly to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear system. Neural networks are used to approximate the performance index function and compute the optimal control law, respectively, to facilitate the implementation of the iterative ADP algorithm, where the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
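The backbone of such a method is the standard policy iteration loop, sketched here in generic form; the `evaluate` and `improve` callables stand in for the paper's neural-network critic and actor and are assumptions:

```python
def policy_iteration(V0, evaluate, improve, states, tol=1e-6, max_iters=100):
    """Generic discrete-time policy iteration skeleton.

    evaluate(policy) -> V : solves V(x) = U(x, policy(x)) + V(f(x, policy(x)))
    improve(V) -> policy  : returns x -> argmin_u [U(x, u) + V(f(x, u))]
    Values are dicts keyed by state.  Under the paper's assumptions the
    value sequence is nonincreasing and each control law is stabilizing.
    """
    policy = improve(V0)                  # initial control law from V0
    V_prev = None
    for _ in range(max_iters):
        V = evaluate(policy)              # policy evaluation
        policy = improve(V)               # policy improvement
        if V_prev is not None and max(abs(V[s] - V_prev[s]) for s in states) < tol:
            break
        V_prev = V
    return policy, V
```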
ISBN (print): 9781479945528
In this paper, an optimal tracking control approach based on an adaptive dynamic programming (ADP) algorithm is proposed to solve linear quadratic regulation (LQR) problems for unknown discrete-time systems in an online fashion. First, we convert the optimal tracking problem into the design of an infinite-horizon optimal regulator for the tracking error dynamics based on a system transformation. Then we expand the error state equation by the history data of control and state. The iterative ADP algorithms of policy iteration (PI) and value iteration (VI) are introduced to solve the value function of the controlled system. It is shown that the proposed ADP algorithm solves the LQR problem without requiring any knowledge of the system dynamics. The simulation results show the convergence and effectiveness of the proposed control scheme.
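For reference, the model-based fixed point that VI approximates in the LQR case is the discrete-time Riccati recursion below; the ADP variant estimates the same P from input/state data instead of (A, B). This is a generic sketch, not the paper's data-driven implementation:

```python
import numpy as np

def lqr_value_iteration(A, B, Q, R, iters=500):
    """Iterate P_{k+1} = Q + A'P_k(A - B K_k) with
    K_k = (R + B'P_k B)^{-1} B'P_k A, starting from P_0 = 0."""
    P = np.zeros_like(Q)
    K = None
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)   # gain at this iterate
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.allclose(P, P_next, atol=1e-10):
            break
        P = P_next
    return P, K                                     # u_k = -K x_k
```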
This paper is concerned with a novel integrated multi-step heuristic dynamic programming (MsHDP) algorithm for solving optimal control problems. It is shown that, initialized by the zero cost function, MsHDP converges to the optimal solution of the Hamilton-Jacobi-Bellman (HJB) equation. Then, the stability of the system is analyzed using control policies generated by MsHDP. Moreover, a general stability criterion is designed to determine the admissibility of the current control policy; that is, the criterion is applicable not only to traditional value iteration and policy iteration but also to MsHDP. Based on the convergence and the stability criterion, the integrated MsHDP algorithm using immature control policies is developed to accelerate learning efficiency. An actor-critic structure is utilized to implement the integrated MsHDP scheme, where neural networks serve as the parametric structures used to evaluate and improve the iterative policy. Finally, two simulation examples are given to demonstrate that the learning effectiveness of the integrated MsHDP scheme surpasses that of other fixed or integrated methods.
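A tabular sketch of the multi-step idea: each sweep applies an h-step Bellman backup rather than a one-step backup, so h = 1 reduces to ordinary HDP/value iteration while larger h propagates cost information further per sweep. `step` and `cost` are assumed black-box models, not the paper's neural-network implementation:

```python
def ms_hdp_backup(V, h, states, actions, step, cost):
    """One h-step backup: V'(x) is the minimum over h-step action
    sequences of the accumulated stage cost plus V at the reached state."""
    def lookahead(x, depth):
        if depth == 0:
            return V[x]
        return min(cost(x, u) + lookahead(step(x, u), depth - 1)
                   for u in actions)
    return {x: lookahead(x, h) for x in states}
```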
This note studies the adaptive optimal output regulation problem for continuous-time linear systems, which aims to achieve asymptotic tracking and disturbance rejection while minimizing some predefined costs. Reinforcement learning and adaptive dynamic programming techniques are employed to compute an approximate optimal controller using input/partial-state data despite unknown system dynamics and an unmeasurable disturbance. Rigorous stability analysis shows that the proposed controller exponentially stabilizes the closed-loop system and that the output of the plant asymptotically tracks the given reference signal. Simulation results on an LCL-coupled inverter-based distributed generation system demonstrate the effectiveness of the proposed approach.
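For orientation, the model-based problem that the adaptive design solves without model knowledge combines the linear regulator equations with an optimal stabilizing feedback. Below is a generic sketch under one common convention (exosystem v' = Sv, plant x' = Ax + Bu + Ev, error e = Cx + Fv); all names are assumptions, and the ADP method estimates these quantities from data instead:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def output_regulator(A, B, C, E, S, F, Q, R):
    """Solve the regulator equations XS = AX + BU + E, 0 = CX + F for
    (X, U), then add an optimal feedback gain from the continuous-time
    ARE; the resulting control is u = -K x + (U + K X) v."""
    n, m = B.shape
    nq = S.shape[0]
    I_q = np.eye(nq)
    # Vectorize the Sylvester-type regulator equations into M @ sol = rhs.
    M = np.block([
        [np.kron(S.T, np.eye(n)) - np.kron(I_q, A), -np.kron(I_q, B)],
        [np.kron(I_q, C), np.zeros((C.shape[0] * nq, m * nq))],
    ])
    rhs = np.concatenate([E.flatten(order='F'), -F.flatten(order='F')])
    sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    X = sol[: n * nq].reshape((n, nq), order='F')
    U = sol[n * nq:].reshape((m, nq), order='F')
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)   # optimal stabilizing feedback gain
    L = U + K @ X                     # feedforward gain
    return K, L
```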