Model-based dual heuristic dynamic programming (MB-DHP) is a popular approach for approximating optimal solutions to control problems. Yet it usually requires offline training of a model network, thus incurring extra computational cost. In this brief, we propose a model-free DHP (MF-DHP) design based on a finite-difference technique. In particular, we adopt a multilayer perceptron with one hidden layer for both the action and the critic network designs, and use delayed objective functions to train both networks online over time. We test both the MF-DHP and MB-DHP approaches on a discrete-time example and a continuous-time example under the same parameter settings. Our simulation results demonstrate that the MF-DHP approach achieves control performance competitive with that of the traditional MB-DHP approach while requiring fewer computational resources.
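The core idea of the brief — replacing a trained model network with finite differences of the plant itself — can be sketched as follows. The plant `f`, the step size `eps`, and the test point are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def finite_difference_jacobian(f, x, eps=1e-4):
    """Estimate the Jacobian of a black-box mapping f at x via
    central finite differences, avoiding an explicit model network."""
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(f(x), dtype=float)
    jac = np.zeros((y0.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        # central difference: (f(x + h) - f(x - h)) / (2h)
        jac[:, i] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return jac

# Hypothetical plant with a known analytic Jacobian for comparison:
# f1 = x0 + 0.1*x1, f2 = 0.9*x1 + x0**2
f = lambda x: np.array([x[0] + 0.1 * x[1], 0.9 * x[1] + x[0] ** 2])
J = finite_difference_jacobian(f, np.array([0.5, -0.3]))
# Analytic Jacobian at x = (0.5, -0.3): [[1.0, 0.1], [1.0, 0.9]]
```

In a DHP design these Jacobian estimates would stand in for the model network's derivatives when back-propagating the costate through the plant.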
This paper presents an adaptive and intelligent power control approach for microgrid systems in the grid-connected operation mode. The proposed critic-based adaptive control system contains a neuro-fuzzy controller and a fuzzy critic agent. The fuzzy critic agent employs a reinforcement learning algorithm based on neuro-dynamic programming. The system feedback is made available to the critic agent's input as the controller's action in the previous state. The evaluation or reinforcement signal produced by the critic agent, together with the back-propagation of error, is then used for online tuning of the output-layer weights of the neuro-fuzzy controller. The proposed controller shows superior results compared with traditional PI control: the transient response time is significantly reduced, power oscillations are eliminated, and fast convergence is achieved. The simple design and improved dynamic behavior of the proposed controller make it a promising candidate for power control of microgrid systems.
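A minimal sketch of the tuning step described above — the critic's reinforcement signal scaling a back-propagated error to update only the output-layer weights. The update law, learning rate, and shapes here are hypothetical; the paper's exact rule is not reproduced.

```python
import numpy as np

def tune_output_weights(W, hidden, error, reinforcement, lr=0.01):
    """One hypothetical online update of a neuro-fuzzy controller's
    output-layer weights W (n_out x n_hidden): the scalar reinforcement
    signal from the critic scales the back-propagated error gradient."""
    grad = reinforcement * np.outer(error, hidden)
    return W - lr * grad

# Usage with toy values: one output, three hidden-layer activations
W = np.zeros((1, 3))
W_new = tune_output_weights(W, np.array([1.0, 2.0, 3.0]),
                            np.array([0.5]), reinforcement=2.0)
```

Keeping the antecedent (fuzzy-rule) layers fixed and adapting only the output layer is what keeps this kind of online tuning cheap and stable.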
ISBN (Print): 9781467397155
In this paper, a novel partially model-free adaptive dynamic programming (ADP) algorithm is presented to solve online the nonzero-sum differential games of continuous-time linear systems with unknown drift dynamics. Specifically, by using the integral reinforcement learning technique, the partially model-free ADP algorithm is developed to solve online the set of coupled algebraic Riccati equations (AREs) underlying the game problem without requiring complete knowledge of the system dynamics. Then, the convergence of the partially model-free ADP algorithm is proved by demonstrating that it is mathematically equivalent to the extended Kleinman algorithm, previously proposed in the literature, which solves the set of coupled algebraic Riccati equations offline using complete knowledge of the system dynamics. Finally, an example is given to demonstrate the efficiency of the proposed algorithm.
A general utility function representation is proposed to provide the required differentiable and adjustable utility function for the dual heuristic dynamic programming (DHP) design. Goal representation DHP (GrDHP) is presented with a goal network placed on top of the traditional DHP design. This goal network provides a general mapping between the system states and the derivatives of the utility function. With this proposed architecture, we can obtain the required derivatives of the utility function directly from the goal network. In addition, instead of a fixed, predefined utility function as in the literature, we conduct an online learning process for the goal network so that the derivatives of the utility function can be adaptively tuned over time. We compare the control performance of the proposed GrDHP and the traditional DHP approaches under the same environment and parameter settings. Statistical simulation results and snapshots of the system variables are presented to demonstrate the improved learning and control performance. We also apply both approaches to a power system example to further demonstrate the control capabilities of the GrDHP approach.
The design of a stabilizing controller for uncertain nonlinear systems with control constraints is a challenging problem. The constrained input, coupled with the inability to identify the uncertainties accurately, motivates the design of stabilizing controllers based on reinforcement learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted into a constrained optimal control problem by appropriately selecting value functions for the nominal system. Distinct from the typical action-critic dual networks employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Meanwhile, unlike the initial stabilizing control often indispensable in RL, no special requirement is imposed on the initial control. By utilizing Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is verified to guarantee that the uncertain nonlinear system is stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the present approach.
ISBN (Print): 9781509042418
Inspired by the core idea of AlphaGo, we combine a neural network trained by adaptive dynamic programming (ADP) with the Monte Carlo Tree Search (MCTS) algorithm for Gomoku. The MCTS algorithm is based on the Monte Carlo simulation method: it runs a large number of simulations to grow a game search tree, and we roll out the tree and evaluate the outcomes at its leaf nodes to obtain the MCTS winning rate. The ADP and MCTS methods are used to estimate the winning rates respectively, and we weight the two winning rates to select the action position with the maximum combined value. Experimental results show that this method can effectively eliminate the "short-sighted" defect of the neural network evaluation function. With our proposed method, the game's final prediction result is more accurate, and it outperforms Gomoku play based on the ADP algorithm alone.
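The action-selection rule above — blending two winning-rate estimates and taking the maximum — can be sketched in a few lines. The weight `w` and the toy rate values are illustrative assumptions; the paper's actual weighting is not specified in this abstract.

```python
def select_action(candidates, adp_rate, mcts_rate, w=0.5):
    """Pick the move with the highest weighted winning rate, blending
    the ADP network's estimate with the MCTS simulation estimate.
    w is a hypothetical tuning parameter, not taken from the paper."""
    score = {a: w * adp_rate[a] + (1 - w) * mcts_rate[a] for a in candidates}
    return max(score, key=score.get)

moves = ["a", "b", "c"]
adp = {"a": 0.40, "b": 0.55, "c": 0.50}    # network evaluation (long-term view)
mcts = {"a": 0.60, "b": 0.45, "c": 0.58}   # simulation winning rate (tactical view)
best = select_action(moves, adp, mcts)     # blended scores: 0.50, 0.50, 0.54
```

The blend is what compensates for each estimator's weakness: the network's "short-sighted" tactical errors and MCTS's limited long-term evaluation under a finite simulation budget.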
ISBN (Print): 9781479919598
A new theoretical analysis of the goal representation adaptive dynamic programming (GrADP) design proposed in [1], [2] is investigated in this paper. Unlike the proofs of convergence for adaptive dynamic programming (ADP) in the literature, here we provide new insight into the error bound between the estimated value function and the expected value function. We then employ the critic network in the GrADP approach to approximate the Q value function, and use the action network to provide the control policy. The goal network is adopted to provide the internal reinforcement signal for the critic network over time. Finally, we illustrate that the estimated Q value function is close to the expected value function within an arbitrarily small bound on a maze navigation example.
ISBN (Print): 9783319253930; 9783319253923
In this paper, a novel Q-learning based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct an iterative control law that stabilizes the system and simultaneously minimizes the iterative Q function. The convergence property is analyzed to show that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to show the performance of the developed algorithm.
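The alternation the abstract describes — evaluate the Q function under the current policy, then improve the policy greedily — can be illustrated on a finite MDP. This tabular sketch is an assumption for illustration only; the paper works with nonlinear systems and function approximation, not lookup tables.

```python
import numpy as np

def q_policy_iteration(P, R, gamma=0.9, iters=50):
    """Policy iteration on the Q function for a finite MDP with
    stage costs: evaluate Q under the current policy pi, then
    improve greedily. P[a] is the transition matrix for action a,
    R[s, a] the stage cost of taking a in state s."""
    n_s, n_a = R.shape
    pi = np.zeros(n_s, dtype=int)
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        # policy evaluation: Q(s, a) = R(s, a) + gamma * E[Q(s', pi(s'))]
        for _ in range(100):
            q_pi = Q[np.arange(n_s), pi]
            Q = R + gamma * np.stack([P[a] @ q_pi for a in range(n_a)], axis=1)
        pi = Q.argmin(axis=1)  # greedy improvement (cost minimization)
    return Q, pi

# Toy 2-state, 2-action example: action 0 stays put, action 1 switches state
P = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]
R = np.array([[1.0, 0.5], [0.0, 2.0]])
Q, pi = q_policy_iteration(P, R)
# Optimal policy: switch out of state 0 (pay 0.5 once), then stay in state 1
```

Minimization replaces the usual reward maximization because the ADP literature the abstract belongs to is framed in terms of cost-to-go.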
ISBN (Print): 9781467390576
With the rapid increase in demand for mobile data, mobile network operators are trying to expand wireless network capacity by deploying WiFi hotspots to offload their mobile traffic. However, these network-centric methods usually do not serve the interests of mobile users (MUs). An MU weighs many factors when deciding whether to offload traffic to a complementary WiFi network. In this paper, we study the WiFi offloading problem from the MU's perspective by considering the delay tolerance of traffic, monetary cost, and energy consumption, as well as the availability of the MU's mobility pattern. We first formulate the WiFi offloading problem as a finite-horizon discrete-time Markov decision process (FDTMDP) with a known mobility pattern and propose a dynamic programming based offloading algorithm. Since the MU's mobility pattern may not be known in advance, we then propose a reinforcement learning based offloading algorithm, which works well with an unknown mobility pattern. Extensive simulations are conducted to validate our proposed offloading algorithms.
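For a finite-horizon discrete-time MDP like the one formulated above, the dynamic programming solution is backward induction over the horizon. The `cost(t, s, a)` and `trans(t, s, a)` interfaces and the toy two-state example (state 1 = WiFi available, action 1 = offload) are hypothetical stand-ins for the paper's model.

```python
import numpy as np

def backward_induction(T, n_states, n_actions, cost, trans):
    """Finite-horizon DP: compute the optimal expected cost-to-go
    V[t, s] and policy by stepping backward from the terminal time.
    trans(t, s, a) returns a probability vector over next states."""
    V = np.zeros((T + 1, n_states))            # terminal cost V[T, :] = 0
    policy = np.zeros((T, n_states), dtype=int)
    for t in range(T - 1, -1, -1):
        for s in range(n_states):
            q = [cost(t, s, a) + trans(t, s, a) @ V[t + 1]
                 for a in range(n_actions)]
            policy[t, s] = int(np.argmin(q))
            V[t, s] = q[policy[t, s]]
    return V, policy

# Toy model: action 0 = cellular (cost 2), action 1 = offload
# (cost 1 if WiFi is available in state 1, else 5); state persists.
cost = lambda t, s, a: 2.0 if a == 0 else (1.0 if s == 1 else 5.0)
trans = lambda t, s, a: np.eye(2)[s]
V, policy = backward_induction(T=3, n_states=2, n_actions=2,
                               cost=cost, trans=trans)
```

With the mobility pattern unknown, this backward sweep is no longer computable in advance, which is what motivates the paper's reinforcement learning variant.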
ISBN (Print): 9781479977888
This paper develops an adaptive dynamic programming (ADP) based near-optimal boundary control of distributed parameter systems (DPS) governed by uncertain coupled semi-linear parabolic partial differential equations (PDEs) under a Neumann boundary control condition. First, the Hamilton-Jacobi-Bellman (HJB) equation is formulated without any model reduction, and the optimal control policy is derived. Subsequently, a novel identifier is developed to estimate the unknown nonlinearity in the PDE dynamics. Accordingly, the sub-optimal control policy is obtained by forward-in-time estimation of the value functional using a neural network (NN) online approximator and the identifier. Adaptive tuning laws are proposed for learning the value functional online. Local ultimate boundedness (UB) of the closed-loop system is verified using Lyapunov theory. The performance of the proposed controller is verified via simulation on an unstable coupled diffusion-reaction process.