This paper investigates the optimal consensus control problem for discrete-time multi-agent systems with completely unknown dynamics by utilizing a data-driven reinforcement learning method. It is known that optimal consensus control for multi-agent systems relies on the solution of the coupled Hamilton-Jacobi-Bellman equation, which generally cannot be solved analytically. Even worse, most real-world systems are too complicated to model accurately. To overcome these deficiencies, a data-based adaptive dynamic programming method is presented that uses current and past system data rather than an accurate system model, avoiding the traditional identification scheme and the approximation residual errors it introduces. First, we establish a discounted performance index and formulate the optimal consensus problem via Bellman's optimality principle. Then, we introduce the policy iteration algorithm that motivates this paper. To implement the proposed online action-dependent heuristic dynamic programming method, two neural networks (NNs), 1) a critic NN and 2) an actor NN, are employed to approximate the iterative performance index functions and control policies, respectively, in real time. Finally, two simulation examples are provided to demonstrate the effectiveness of the proposed method.
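As a rough illustration of the data-based idea described above (not the paper's multi-agent formulation), the sketch below runs Q-function-based policy iteration on a single scalar linear agent: the critic is a least-squares fit of a quadratic Q-function from measured transitions, and the actor is the gain that minimizes that Q-function. The dynamics, cost weights, and discount factor are placeholder assumptions.

```python
import numpy as np

# Minimal model-free policy iteration (ADHDP-style) on a scalar linear agent:
# Q(x,u) is quadratic, so the "critic" is a least-squares fit of its three
# coefficients from measured data (x_k, u_k, r_k, x_{k+1}); no model is used
# in the regression itself.
a, b, gamma = 0.9, 1.0, 0.95          # true dynamics (unknown to the learner)
K = 0.0                               # initial stabilizing feedback u = -K x

def phi(x, u):                        # quadratic basis for Q(x, u)
    return np.array([x * x, x * u, u * u])

rng = np.random.default_rng(0)
for it in range(20):
    # collect data under the current policy plus exploration noise
    X, y = [], []
    x = 1.0
    for k in range(200):
        u = -K * x + 0.1 * rng.standard_normal()
        r = x * x + u * u             # measured stage cost
        x_next = a * x + b * u        # measured next state (model only simulates data)
        u_next = -K * x_next
        X.append(phi(x, u) - gamma * phi(x_next, u_next))
        y.append(r)
        x = x_next
    # policy evaluation: least-squares critic  Q(x,u) = w . phi(x,u)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    qxx, qxu, quu = w
    # policy improvement: minimize Q over u  ->  u = -(qxu / (2 quu)) x
    K = qxu / (2.0 * quu)

print("learned gain", K)
```

Because the plant parameters appear only in the simulated data collection, the same loop applies when they are unknown, which is the point of the data-driven scheme.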
ISBN (Print): 9783319925363
The proceedings contain 97 papers. The special focus in this conference is on Neural Networks. The topics include: development of a sensory-neural network for medical diagnosing; review of pseudoinverse learning algorithm for multilayer neural networks and applications; identification of vessel kinetics based on neural networks via concurrent learning; method to improve the performance of restricted Boltzmann machines; modeling hysteresis using non-smooth neural networks; the implementation of a pointer network model for the traveling salesman problem on a Xilinx PYNQ board; generalized affine scaling trajectory analysis for linearly constrained convex programming; drift compensation for E-nose using QPSO-based domain adaptation kernel ELM; convergence analysis of self-adaptive immune particle swarm optimization algorithm; a neurodynamic approach to multiobjective linear programming; an improved artificial fish swarm algorithm to solve the cutting stock problem; a hyper-heuristic algorithm for the low-carbon location routing problem; pulse neuron supervised learning rules for adapting the dynamics of synaptic connections; an artificial neural network for solving quadratic zero-one programming problems; a new parameter identification method for type-1 TS fuzzy neural networks; performance enhancement of deep reinforcement learning networks using feature extraction; online GRNN-based ensembles for regression on evolving data streams; a broad neural network structure for class incremental learning; WeiboCluster: an event-oriented Sina Weibo dataset with estimating credit; robust neural networks learning: new approaches; neural network model of unconscious; data cleaning and classification in the presence of label noise with class-specific autoencoder; using the wide and deep flexible neural tree to forecast the exchange rate; recurrent neural network with dynamic memory.
Adaptive dynamic programming (ADP) and reinforcement learning are closely related when performing intelligent optimization. Both are regarded as promising methods built around the key components of evaluation and improvement, against the background of information technology such as artificial intelligence, big data, and deep learning. Although great progress has been achieved and surveyed for nonlinear optimal control problems, research on the robustness of ADP-based control strategies under uncertain environments has not been fully summarized. Hence, this survey reviews the recent main results of adaptive-critic-based robust control design for continuous-time nonlinear systems. The ADP-based nonlinear optimal regulation is reviewed, followed by robust stabilization of nonlinear systems with matched uncertainties, guaranteed cost control design of unmatched plants, and decentralized stabilization of interconnected systems. Additionally, further comprehensive discussions are presented, including event-based robust control design, improvement of the critic learning rule, nonlinear H-infinity control design, and several notes on future perspectives. By applying the ADP-based optimal and robust control methods to a practical power system and an overhead crane plant, two typical examples are provided to verify the effectiveness of the theoretical results. Overall, this survey is beneficial for promoting the development of adaptive critic control methods with robustness guarantees and the construction of higher-level intelligent systems.
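For reference, the nominal problem that adaptive-critic designs of this kind start from can be written compactly. The display below is a standard statement of the continuous-time HJB condition and the resulting optimal control law under a quadratic-in-control cost; the notation is assumed here rather than quoted from the survey:

$$0=\min_{u}\Big[x^{\top}Qx+u^{\top}Ru+\nabla V^{*}(x)^{\top}\big(f(x)+g(x)u\big)\Big], \qquad u^{*}(x)=-\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V^{*}(x).$$

A critic network approximates $V^{*}$ and its weights are tuned to drive the residual of the first equation (the Hamiltonian) toward zero; robust stabilization under matched uncertainty is then typically obtained by solving such a nominal optimal control problem with a cost term that over-bounds the uncertainty.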
ISBN (Print): 9781538627266
This paper is concerned with a novel generalized policy iteration (GPI) algorithm with approximation errors. Approximation errors are explicitly considered in the GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function. Finally, numerical examples and comparisons are presented.
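A minimal sketch of the generalized policy iteration pattern discussed here, with a bounded error injected into each value update to mimic inexact evaluation, is given below; the MDP, error bound, and sweep counts are illustrative assumptions, not the paper's setting.

```python
import numpy as np

# Minimal generalized policy iteration (GPI) on a small random MDP, with an
# artificial bounded approximation error injected into each value backup.
rng = np.random.default_rng(1)
nS, nA, gamma, eps = 5, 3, 0.9, 1e-3
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] is a distribution over s'
R = rng.random((nS, nA))                        # stage cost to be minimized

V = np.zeros(nS)
policy = np.zeros(nS, dtype=int)
for sweep in range(50):
    # partial policy evaluation: only a few backups, each with bounded error
    for _ in range(3):
        V = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V for s in range(nS)])
        V += eps * rng.uniform(-1.0, 1.0, nS)   # bounded approximation error
    # policy improvement: greedy (cost-minimizing) policy w.r.t. the current V
    Q = R + gamma * np.einsum('san,n->sa', P, V)
    policy = Q.argmin(axis=1)

print("value estimate", np.round(V, 3))
```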
In this paper, we investigate nonzero-sum games for a class of discrete-time (DT) nonlinear systems by using a novel policy iteration (PI) adaptive dynamic programming (ADP) method. The main idea of our proposed PI scheme is to utilize the iterative ADP algorithm to obtain the iterative control policies, which not only ensure that the system achieves stability but also minimize the performance index function for each player. This paper integrates game theory, optimal control theory, and reinforcement learning techniques to formulate and handle DT nonzero-sum games with multiple players. First, we design three actor-critic algorithms, an offline one and two online ones, for the PI scheme. Subsequently, neural networks are employed to implement these algorithms, and the corresponding stability analysis is provided via Lyapunov theory. Finally, a numerical simulation example is presented to demonstrate the effectiveness of our proposed approach.
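As an illustrative reduction of the multiplayer setting (not the paper's neural-network implementation), the sketch below runs policy iteration on a scalar two-player linear-quadratic nonzero-sum game: each evaluation step solves a discounted Lyapunov equation per player, and each improvement step lets the players best-respond in turn. All plant and cost numbers are placeholders.

```python
import numpy as np

# Two-player nonzero-sum LQ game on the scalar plant x_{k+1} = a x + b1 u1 + b2 u2;
# player i applies u_i = -K_i x and minimizes sum_k gamma^k (q_i x^2 + r_i u_i^2).
a, b1, b2, gamma = 0.8, 1.0, 0.5, 0.95
q1, r1, q2, r2 = 1.0, 1.0, 0.5, 2.0
K1, K2 = 0.0, 0.0                       # initial stabilizing feedback gains

for it in range(100):
    a_cl = a - b1 * K1 - b2 * K2        # closed-loop dynamics under both policies
    # policy evaluation: scalar discounted Lyapunov equations for each player
    P1 = (q1 + r1 * K1 ** 2) / (1.0 - gamma * a_cl ** 2)
    P2 = (q2 + r2 * K2 ** 2) / (1.0 - gamma * a_cl ** 2)
    # policy improvement: each player best-responds in turn to the other's gain
    K1 = gamma * P1 * b1 * (a - b2 * K2) / (r1 + gamma * P1 * b1 ** 2)
    K2 = gamma * P2 * b2 * (a - b1 * K1) / (r2 + gamma * P2 * b2 ** 2)

print("Nash feedback gains", K1, K2)
```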
This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control, and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL is designed to perform policy evaluation and policy improvement within the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. A gradient descent algorithm updates the critic and action weights simultaneously. The convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of the Nash equilibrium are proved. A simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.
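For context, a common form of the off-policy IRL policy-evaluation equation for player $i$ (written here with assumed notation, not quoted from the paper) is

$$V_i\big(x(t+T)\big)-V_i\big(x(t)\big)=\int_{t}^{t+T}\Big[-r_i\big(x,u_1,\dots,u_N\big)+\nabla V_i(x)^{\top}\sum_{j=1}^{N}g_j(x)\big(u_j^{b}-u_j\big)\Big]\,d\tau,$$

where the data are generated by behavior inputs $u_j^{b}$ applied to $\dot{x}=f(x)+\sum_j g_j(x)u_j^{b}$, while the target policies $u_j$ are the ones being evaluated. Because the drift term $f$ never appears explicitly, the equation can be solved from measured trajectories without knowing the system dynamics.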
ISBN (Print): 9781509061822
Human-level control through deep learning and deep reinforcement learning has revealed unique and powerful potential through the very complex game of Go. AlphaGo, developed by Google DeepMind, beat the top Go player earlier this year. The scientific and technological advancement behind the success of AlphaGo has attracted researchers from multiple areas, including machine learning, artificial intelligence, computational intelligence, and so on. Adaptive dynamic programming (ADP) methods share the same fundamental principle with reinforcement learning and show strong performance for continuous-time and continuous-state systems. Deep learning techniques can also be integrated into ADP designs. In this paper, we discuss the key techniques and components in deep reinforcement learning and then present its successful applications to computer games and maze navigation. Future opportunities for deep-learning-enabled ADP are discussed at the end.
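One of the components usually highlighted in this line of work is the temporal-difference target computed from a separate, slowly updated target network. The sketch below shows that update with linear stand-in "networks" and synthetic transitions so it runs without a deep-learning framework; it is a schematic of the mechanism, not DeepMind's implementation.

```python
import numpy as np

# Sketch of the core DQN-style update: a TD target built from a frozen target
# network that is periodically synchronized with the online network.
rng = np.random.default_rng(0)
n_features, n_actions, gamma, lr = 8, 4, 0.99, 0.01
W = rng.standard_normal((n_actions, n_features)) * 0.1   # online Q-network (linear stand-in)
W_target = W.copy()                                      # target Q-network

def q_values(weights, features):
    return weights @ features

for step in range(1000):
    s = rng.standard_normal(n_features)                  # synthetic transition (placeholder data)
    a = rng.integers(n_actions)
    r = rng.standard_normal()
    s_next = rng.standard_normal(n_features)
    # TD target uses the frozen target network (the key stabilizing trick)
    y = r + gamma * q_values(W_target, s_next).max()
    td_error = y - q_values(W, s)[a]
    W[a] += lr * td_error * s                            # gradient step on the taken action only
    if step % 100 == 0:
        W_target = W.copy()                              # periodic target-network sync
```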
This paper examines a reinforcement learning strategy for controlling a two degree-of-freedom (2-DOF) helicopter. The pitch and yaw angles are regulated to their corresponding reference angles by applying appropriate actuator commands (input voltages) to the main and tail rotors of a 2-DOF helicopter using the proposed reinforcement learning [herein called the approximate dynamic programming (ADP)] strategy. Furthermore, the proposed strategy is able to configure the 2-DOF helicopter to track time-varying reference angles. The proposed ADP technique is capable of dealing with the coupling effects between the rigid-body structure and the propeller dynamics associated with the 2-DOF helicopter model considered in this work. A set of computer simulations is conducted to evaluate the performance of the proposed algorithm. The performance of the proposed algorithm is also compared to that of a conventional linear-quadratic regulator (LQR).
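The LQR baseline mentioned above can be sketched by iterating the discrete-time Riccati recursion on a linearized model; the matrices below are illustrative pitch/yaw placeholders, not the actual 2-DOF helicopter parameters.

```python
import numpy as np

# Discrete-time LQR gain by value iteration on the Riccati recursion.
# A, B are placeholder pitch/yaw linearizations, NOT the real helicopter model.
A = np.array([[1.0, 0.01, 0.0, 0.0],
              [0.0, 0.98, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.01],
              [0.0, 0.0, 0.0, 0.97]])      # states: pitch, pitch rate, yaw, yaw rate
B = np.array([[0.0, 0.0],
              [0.02, 0.005],
              [0.0, 0.0],
              [0.004, 0.03]])              # inputs: main and tail rotor voltages
Q, R = np.eye(4), 0.1 * np.eye(2)

P = Q.copy()
for _ in range(500):                        # iterate the Riccati difference equation
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
print("LQR gain:\n", np.round(K, 3))
```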
Feature representation is critical not only for pattern recognition tasks but also for reinforcement learning (RL) methods to solve learning control problems under uncertainties. In this paper, a manifold-based RL approach using the principle of locally linear reconstruction (LLR) is proposed for Markov decision processes with large or continuous state spaces. In the proposed approach, an LLR-based feature learning scheme is developed for value function approximation in RL, where a set of smooth feature vectors is generated by preserving the local approximation properties of neighboring points in the original state space. By using the proposed feature learning scheme, an LLR-based approximate policy iteration (API) algorithm is designed for learning control problems with large or continuous state spaces. The relationship between the value approximation error of a new data point and the estimated values of its nearest neighbors is analyzed. In order to compare different feature representation and learning approaches for RL, a comprehensive simulation and experimental study was conducted on three benchmark learning control problems. It is illustrated that under a wide range of parameter settings, the LLR-based API algorithm can obtain better learning control performance than the previous API methods with different feature representation schemes.
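A minimal sketch of the locally linear reconstruction step is shown below: a query state is expressed as an affine combination of its k nearest anchor states, and the same weights interpolate the value function. The neighborhood size, regularization, and stand-in value samples are assumptions for illustration.

```python
import numpy as np

# Locally linear reconstruction (LLR): solve for affine weights over the k
# nearest anchors, then reuse those weights to interpolate the value function.
def llr_weights(x, anchors, k=4, reg=1e-6):
    d = np.linalg.norm(anchors - x, axis=1)
    idx = np.argsort(d)[:k]                     # k nearest neighbors
    Z = anchors[idx] - x                        # neighbors centered at the query
    G = Z @ Z.T + reg * np.eye(k)               # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    return idx, w / w.sum()                     # reconstruction weights summing to one

rng = np.random.default_rng(0)
anchors = rng.uniform(-1, 1, size=(50, 2))      # sampled anchor states
V_anchor = np.sin(anchors[:, 0]) + anchors[:, 1] ** 2   # stand-in value samples

x_query = np.array([0.2, -0.3])
idx, w = llr_weights(x_query, anchors)
V_hat = w @ V_anchor[idx]                       # LLR value estimate at the query
print("interpolated value", V_hat, "reference", np.sin(0.2) + 0.09)
```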
ISBN (Print): 9783319590721; 9783319590714
In this paper, a novel framework of reinforcement learning for continuous-time dynamical systems is presented based on the Hamiltonian functional and the extreme learning machine. The idea of solution search in optimization is introduced to find the optimal control policy for the optimal control problem. The optimal control search consists of three steps: evaluation, comparison, and improvement of an arbitrary admissible policy. The Hamiltonian functional plays an important role in this framework, under which only one critic is required in the adaptive critic structure. The critic network is implemented by the extreme learning machine. Finally, a simulation study is conducted to verify the effectiveness of the presented algorithm.
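The extreme-learning-machine critic can be sketched in a few lines: the hidden-layer weights are drawn at random and frozen, and only the output weights are fit by least squares to sampled value targets. The target function and layer sizes below are illustrative assumptions.

```python
import numpy as np

# Extreme learning machine as a critic: random fixed hidden layer, output
# weights fit in closed form by least squares to sampled value targets.
rng = np.random.default_rng(0)
n_hidden, n_samples = 30, 200
X = rng.uniform(-2, 2, size=(n_samples, 2))          # sampled states
y = X[:, 0] ** 2 + 0.5 * X[:, 1] ** 2                # stand-in value targets

W_in = rng.standard_normal((2, n_hidden))            # random, frozen input weights
b = rng.standard_normal(n_hidden)                    # random, frozen biases
H = np.tanh(X @ W_in + b)                            # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)         # output weights by least squares

x_test = np.array([[1.0, -1.0]])
print("critic output", np.tanh(x_test @ W_in + b) @ beta)
```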