In this paper, we develop an integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution for a two-player zero-sum differential game with completely unknown linear continuous-time dynamics. This algorithm is a fully model-free method that solves the game algebraic Riccati equation forward in time. The developed algorithm updates the value function and the control and disturbance policies simultaneously. The convergence of the algorithm is established by showing that it is equivalent to Newton's method. To implement this algorithm, one critic network and two action networks are used to approximate the game value function, the control policy, and the disturbance policy, respectively, and the least squares method is used to estimate the unknown parameters. The effectiveness of the developed scheme is demonstrated in simulation by designing an H-infinity state feedback controller for a power system. Note to Practitioners: The noncooperative zero-sum differential game provides an ideal tool to study multiplayer optimal decision and control problems. Existing approaches usually compute the Nash equilibrium solution by means of offline iterative computation, and require exact knowledge of the system dynamics. However, it is difficult to obtain exact knowledge of the system dynamics for many real-world industrial systems. The algorithm developed in this paper is a fully model-free method which solves the zero-sum differential game problem forward in time by making use of online measured data. This method is not affected by errors between an identification model and the real system, and responds quickly to changes in the system dynamics. Exploration signals are required to satisfy the persistence of excitation condition in order to update the value function and the policies, and these signals do not affect the convergence of the learning process. The least squares method is used to obtain the approximate solution for zero-sum games with unknown dynamics. The developed a
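As a rough illustration of the simultaneous policy update described in this abstract, the following minimal sketch runs policy iteration on a scalar zero-sum game. It is not the paper's algorithm: it assumes the model is known (the paper's method is model-free), and the scalar dynamics dx/dt = a x + b u + d w, the cost weights q, r, gamma, and the function name `game_policy_iteration` are all hypothetical choices made for illustration.

```python
def game_policy_iteration(a, b, d, q, r, gamma, iterations=20):
    """Model-based policy iteration for a scalar zero-sum game (illustrative).

    Assumed dynamics: dx/dt = a*x + b*u + d*w, with u = -k*x and w = l*x.
    Assumed cost integrand: q*x**2 + r*u**2 - gamma**2 * w**2.
    Starts from k = l = 0, so a < 0 is assumed for initial stability.
    """
    k, l = 0.0, 0.0
    p = 0.0
    for _ in range(iterations):
        a_cl = a - b * k + d * l  # closed-loop pole under current policies
        if a_cl >= 0.0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: scalar Lyapunov equation
        # 2*a_cl*p + q + r*k**2 - gamma**2 * l**2 = 0
        p = (q + r * k * k - gamma ** 2 * l * l) / (-2.0 * a_cl)
        # Simultaneous policy improvement for both players
        k = b * p / r
        l = d * p / gamma ** 2
    return p, k, l


def gare_residual(p, a, b, d, q, r, gamma):
    """Residual of the scalar game algebraic Riccati equation."""
    return 2.0 * a * p + q - p * p * (b * b / r - d * d / gamma ** 2)
```

In this scalar setting each pass of the loop coincides with one Newton step on the Riccati residual, which is the equivalence the abstract refers to.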
In this study, a neural-network-based online learning algorithm is established to solve the finite horizon linear quadratic tracking (FHLQT) problem for partially unknown continuous-time systems. An augmented problem is constructed with an augmented state which consists of the system state and the reference trajectory. The authors obtain a solution for the augmented problem which is equivalent to the standard solution of the FHLQT problem. To solve the augmented problem with partially unknown system dynamics, they develop a time-varying Riccati equation. A critic neural network is used to approximate the value function and an online learning algorithm is established using the policy iteration technique to solve the time-varying Riccati equation. An integral policy iteration method and an online tuning law are used when the algorithm is implemented without the knowledge of the system drift dynamics and the command generator dynamics. A simulation example is given to show the effectiveness of the established algorithm.
Field data are important for supporting convenient daily travel of urban residents, reducing traffic congestion and accidents, pursuing a low-carbon, environment-friendly sustainable development strategy, and meeting the extra peak traffic demand of events such as large sporting events or large business activities. To meet the field data demand during the 2010 Asian (Para) Games held in Guangzhou, China, the Parallel Traffic management System (PtMS) was developed based on the novel Artificial systems, Computational experiments, and Parallel execution (ACP) approach. It successfully helped to achieve smoothness, safety, efficiency, and reliability of public transport management during the two games, supported public traffic management and decision making, and helped raise the public traffic management level from experience-based policy formulation and manual implementation to scientific computing-based policy formulation and implementation. The PtMS represents another new milestone in solving the management difficulty of real-world complex systems.
Visual classification has long been a major challenge for computer vision. In recent years, biologically inspired visual models have attracted great interest. However, most of the related studies focus on learning features and representations from very large-scale datasets using deep network architectures, an approach that is doomed to fail with limited training samples due to its high complexity. In this paper, it is found that aside from the deep architecture, two other biologically inspired mechanisms, the pooling and nonlinear operations, also contribute to the improvement of classification performance. Based on this perspective, a new classifier with a shallow architecture is proposed, in which both mechanisms are implemented with the max operation. Moreover, the architecture is derived from a probabilistic perspective to further explain its underlying rationale. To train the classifier, a supervised learning algorithm is devised to minimize the hinge loss function under the new architecture. Based on the manifold assumption of continuously transforming features, an unsupervised learning algorithm is also presented to learn the features used by the classifier. Finally, the method is compared against other classifiers on several image classification benchmarks. The results demonstrate the strength of the proposed method when the training data source is limited. (c) 2014 Elsevier B.V. All rights reserved.
Parallel-jaw grippers find wide application in various industrial sectors. In this paper, we mainly focus on the problem of form closure caging grasps of polygons with a parallel-jaw gripper equipped with four fingers. The form closure caging grasp is helpful for finger placement and contact region selection of a pneumatic gripper, as it is less sensitive to finger misplacement. We first prove that there is always a path from a cage to a form closure grasp of the object that never breaks the cage, as long as the attractive region constructed in the configuration space has a local minimum. If such a minimum cannot be found, we further adjust the finger arrangement to produce the form closure grasp. We also develop an algorithm to compute the initial cage of the form closure grasp. Simulations of the grasping process demonstrate the effectiveness of the above analysis.
In this paper, a type of fuzzy system structure is applied to the heuristic dynamic programming (HDP) algorithm to solve nonlinear discrete-time Hamilton-Jacobi-Bellman (DT-HJB) problems. The fuzzy system adopted here is a 0-order T-S (Takagi-Sugeno) fuzzy system using triangle membership functions (MFs). The convergence of HDP and the approximability of the multivariate 0-order T-S fuzzy system are analyzed in this paper. It is derived that the cost function and control policy of HDP can be iterated to the DT-HJB solution and the optimal policy. The multivariate 0-order T-S fuzzy system using triangle MFs is proven to be a universal approximator, which guarantees the convergence of the Fuzzy-HDP mechanism. Some simulations are implemented to observe the performance of the proposed method on both a mathematical problem and a practical one. It is concluded that Fuzzy-HDP outperforms traditional optimal control in more complex systems. (C) 2014 Elsevier B.V. All rights reserved.
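To make the approximator in this abstract concrete, the sketch below implements a one-dimensional 0-order T-S fuzzy system with triangular membership functions on a uniform grid. It is a simplified illustration, not the paper's multivariate construction: the function names `triangle` and `ts_zero_order`, the uniform grid, and the clamping of inputs to the grid range are assumptions introduced here.

```python
def triangle(x, left, center, right):
    """Triangular membership function with peak 1 at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def ts_zero_order(x, centers, consequents):
    """Output of a 1-D 0-order T-S fuzzy system (illustrative sketch).

    `centers` is an increasing, uniformly spaced grid of rule centers and
    `consequents` holds one constant output per rule. The input is clamped
    to the grid range so that at least one rule always fires.
    """
    h = centers[1] - centers[0]
    x = min(max(x, centers[0]), centers[-1])
    weights = [triangle(x, c - h, c, c + h) for c in centers]
    total = sum(weights)
    # Weighted average of the constant rule consequents
    return sum(w * y for w, y in zip(weights, consequents)) / total


# Usage: approximate f(x) = x**2 on [0, 2] with five rules.
centers = [0.0, 0.5, 1.0, 1.5, 2.0]
consequents = [c * c for c in centers]
value = ts_zero_order(0.25, centers, consequents)
```

With this layout the memberships sum to one inside the grid, so the system reduces to piecewise-linear interpolation between the rule consequents; refining the grid improves the approximation, which gives an intuition for the universal approximation property the abstract mentions.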
In this paper, the neural-network-based robust optimal control design for a class of uncertain nonlinear systems via the adaptive dynamic programming approach is investigated. First, the robust controller of the original uncertain system is derived by adding a feedback gain to the optimal controller of the nominal system. It is also shown that this robust controller can achieve optimality under a specified cost function, which serves as the basic idea of the robust optimal control design. Then, a critic network is constructed to solve the Hamilton-Jacobi-Bellman equation corresponding to the nominal system, where an additional stabilizing term is introduced to verify the stability. The uniform ultimate boundedness of the closed-loop system is also proved by using the Lyapunov approach. Moreover, the obtained results are extended to solve the decentralized optimal control problem of continuous-time nonlinear interconnected large-scale systems. Finally, two simulation examples are presented to illustrate the effectiveness of the established control scheme. (C) 2014 Elsevier Inc. All rights reserved.
In this paper, an optimal tracking control scheme is proposed for a class of unknown discrete-time nonlinear systems using an iterative adaptive dynamic programming (ADP) algorithm. First, in order to obtain the dynamics of the system, an identifier is constructed using a three-layer feedforward neural network (NN). Second, a feedforward neuro-controller is designed to obtain the desired control input of the system. Third, via system transformation, the original tracking problem is transformed into a regulation problem with respect to the state tracking error. Then, the iterative ADP algorithm based on heuristic dynamic programming is introduced to deal with the regulation problem, together with a convergence analysis. In this scheme, feedforward NNs are used as parametric structures to facilitate the implementation of the iterative algorithm. Finally, simulation results are presented to demonstrate the effectiveness of the proposed scheme. (C) 2013 Elsevier B.V. All rights reserved.
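The iterative HDP step used for the regulation problem can be illustrated on a much simpler case than the one treated in the abstract. The sketch below assumes a known scalar linear system x(k+1) = a x(k) + b u(k) with quadratic cost q x^2 + r u^2, for which the iterative value function stays quadratic, V_i(x) = p_i x^2, and no neural networks are needed. The function name `hdp_value_iteration` and all parameter values are hypothetical.

```python
def hdp_value_iteration(a, b, q, r, iterations=200):
    """Value iteration (HDP-style) for a scalar LQ regulator (illustrative).

    Iterates V_{i+1}(x) = min_u [q*x**2 + r*u**2 + V_i(a*x + b*u)]
    starting from V_0 = 0, with V_i(x) = p_i * x**2.
    Returns the final p and the greedy feedback gain k (u = -k*x).
    """
    p = 0.0
    for _ in range(iterations):
        # Greedy control law with respect to the current value function
        k = a * b * p / (r + b * b * p)
        # Bellman backup under that control law
        p = q + r * k * k + p * (a - b * k) ** 2
    # Gain that is greedy with respect to the final value function
    k = a * b * p / (r + b * b * p)
    return p, k


# Usage on an open-loop unstable plant (a = 1.2).
p_final, k_final = hdp_value_iteration(1.2, 1.0, 1.0, 1.0)
```

Starting from a zero value function, the sequence p_i increases monotonically toward the solution of the discrete-time Riccati equation in this linear-quadratic case, which mirrors the convergence behavior analyzed for the iterative ADP algorithm.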
In this paper, a new infinite horizon neural-network-based adaptive optimal tracking control scheme for discrete-time nonlinear systems is developed. The idea is to use an iterative adaptive dynamic programming (ADP) algorithm to obtain the iterative tracking control law which makes the iterative performance index function reach the optimum. When the iterative tracking control law and iterative performance index function in each iteration cannot be accurately obtained, the convergence criteria of the iterative ADP algorithm are established according to the properties with finite approximation errors. If the convergence conditions are satisfied, it is shown that the iterative performance index functions converge to a finite neighborhood of the lowest bound of all performance index functions. Properties of the finite approximation errors for the iterative ADP algorithm are also analyzed. Neural networks are used to approximate the performance index function and compute the optimal control policy, respectively, to facilitate the implementation of the iterative ADP algorithm. Convergence properties of the neural network weights are proven. Finally, simulation results are given to illustrate the performance of the developed method. (C) 2014 Elsevier B.V. All rights reserved.
In this paper, a novel data-driven stable iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal temperature control problems for water-gas shift (WGS) reaction systems. Using the system data, neural networks (NNs) are employed to construct the dynamics of the WGS system and to compute the reference control, respectively, so that a mathematical model of the WGS system is not required. Considering the reconstruction errors of the NNs and the disturbances of the system and control input, a new stable iterative ADP algorithm is developed to obtain the optimal control law. A convergence property is established to guarantee that the iterative performance index function converges to a finite neighborhood of the optimal performance index function. A stability property is established to guarantee that each of the iterative control laws makes the tracking error uniformly ultimately bounded (UUB). NNs are used to implement the stable iterative ADP algorithm. Finally, numerical results are given to illustrate the effectiveness of the developed method.