This paper investigates the problem of optimal fault-tolerant control (FTC) for a class of unknown nonlinear discrete-time systems with actuator fault in the framework of adaptive critic design (ACD). A pivotal highlight is the adaptive auxiliary signal, which is designed to offset the effect of the actuator fault. The considered systems are in strict-feedback form and involve unknown nonlinear functions, which leads to a causality problem. To solve this problem, the original nonlinear systems are transformed into a novel system by employing diffeomorphism theory. In addition, action neural networks (ANNs) are utilized to approximate a predefined unknown function in the backstepping design procedure. Combining the strategic utility function and the ACD technique, a reinforcement learning algorithm is proposed to construct an optimal FTC, in which critic neural networks (CNNs) provide an approximate structure of the cost function. In this way, the scheme not only guarantees the stability of the systems but also achieves optimal control performance. In the end, two simulation examples are used to show the effectiveness of the proposed optimal FTC strategy.
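The abstract gives no equations, but the critic side of an ACD design like this one typically trains a parameterized cost approximation against a Bellman target. Below is a minimal numpy sketch of that idea, assuming a hypothetical polynomial basis and a plain quadratic utility standing in for the paper's strategic utility function; it illustrates the direction of a CNN weight update, not the paper's actual algorithm.

```python
import numpy as np

def phi(x):
    """Polynomial basis for the critic (hypothetical choice)."""
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2])

def utility(x, u):
    """Quadratic utility, a stand-in for the strategic utility function."""
    return x @ x + 0.1 * u**2

gamma, alpha = 0.95, 0.05   # discount factor, critic learning rate
W = np.zeros(3)             # critic weights: J(x) ~ W @ phi(x)

def critic_update(W, x, u, x_next):
    """One gradient step on the squared temporal-difference error
    e = W@phi(x) - (U(x,u) + gamma * W@phi(x_next))."""
    e = W @ phi(x) - (utility(x, u) + gamma * W @ phi(x_next))
    return W - alpha * e * phi(x)

# Toy rollout on a stable linear system just to exercise the update.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
b = np.array([0.0, 0.1])
x = np.array([1.0, -1.0])
for k in range(200):
    u = -0.5 * x[1]                      # fixed (suboptimal) policy
    x_next = A @ x + b * u
    W = critic_update(W, x, u, x_next)
    x = x_next
print("critic weights:", W)
```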
This paper examines a reinforcement learning strategy for controlling a two degree-of-freedom (2-DOF) helicopter. The pitch and yaw angles are regulated to their corresponding reference angles by applying appropriate ...
ISBN: (Print) 9781538646182
An online adaptive learning approach based on costate function approximation is developed to solve an optimal control problem in real time. The proposed approach tackles the main concerns associated with classical dual heuristic dynamic programming (DHP) techniques in uncertain dynamical environments. It employs a policy iteration paradigm along with adaptive critics to implement the adaptive learning solution. The resulting framework does not require prior knowledge of the system dynamics, which makes it suitable for systems with high modeling uncertainties. As a proof of concept, the suggested structure is applied to the autopilot control of a flexible-wing aircraft with unknown dynamics that vary continuously across trim speed conditions. Numerical simulations showed that the adaptive control technique was able to learn the system's dynamics and regulate its states as desired in a relatively short time.
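For context on the costate-based idea: a DHP-style critic approximates the cost gradient lambda(x) = dJ/dx and is trained against a target assembled from the utility gradient and the closed-loop state Jacobian. The sketch below evaluates one fixed policy on a toy linear plant whose Jacobian is read off the known model; in the paper's model-free setting that Jacobian would instead come from an online-identified model. All matrices and gains here are hypothetical.

```python
import numpy as np

A = np.array([[1.0, 0.1], [-0.2, 0.9]])   # toy plant (illustrative)
B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), np.array([[0.1]]), 0.95
K = np.array([[0.3, 0.5]])                # fixed evaluation policy u = -K x

W = np.zeros((2, 2))                      # linear costate critic: lam(x) = W @ x
alpha = 0.01

x = np.array([1.0, -0.5])
for k in range(500):
    u = -K @ x
    x_next = A @ x + (B @ u).ravel()
    Acl = A - B @ K                       # closed-loop Jacobian d x_{k+1}/d x_k
    # DHP target: dU/dx + (du/dx)' dU/du + gamma * Acl' lam(x_next),
    # with U = x'Qx + u'Ru and du/dx = -K
    dUdx = 2.0 * (Q @ x) + (-K.T) @ (2.0 * R @ u)
    err = W @ x - (dUdx + gamma * Acl.T @ (W @ x_next))
    W -= alpha * np.outer(err, x)         # gradient step on 0.5*||err||^2
    x = x_next
print("costate critic weights:\n", W)
```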
In the first issue of Nature 2015, Google DeepMind published the paper “Human-level control through deep reinforcement learning.” Furthermore, in the first issue of Nature 2016, it published the cover paper “Mastering the game of Go with deep neural networks and tree search” and proposed the computer Go program AlphaGo. In March 2016, AlphaGo beat the world's top Go player Lee Sedol by 4:1. This became a new milestone in the history of artificial intelligence, the core of which is the algorithm of deep reinforcement learning (RL).
Deep reinforcement learning is a focal research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive ...
ISBN: (Print) 9789881563958
This paper sums up four typical schemes of adaptive dynamic programming (ADP). The diagrams are provided and the algorithms of the various schemes are described, which is convenient for comparison. Some schemes in this paper belong to the group of action-dependent (AD) adaptive critic designs, whose distinguishing feature is the absence of a model network in the design. For simplicity of notation, we do not use the prefix AD. The learning process of ADP is accomplished by updating the weights of the networks. The weight-updating processes of some networks in the GDHP scheme are introduced.
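The defining trait of the action-dependent schemes mentioned above is that the critic takes the control as an input alongside the state, so no model network is needed to pass derivative information through the plant. A minimal sketch of that critic weight update, with a hypothetical quadratic basis and toy system:

```python
import numpy as np

gamma, alpha = 0.95, 0.02

def features(x, u):
    """Quadratic basis over the joint state-action vector z = [x; u]."""
    z = np.append(x, u)
    return np.outer(z, z)[np.triu_indices(len(z))]

def utility(x, u):
    return x @ x + 0.1 * u * u

W = np.zeros(len(features(np.zeros(2), 0.0)))   # Q(x,u) ~ W @ features(x,u)

def ad_critic_update(W, x, u, x_next, u_next):
    """TD step: Q(x,u) should match U(x,u) + gamma * Q(x',u')."""
    e = W @ features(x, u) - (utility(x, u)
                              + gamma * W @ features(x_next, u_next))
    return W - alpha * e * features(x, u)

# Exercise the update along a trajectory of a toy linear system.
A = np.array([[0.95, 0.05], [0.0, 0.9]])
b = np.array([0.0, 0.1])
x = np.array([1.0, 0.5])
for k in range(300):
    u = -0.4 * x[1]                              # fixed behavior policy
    x_next = A @ x + b * u
    W = ad_critic_update(W, x, u, x_next, -0.4 * x_next[1])
    x = x_next
print("Q-critic weights:", W)
```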
In this paper, the decentralized control problem is solved based on a policy iteration algorithm for large-scale nonlinear systems with unknown mismatched interconnections. The unknown interconnection is approximated by a neural network using the local states of the isolated subsystem and substituted reference states of the coupled subsystems. Then, an adaptive estimation term is utilized to construct an improved local performance index function that reflects the substitution error. The closed-loop large-scale nonlinear system is thereby guaranteed to be uniformly ultimately bounded under the set of developed decentralized optimal control policies. Two simulation examples are given to verify the effectiveness of the presented scheme. The significant contribution of this scheme is that it removes the common assumptions that the interconnections satisfy the matching condition and are upper bounded, when designing decentralized optimal control for large-scale nonlinear systems.
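As background, the evaluate/improve loop that a policy iteration algorithm instantiates is easiest to see in its exact, model-based form for a single isolated linear subsystem, as sketched below; the paper replaces these exact solves with neural approximators and additionally handles the unknown interconnections, which this illustration omits. The matrices are made up.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Exact policy iteration for one isolated linear subsystem (illustrative).
A = np.array([[1.0, 0.2], [0.0, 1.0]])
B = np.array([[0.0], [0.2]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.array([[1.0, 1.5]])                  # initial stabilizing gain
for i in range(10):
    Acl = A - B @ K
    # Policy evaluation: P solves P = Acl' P Acl + Q + K' R K.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: greedy gain for the evaluated cost.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("converged gain K:", K)
```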
ISBN: (Print) 9781538636640
In this paper we propose an approach for approximating optimal tracking via expectation-maximization (EM) evaluation. Starting from a discussion of applying reinforcement learning (RL) to a system with unknown internal dynamics, we present the challenge of using a classical Q-learning framework on a tracking task. We then explain the idea of redefining the cost function (i.e., the criterion) of Q-learning to accommodate the need for system dynamics knowledge in the tracking task. We explain the advantages of dividing the original trajectory-tracking task into two online machine learning subtasks: learning the quadratic regulator and learning the baseline command generator. Details are given on the integration of the Q-learning framework with the EM algorithm, as well as on convergence to the optimal control via iterative estimation of an optimal regulator and a baseline generator. Initial simulation results on a second-order system show that the Q-learning framework integrated with the EM algorithm approximates the optimal tracking solution.
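To make the regulator-learning subtask concrete, one standard way to learn a quadratic Q-function without a model is least-squares Q-learning over quadratic features of (x, u), shown below for a hypothetical linear system; the EM-based baseline command generator, the paper's other subtask, is not sketched here.

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # toy plant (illustrative)
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])
rng = np.random.default_rng(0)

def svec(z):
    """Quadratic features: z'Hz is linear in these terms."""
    return np.outer(z, z)[np.triu_indices(len(z))]

K = np.array([[0.5, 0.5]])                    # initial stabilizing gain
for it in range(8):
    Phi, y = [], []
    x = rng.standard_normal(2)
    for k in range(60):
        u = -K @ x + 0.1 * rng.standard_normal(1)   # exploration noise
        x1 = A @ x + (B @ u).ravel()
        u1 = -K @ x1                                # on-policy next action
        Phi.append(svec(np.append(x, u)) - svec(np.append(x1, u1)))
        y.append(x @ Q @ x + u @ R @ u)
        x = x1
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = np.zeros((3, 3)); H[np.triu_indices(3)] = h
    H = (H + H.T) / 2                          # recover symmetric kernel
    K = np.linalg.solve(H[2:, 2:], H[2:, :2])  # u = -Huu^{-1} Hux x
print("learned gain K:", K)
```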
The majority of current studies on autonomous vehicle control via deep reinforcement learning (DRL) utilize point-mass kinematic models, neglecting vehicle dynamics, which include acceleration delay and acceleration command dynamics. The acceleration delay, which results from sensing and actuation delays, leads to delayed execution of the control inputs. The acceleration command dynamics dictate that the actual vehicle acceleration does not reach the commanded acceleration instantaneously. In this work, we investigate the feasibility of applying DRL controllers trained using vehicle kinematic models to more realistic driving control with vehicle dynamics. We consider a particular longitudinal car-following control problem, Adaptive Cruise Control (ACC), solved via DRL using a point-mass kinematic model. When such a controller is applied to car following with vehicle dynamics, we observe significantly degraded car-following performance. Therefore, we redesign the DRL framework to accommodate the acceleration delay and acceleration command dynamics by adding the delayed control inputs and the actual vehicle acceleration, respectively, to the reinforcement learning environment state. The training results show that the redesigned DRL controller achieves near-optimal car-following performance with vehicle dynamics considered, when compared with dynamic programming solutions.
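The redesign described here, feeding the not-yet-executed commands and the actual acceleration back into the agent's observation, amounts to an environment wrapper along the following lines. The class name, interface, and delay length are hypothetical, not taken from the paper.

```python
import numpy as np
from collections import deque

class DelayAugmentedEnv:
    """Sketch: delays executed commands by `delay_steps` and appends the
    pending command queue to the observation so the agent can compensate."""

    def __init__(self, env, delay_steps=3):
        self.env = env
        self.delay_steps = delay_steps
        self.pending = deque([0.0] * delay_steps)

    def reset(self):
        self.pending = deque([0.0] * self.delay_steps)
        return self._augment(self.env.reset())

    def step(self, action):
        executed = self.pending.popleft()    # command issued delay_steps ago
        self.pending.append(float(action))   # queue the fresh command
        obs, reward, done, info = self.env.step(executed)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # obs is assumed to already include the actual vehicle acceleration;
        # the pending (delayed) commands are appended to the state.
        return np.concatenate([np.asarray(obs), np.array(self.pending)])
```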
In robot-assisted rehabilitation, assist-as-needed (AAN) controllers have been proposed to promote subjects' active participation, which is thought to lead to better training outcomes. Most of these AAN controllers require patient-specific manual tuning of the parameters defining the underlying force field, which typically results in a tedious and time-consuming process. In this paper, we propose a reinforcement-learning-based impedance controller that actively reshapes the stiffness of the force field according to the subject's performance, while providing assistance only when needed. This adaptability is made possible by correlating the subject's most recent performance to the ultimate control objective in real time. In addition, the proposed controller is built upon action-dependent heuristic dynamic programming (ADHDP) using the actor-critic structure, and therefore does not require prior knowledge of the system model. The controller is experimentally validated with healthy subjects through a simulated ankle mobilization training session using a powered ankle-foot orthosis.
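To illustrate the force-field reshaping idea only (a heuristic stand-in, not the paper's ADHDP actor-critic), here is a toy impedance law whose stiffness rises when recent tracking error is large and relaxes when the subject performs well; all gains, thresholds, and names are hypothetical.

```python
import numpy as np

def impedance_torque(theta, theta_dot, theta_ref, Kp, Kd=2.0):
    """Force-field torque of a simple impedance law (illustrative)."""
    return Kp * (theta_ref - theta) - Kd * theta_dot

def adapt_stiffness(Kp, recent_errors, target_err=0.05,
                    rate=0.5, Kp_min=0.0, Kp_max=50.0):
    """Assist-as-needed heuristic: more tracking error than the target
    band -> stiffen the field; less -> relax it and let the subject work."""
    perf = float(np.mean(np.abs(recent_errors)))
    Kp += rate * (perf - target_err)
    return float(np.clip(Kp, Kp_min, Kp_max))
```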