Autonomous driving of a wheeled mobile robot (WMR) requires implementing velocity and path-tracking control subject to complex dynamical constraints. Conventionally, this control design is obtained by analysis and synthesis of the WMR system. This paper presents a dual heuristic programming (DHP) adaptive critic design of the motion control system that enables the WMR to achieve the control objective simply by trial-and-error learning. The design consists of an adaptive critic velocity neuro-control loop and a posture neuro-control loop. The neural weights in the velocity neuro-controller (VNC) are corrected with the DHP adaptive critic method. The designer simply expresses the control objective with a utility function, and the VNC learns by sequential optimization to satisfy it. The posture neuro-controller (PNC) approximates the inverse velocity model of the WMR so as to map planned positions to desired velocities. Supervised driving of the WMR at varying velocities supplies training samples for the PNC and VNC to set up the neural weights. During autonomous driving, the learning mechanism keeps improving the PNC and VNC. The design is evaluated on an experimental WMR; the results confirm that the DHP adaptive critic motion control design enables the WMR to develop its control ability autonomously.
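The abstract gives the DHP structure but no equations. As a rough illustration only, the sketch below shows a DHP-style critic/actor update on a known linear model with a quadratic utility: the critic estimates the cost-to-go gradient lambda(x) = dJ/dx rather than J itself, which is the defining feature of DHP. The model matrices, utility weights, and step sizes are illustrative assumptions, not the paper's WMR design.

```python
# Minimal DHP-style critic/actor sketch, assuming a known linear model
# x' = A x + B u and quadratic utility U(x, u) = x^T Q x + u^T R u.
# All constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2                                  # state and action dimensions
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))
B = 0.1 * rng.standard_normal((n, m))
Q, R = np.eye(n), 0.1 * np.eye(m)
gamma, lr = 0.95, 1e-3

Wc = 0.01 * rng.standard_normal((n, n))      # linear critic: lambda(x) = Wc @ x
Wa = np.zeros((m, n))                        # linear actor:  u(x) = Wa @ x

def critic(x):                               # estimate of dJ/dx
    return Wc @ x

for episode in range(200):
    x = rng.standard_normal(n)
    for t in range(50):
        u = Wa @ x
        x_next = A @ x + B @ u
        # DHP critic target: dU/dx + gamma * (dx'/dx)^T lambda(x'),
        # with dx'/dx = A + B @ Wa for this linear actor.
        dUdx = 2 * Q @ x + Wa.T @ (2 * R @ u)
        target = dUdx + gamma * (A + B @ Wa).T @ critic(x_next)
        err = critic(x) - target
        Wc -= lr * np.outer(err, x)          # gradient step on critic weights
        # Actor step: descend dJ/du = dU/du + gamma * B^T lambda(x').
        dJdu = 2 * R @ u + gamma * B.T @ critic(x_next)
        Wa -= lr * np.outer(dJdu, x)
        x = x_next
```

In the paper's terms, the utility function U is the only thing the designer specifies; the critic and actor weights adapt from experience, here mimicked by rollouts of the assumed model.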
ISBN (print): 1424407060
In this work, we design a policy-iteration-based Q-learning approach for on-line optimal control of ionized hypersonic flow at the inlet of a scramjet engine. Magneto-hydrodynamics (MHD) has recently been proposed as a means of flow control in various aerospace problems. This mechanism applies external magnetic fields to ionized flows to achieve desired flow behavior. The applications range from external flow control, for producing forces and moments on the air vehicle, to internal flow control designs, which compress the flow and extract electrical energy from it. The current work addresses the latter problem of internal flow control. The baseline controller and Q-function parameterizations are derived from an off-line mixed predictive-control and dynamic-programming-based design. The nominal optimal neural network Q-function and controller are updated on-line to handle modeling errors in the off-line design. The on-line implementation investigates key concerns regarding the conservativeness of the update methods. Value-iteration-based update methods have been shown to converge in a probabilistic sense; however, simulation results illustrate that realistic implementations of these methods face significant training difficulties, often failing to learn the optimal controller on-line. The present approach therefore uses a policy-iteration-based update, which has time-based convergence guarantees. Given the special finite-horizon nature of the problem, three novel on-line update algorithms are proposed. These algorithms incorporate different mixes of concepts, including bootstrapping and forward and backward dynamic programming update rules. Simulation results illustrate the success of the proposed update algorithms in re-optimizing the performance of the MHD generator during system operation.
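For readers unfamiliar with the policy-iteration structure the paper builds on, here is a minimal tabular sketch of policy iteration with a Q-function on a toy MDP. The paper itself uses neural-network Q-function parameterizations in an on-line, finite-horizon setting; everything below (P, r, gamma, sizes) is an assumed toy problem that only illustrates the evaluate-then-improve loop.

```python
# Tabular policy iteration with a Q-function on an assumed toy MDP.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] -> next-state dist
r = rng.standard_normal((nS, nA))              # reward table

pi = np.zeros(nS, dtype=int)                   # initial policy
for _ in range(50):
    # Policy evaluation: iterate Q_pi = r + gamma * P @ V_pi to a fixed
    # point (a parametric version would fit Q by regression instead).
    V = np.zeros(nS)
    for _ in range(200):
        Qtab = r + gamma * P @ V               # shape (nS, nA)
        V = Qtab[np.arange(nS), pi]
    # Policy improvement: act greedily w.r.t. the evaluated Q.
    new_pi = Qtab.argmax(axis=1)
    if np.array_equal(new_pi, pi):
        break                                  # policy is stable: optimal
    pi = new_pi
print("converged policy:", pi)
```

The "time-based convergence guarantee" the abstract cites comes from this structure: each improvement step provably does not worsen the policy, unlike stochastic value-iteration updates that converge only probabilistically.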
ISBN (print): 1424408296;97
The goal of the work described in this paper is to develop an optimal control technique based on a cell-mapping technique combined with the Q-learning reinforcement learning method to control wheeled mobile vehicles. The approach manages four state variables because a dynamic model is used instead of a kinematic model, which could be handled with fewer variables. This solution can be applied to nonlinear continuous systems, where reinforcement learning methods face multiple constraints. Emphasis is given to the new combination of techniques, which produces satisfactory results when applied to optimal control problems. The proposed algorithm is robust to changes in the vehicle parameters because the vehicle model is estimated in real time from received experience.
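A minimal sketch of the general idea of pairing a cell-mapping discretization with tabular Q-learning follows, using an assumed 1-D double-integrator in place of the paper's four-state vehicle model; the dynamics, cell resolution, reward, and all constants are illustrative assumptions.

```python
# Cell-mapping + Q-learning sketch on an assumed toy double integrator:
# the continuous state (position, velocity) is mapped to a cell index,
# and standard Q-learning runs on the cell-indexed table.
import numpy as np

rng = np.random.default_rng(2)
pos_bins = np.linspace(-1.0, 1.0, 21)      # cell boundaries for position
vel_bins = np.linspace(-1.0, 1.0, 21)      # cell boundaries for velocity
actions = np.array([-1.0, 0.0, 1.0])       # acceleration commands
Q = np.zeros((len(pos_bins) + 1, len(vel_bins) + 1, len(actions)))
alpha, gamma, eps, dt = 0.1, 0.95, 0.1, 0.05

def cell(p, v):                            # continuous state -> cell index
    return np.digitize(p, pos_bins), np.digitize(v, vel_bins)

for episode in range(500):
    p, v = rng.uniform(-1, 1, size=2)
    s = cell(p, v)
    for t in range(200):
        a = rng.integers(len(actions)) if rng.random() < eps else int(Q[s].argmax())
        v += actions[a] * dt               # double-integrator step
        p += v * dt
        reward = -(p * p + v * v)          # drive the state to the origin
        s2 = cell(p, v)
        # Standard Q-learning update on the cell-indexed table.
        Q[s + (a,)] += alpha * (reward + gamma * Q[s2].max() - Q[s + (a,)])
        s = s2
```

The paper's real-time model estimation would replace the hard-coded integrator step with transitions learned from vehicle experience; the table update itself is unchanged.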
ISBN (print): 9780780397989
In this paper, an analytical comparison is made between dynamic programming and reinforcement learning methods in dynamic two-player games. The emphasis is on the large number of states and actions available to each player and the conflicting optimization objectives that make these games complicated to model and analyze. Optimization and decision making are carried out with a modified Q-learning algorithm. It is shown that with this method, information processing in large-scale, long-stage games takes less time and results in lower decision costs, whereas dynamic programming methods cannot handle such games over long time horizons.
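The abstract does not spell out its modification of Q-learning, so the sketch below shows only the unmodified baseline it presumably departs from: two independent Q-learners in a repeated two-player matrix game. The payoff matrix and step sizes are assumed, not taken from the paper.

```python
# Baseline sketch: independent Q-learning for two players in a repeated
# matrix game with assumed payoffs (a prisoner's-dilemma-like table).
import numpy as np

rng = np.random.default_rng(3)
payoff1 = np.array([[3.0, 0.0], [5.0, 1.0]])   # row player's payoffs
payoff2 = payoff1.T                            # column player's payoffs
Q1, Q2 = np.zeros(2), np.zeros(2)              # stateless action values
alpha, eps = 0.05, 0.1

for t in range(5000):
    a1 = rng.integers(2) if rng.random() < eps else int(Q1.argmax())
    a2 = rng.integers(2) if rng.random() < eps else int(Q2.argmax())
    # Each player updates only its own action value from its own payoff,
    # treating the opponent as part of the environment.
    Q1[a1] += alpha * (payoff1[a1, a2] - Q1[a1])
    Q2[a2] += alpha * (payoff2[a1, a2] - Q2[a2])
print("player 1 prefers action", int(Q1.argmax()))
```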
ISBN (print): 9780780397989
This paper shows an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework that addresses both analytical characteristics and algorithmic features. Several important insights are gained from this analysis, including new approaches to the creation of algorithms. Built on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization with partial observability. The framework is based on an average cost formulation, which uses the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations on a queueing problem and a maze problem illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown.
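A minimal sketch of the average-cost ingredients the abstract names follows: a differential-cost (relative value) TD update with a running average-cost estimate, on an assumed toy Markov chain under a fixed policy. The chain, costs, and step sizes are all assumptions.

```python
# Average-cost TD sketch: learn differential costs h(s) and the average
# cost eta for a fixed policy on an assumed toy Markov chain, using the
# relation h(s) = c(s) - eta + E[h(s')].
import numpy as np

rng = np.random.default_rng(4)
nS = 5
P = rng.dirichlet(np.ones(nS), size=nS)    # fixed-policy transition matrix
c = rng.uniform(0, 1, size=nS)             # per-state cost
h = np.zeros(nS)                           # differential (relative) costs
eta, alpha, beta = 0.0, 0.05, 0.01         # avg-cost estimate, step sizes

s = 0
for t in range(200_000):
    s2 = rng.choice(nS, p=P[s])
    delta = c[s] - eta + h[s2] - h[s]      # average-cost TD error
    h[s] += alpha * delta
    eta += beta * (c[s] - eta)             # running average-cost estimate
    s = s2

# Compare with the true stationary average cost pi^T c.
w, V = np.linalg.eig(P.T)
stat = np.real(V[:, np.argmax(np.real(w))])
stat /= stat.sum()
print("estimated:", eta, "true:", stat @ c)
```

The performance-gradient machinery the abstract mentions builds on exactly these quantities: differential costs h play the role that value functions play in the discounted setting.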
ISBN (print): 0780397983
The proceedings contain 94 papers. The topics discussed include: neural adaptive control of dynamic sandwich systems with hysteresis; radial basis function based iterative learning control for stochastic distribution systems; energy-efficient approaches to coverage holes detection in wireless sensor networks; optimal sensor placement for border perambulation; finite horizon discrete-time approximate dynamic programming; adaptive critic designs based coupled neurocontrollers for a static compensator; stability analysis and design for switched descriptor systems; a design of a partial sliding mode controller using duality to linear functional observer; stability of digital control systems with time delays; robust stabilization of nonlinear switched systems via switched output feedback; intermittent iterative learning control; and iterative learning control of perspective dynamic systems.
Effective management of anemia due to renal failure poses many challenges to physicians. Individual response to treatment varies across patient populations and, due to the prolonged character of the therapy, changes over time. In this work, a reinforcement learning-based approach is proposed as an alternative method for individualizing drug administration in the treatment of renal anemia. Q-learning, an off-policy approximate dynamic programming method, is applied to determine the proper dosing strategy in real time. Simulations compare the proposed methodology with the currently used dosing protocol. The presented results illustrate the ability of the proposed method to achieve the therapeutic goal for individuals with different response characteristics and its potential to become an alternative to currently used techniques. (c) 2005 Elsevier Ltd. All rights reserved.
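As a rough illustration of Q-learning applied to dose individualization, here is a toy sketch. The patient response model, hemoglobin discretization, dose levels, target, and reward below are invented stand-ins, not the paper's clinical model; they only show how a dosing policy can be learned per patient from observed responses.

```python
# Toy Q-learning sketch for dose individualization: the state is a
# discretized hemoglobin level, the action is a dose, and the reward
# penalizes deviation from an assumed therapeutic target.
import numpy as np

rng = np.random.default_rng(5)
doses = np.array([0.0, 0.5, 1.0, 1.5])        # candidate dose levels
hb_bins = np.linspace(8.0, 14.0, 13)          # hemoglobin discretization
target = 11.5                                 # assumed target (g/dL)
Q = np.zeros((len(hb_bins) + 1, len(doses)))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(hb, dose, sensitivity):
    # Assumed first-order response: hb drifts down, dose pushes it up.
    return hb + sensitivity * dose - 0.3 + 0.1 * rng.standard_normal()

for episode in range(2000):
    sensitivity = rng.uniform(0.3, 0.7)       # varies across patients
    hb = rng.uniform(9.0, 13.0)
    s = np.digitize(hb, hb_bins)
    for week in range(52):
        a = rng.integers(len(doses)) if rng.random() < eps else int(Q[s].argmax())
        hb = step(hb, doses[a], sensitivity)
        reward = -abs(hb - target)            # penalize missing the target
        s2 = np.digitize(hb, hb_bins)
        Q[s, a] += alpha * (reward + gamma * Q[s2].max() - Q[s, a])
        s = s2
```

Because Q-learning is off-policy, as the abstract notes, the table can be updated from doses chosen by the existing clinical protocol while still learning values for the improved policy.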