ISBN (Print): 9783642132315
In this paper a discrete tracking control algorithm for a non-holonomic two-wheeled mobile robot (WMR) is presented. The basis of the control algorithm is an Adaptive Critic Design (ACD) in two model-based configurations: heuristic dynamic programming (HDP) and dual heuristic programming (DHP). In the proposed control algorithm, an actor-critic structure composed of two neural networks (NNs) is supplemented by a PD controller and a supervisory term derived from the Lyapunov stability theorem. The control algorithm works on-line and does not require preliminary learning. Verification of the proposed control algorithm was carried out on a Pioneer 2-DX WMR.
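As a concrete illustration of the HDP configuration described above, the sketch below runs an on-line actor-critic loop with a fixed PD term and an adaptive actor correction, learning from scratch with no preliminary training. The scalar plant, quadratic critic features, gains, and learning rates are illustrative assumptions; the paper's actual WMR model and Lyapunov-based supervisory term are not reproduced here.

```python
import numpy as np

gamma = 0.95               # discount factor
eta_c, eta_a = 0.1, 0.01   # critic / actor learning rates
wc = np.zeros(2)           # critic weights for J(x) ~ wc @ phi(x)
wa = 0.0                   # adaptive actor weight added to the PD law
kp, kd = 1.0, 0.5          # fixed PD gains

def phi(x):                # critic features: quadratic in error and error rate
    return np.array([x[0] ** 2, x[1] ** 2])

def plant(x, u):           # toy discrete plant standing in for the WMR model
    e, de = x
    return np.array([e + 0.1 * de, de + 0.1 * (u - e)])

x = np.array([1.0, 0.0])   # initial tracking error and error rate
for k in range(500):
    u = -(kp * x[0] + kd * x[1]) + wa * x[0]    # PD term plus adaptive NN term
    cost = x[0] ** 2 + 0.1 * u ** 2             # stage cost U(x, u)
    x_next = plant(x, u)
    # HDP critic: drive the temporal-difference error toward zero
    td = cost + gamma * wc @ phi(x_next) - wc @ phi(x)
    wc += eta_c * td * phi(x)
    # Actor: descend d(cost + gamma*J(x_next))/dwa via the chain rule
    dJdu = 0.2 * u + gamma * wc[1] * 0.2 * x_next[1]   # input channel gain 0.1
    wa -= eta_a * dJdu * x[0]                          # du/dwa = x[0]
    x = x_next
print("final tracking error:", x)
```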
ISBN (Print): 9781424483570
In this paper we propose a direct coupling of renewable generation with deferrable demand in order to mitigate the unpredictable and non-controllable fluctuation of renewable power supply. We cast our problem in the form of a stochastic dynamic program and we characterize the value function of the problem in order to develop efficient solution methods. We develop and compare two algorithms for optimally supplying renewable power to time-flexible electricity loads in the presence of a spot market: backward dynamic programming and approximate dynamic programming. We describe how our proposal compares to price-responsive demand in terms of capacity gains and energy market revenues for renewable generators, and we determine the optimal capacity of deferrable demand which can be reliably coupled to renewable generation.
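The backward dynamic programming variant can be sketched on a toy instance: a deferrable load of E energy units must be served by a deadline from random renewable output, with any surplus sold at a spot price and unmet demand penalized at the end. The horizon, supply distribution, prices, and penalty below are illustrative assumptions, not the paper's data.

```python
import numpy as np

T, E = 10, 5                               # horizon and deferrable energy target
supply_vals = np.array([0, 1, 2])          # possible renewable output per stage
supply_prob = np.array([0.3, 0.4, 0.3])
p_spot, penalty = 1.0, 10.0                # spot price and unserved-energy penalty

# V[t, e] = expected value-to-go with e units of demand still owed at stage t
V = np.zeros((T + 1, E + 1))
V[T, :] = -penalty * np.arange(E + 1)      # terminal penalty for unmet demand

for t in range(T - 1, -1, -1):
    for e in range(E + 1):
        exp_val = 0.0
        for s, ps in zip(supply_vals, supply_prob):
            # after observing supply s, route a units to the deferrable load
            # and sell the remaining s - a units on the spot market
            best = max(p_spot * (s - a) + V[t + 1, e - a]
                       for a in range(min(e, s) + 1))
            exp_val += ps * best
        V[t, e] = exp_val
print("expected value with full flexibility:", V[0, E])
```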
In this paper, an optimal control scheme for a class of nonlinear systems with time delays in both state and control variables, with respect to a quadratic performance index function, is proposed using a new iterative adaptive dynamic programming (ADP) algorithm. By introducing a delay matrix function, the explicit expression of the optimal control is obtained using dynamic programming theory, and the optimal control can be obtained iteratively using the adaptive critic technique. Convergence analysis is presented to prove that the performance index function can reach the optimum by the proposed method. Neural networks are used to approximate the performance index function, compute the optimal control policy, solve the delay matrix function, and model the nonlinear system, respectively, for facilitating the implementation of the iterative ADP algorithm. Two examples are given to demonstrate the validity of the proposed optimal control scheme.
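For orientation, the value-iteration recursion underlying such an iterative ADP scheme can be written generically as follows, where F denotes the delayed dynamics; all symbols are placeholders inferred from the abstract rather than the paper's exact notation:

```latex
% Generic iterative-ADP recursion suggested by the abstract, with F the
% delayed dynamics, Q, R the quadratic cost weights, and sigma, tau the
% state and control delays; all symbols are generic placeholders.
V_{i+1}(x_k) = \min_{u_k}\Big\{ x_k^{\top} Q\, x_k + u_k^{\top} R\, u_k
  + V_i\big(F(x_k,\, x_{k-\sigma},\, u_k,\, u_{k-\tau})\big) \Big\},
\qquad V_0 \equiv 0 .
```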
We propose two approximate dynamic programming (ADP)-based strategies for control of nonlinear processes using input-output data. In the first strategy, which we term 'J-learning,' one builds an empirical nonlinear model using closed-loop test data and performs dynamic programming with it to derive an improved control policy. In the second strategy, called 'Q-learning,' one tries to learn an improved control policy in a model-free manner. Compared to the conventional model predictive control approach, the new approach offers some practical advantages in using nonlinear empirical models for process control. Besides the potential reduction in the on-line computational burden, it offers a convenient way to control the degree of model extrapolation in the calculation of optimal control moves. One major difficulty associated with using an empirical model within the multi-step predictive control setting is that the model can be excessively extrapolated into regions of the state space where identification data were scarce or nonexistent, leading to performance far worse than predicted by the model. Within the proposed ADP-based strategies, this problem is handled by imposing a penalty term designed on the basis of the local data distribution. A CSTR example is provided to illustrate the proposed approaches. (c) 2005 Elsevier Ltd. All rights reserved.
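The extrapolation-penalty idea can be illustrated with a small Q-learning loop in which the greedy action choice is discounted by a penalty that grows with the predicted state's distance from the identification data. The scalar plant, linear features, and constants below are illustrative stand-ins; the paper's CSTR model and exact penalty design are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(-2.0, 2.0, size=200)   # states seen in closed-loop tests

def penalty(x, beta=5.0):
    # grows as the predicted state leaves the identification-data region
    return beta * float(np.min(np.abs(data - x)))

def model(x, a):                          # empirical one-step predictor
    return 0.9 * x + 0.5 * a

actions = [-1.0, 0.0, 1.0]
Q = np.zeros((len(actions), 2))           # linear Q(x, a) = Q[a] @ [x, 1]
feat = lambda x: np.array([x, 1.0])
alpha, gamma, eps = 0.05, 0.9, 0.1

x = 1.5
for k in range(2000):
    # greedy move, discouraged from extrapolating beyond the data
    scores = [Q[i] @ feat(x) - penalty(model(x, a))
              for i, a in enumerate(actions)]
    i = int(np.argmax(scores)) if rng.random() > eps else int(rng.integers(3))
    r = -(x ** 2)                         # reward: regulate to the origin
    x2 = model(x, actions[i]) + rng.normal(0.0, 0.05)  # plant ~ model + noise
    Q[i] += alpha * (r + gamma * max(Q[j] @ feat(x2) for j in range(3))
                     - Q[i] @ feat(x)) * feat(x)
    x = x2
print("Q weights:", Q.round(2))
```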
ISBN (Print): 9781424477456
In this paper we present an online gaming algorithm based on policy iteration to solve the continuous-time (CT) two-player zero-sum game with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the game's Hamilton-Jacobi-Isaacs (HJI) design equation. This method finds in real time suitable approximations of the optimal value and of the saddle-point control and disturbance policies, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor-critic structure which involves simultaneous continuous-time adaptation of critic, control actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' zero-sum game policy iteration. A persistence-of-excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
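A heavily simplified scalar analogue of such synchronous tuning laws is sketched below: the critic, actor, and disturbance "networks" collapse to single weights that adapt simultaneously while a probing signal maintains persistence of excitation. The dynamics, cost weights, gains, and update laws are illustrative simplifications, not the paper's actual tuning algorithms.

```python
import numpy as np

a, qc, r, gam2 = -1.0, 1.0, 1.0, 4.0   # plant pole, cost weights, gamma^2
pc, pa, pd = 1.0, 1.0, 0.1             # critic / actor / disturbance weights
ac, aa, ad = 5.0, 1.0, 1.0             # adaptation gains
dt, x = 1e-3, 1.0

for k in range(200_000):
    t = k * dt
    probe = 0.8 * (np.sin(t) + np.sin(3.3 * t))   # persistence of excitation
    u, w = -pa * x, pd * x                        # actor / disturbance policies
    xdot = a * x + u + w + probe
    # HJI (Hamiltonian) residual for the current weights, V(x) = pc * x^2
    e = qc * x**2 + r * u**2 - gam2 * w**2 + 2 * pc * x * xdot
    g = 2 * x * xdot                              # de/dpc
    pc -= ac * dt * e * g / (1 + g * g)           # normalised critic step
    pa += aa * dt * (pc / r - pa)                 # actor tracks the minimiser
    pd += ad * dt * (pc / gam2 - pd)              # disturbance tracks maximiser
    x += dt * xdot
# the analytic HJI solution for these constants is pc ~ 0.43
print("pc, pa, pd:", round(pc, 3), round(pa, 3), round(pd, 3))
```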
Load balancing is critical for the performance of large server clusters. Although many load balancers are available for improving performance in parallel applications, the load-balancing problem is not yet fully solved. Recent advances in security and architecture design advocate load balancing at the session level. However, due to the high dimensionality of session-level load balancing, little attention has been paid to this new problem. In this paper, we formulate the session-level load-balancing problem as a Markov decision problem. Then, we use approximate dynamic programming to obtain approximate load-balancing policies that scale with the problem instance. Extensive numerical experiments show that the policies have nearly optimal performance.
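A minimal version of such an ADP policy is one-step lookahead on an approximate value function over queue lengths; the quadratic approximation and the arrival/completion rates below are illustrative choices, not the paper's.

```python
import random

random.seed(0)
N_SERVERS = 4
queues = [0] * N_SERVERS

def v_approx(qs):
    # approximate cost-to-go: quadratic in queue lengths penalises imbalance
    return sum(q * q for q in qs)

def route(qs):
    # greedy one-step lookahead on the approximate value function
    best_i, best_v = 0, float("inf")
    for i in range(len(qs)):
        trial = qs.copy()
        trial[i] += 1                      # try assigning the session to i
        if v_approx(trial) < best_v:
            best_i, best_v = i, v_approx(trial)
    return best_i

for t in range(1000):
    if random.random() < 0.8:              # session arrival
        queues[route(queues)] += 1
    for i in range(N_SERVERS):             # random session completions
        if queues[i] and random.random() < 0.2:
            queues[i] -= 1
print("final queue lengths:", queues)
```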
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms. (C) 2009 Elsevier Ltd. All rights reserved.
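A compact tabular instance of a natural actor-critic of this kind is sketched below: a TD(0) critic on a fast timescale, an LMS fit of the advantage with compatible features (whose weights are exactly the natural-gradient estimate), and a slower actor step. The toy MDP and constant step sizes are illustrative, and the precise two-timescale schedules required by the convergence proofs are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 2, 2, 0.95
R = np.array([[1.0, 0.0],      # R[s, a]: reward for action a in state s
              [0.0, 2.0]])
P = np.array([[0, 1],          # P[s, a]: deterministic successor state
              [1, 0]])

theta = np.zeros((nS, nA))     # softmax policy parameters (actor, slow)
V = np.zeros(nS)               # state values (critic, fast, TD(0))
w = np.zeros((nS, nA))         # advantage weights; with compatible features
                               # these are the natural-gradient estimate
a_v, a_w, a_t = 0.1, 0.1, 0.01

def pi(s):
    e = np.exp(theta[s] - theta[s].max())
    return e / e.sum()

s = 0
for k in range(20_000):
    p = pi(s)
    a = rng.choice(nA, p=p)
    r, s2 = R[s, a], P[s, a]
    delta = r + gamma * V[s2] - V[s]          # TD error
    V[s] += a_v * delta                       # critic step
    psi = -p.copy()                           # compatible features:
    psi[a] += 1.0                             # grad log pi(a|s) = e_a - pi(s)
    w[s] += a_w * (delta - w[s] @ psi) * psi  # LMS fit of the advantage
    theta[s] += a_t * w[s]                    # natural-gradient actor step
    s = s2
print("learned policy:", [pi(i).round(3) for i in range(nS)])
```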
This paper first presents a convergence analysis of the particle swarm optimization (PSO) system by treating it as a discrete-time linear time-variant system. Then, based on the resulting convergence conditions, dynamic optimal control of a deterministic PSO system for parameter optimization is studied using dynamic programming, and an approximate dynamic programming algorithm, swarm-based approximate dynamic programming (swarm-ADP), is proposed. Finally, numerical simulations validate the presented dynamic optimization method.
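The linear-system view of PSO can be sketched directly: with a fixed attractor and the random coefficients replaced by their means, the velocity/offset pair evolves under a constant matrix whose spectral radius determines convergence. The constants below are common illustrative choices, not the paper's swarm-ADP algorithm.

```python
import numpy as np

w_in, c1, c2 = 0.7, 1.4, 1.4   # inertia weight and acceleration constants
phi = (c1 + c2) / 2            # mean attraction coefficient (E[r1] = E[r2] = 1/2)

# With a fixed attractor p, the expected update of z_k = [v_k, x_k - p] is
# linear: v' = w*v - phi*(x - p), and (x - p)' = w*v + (1 - phi)*(x - p).
A = np.array([[w_in,      -phi],
              [w_in, 1.0 - phi]])
rho = max(abs(np.linalg.eigvals(A)))
print("spectral radius:", round(rho, 3))   # < 1  =>  the swarm contracts

z = np.array([1.0, 1.0])                   # simulate the deterministic recursion
for _ in range(100):
    z = A @ z
print("state after 100 steps:", z)         # decays toward the attractor
```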
The purpose of this paper is to survey techniques for constructing effective policies for controlling complex networks, and to extend these techniques to capture special features of wireless communication networks under different networking scenarios. Among the key questions addressed are: the relationship between static network equilibria and dynamic network control; the effect of coding on control and delay through rate regions; and routing, scheduling, and admission control. The resulting approximations are the basis of a specific formulation of an h-MaxWeight policy for network routing. Simulations show a 50% improvement in average delay performance compared to methods used in current practice.
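A small scheduling example conveys the h-MaxWeight idea: ordinary MaxWeight serves the queue maximizing q_i * mu_i (the gradient of h(q) = |q|^2 / 2 weighted by service rates), while h-MaxWeight substitutes the gradient of a perturbed h. The two-queue model and the particular smooth perturbation below are illustrative assumptions in the spirit of this literature, not the paper's exact construction.

```python
import math, random

random.seed(0)
theta = 5.0                                # perturbation scale (illustrative)

def grad_h(q):
    # h(q) = sum_i qtilde_i^2 / 2 with qtilde_i = q_i + theta*(exp(-q_i/theta) - 1);
    # the gradient vanishes smoothly at empty queues
    return [(qi + theta * (math.exp(-qi / theta) - 1.0))
            * (1.0 - math.exp(-qi / theta)) for qi in q]

lam, mu = [0.3, 0.3], [0.8, 0.8]           # arrival and service probabilities
q = [0, 0]
for t in range(10_000):
    for i in range(2):                     # Bernoulli arrivals
        if random.random() < lam[i]:
            q[i] += 1
    g = grad_h(q)
    i = max(range(2), key=lambda j: g[j] * mu[j])   # h-MaxWeight: serve the
    if q[i] and random.random() < mu[i]:            # queue with the largest
        q[i] -= 1                                   # weighted gradient
print("final queue lengths:", q)
```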
This paper addresses the problem of finding a control policy that drives a generic discrete-event stochastic system from an initial state to a set of goal states with a specified probability. The control policy is iteratively constructed via an approximate dynamic programming (ADP) technique over a small subset of the state space that is evolved via Monte Carlo simulations. The effect of certain user-chosen parameters on the performance of the algorithm is investigated. The method is evaluated on several stochastic shortest path (SSP) examples and on a manufacturing job shop problem. We solve SSP problems that contain up to one million states to illustrate the scaling of the computational and memory benefits with respect to the problem size. In the case of the manufacturing job shop example, the proposed ADP approach outperforms a traditional rolling-horizon mathematical programming approach. (C) 2009 Elsevier Ltd. All rights reserved.
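The broad recipe, growing a small subset of the state space by Monte Carlo simulation and applying Bellman updates only on visited states, can be sketched on a small grid-world SSP; the grid, noise level, default value, and sampling counts below are illustrative assumptions.

```python
import random

random.seed(0)
GOAL, SIZE = (4, 4), 5
actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(s, a):
    # noisy move: intended direction with prob. 0.8, stay put otherwise
    if random.random() < 0.8:
        return (min(max(s[0] + a[0], 0), SIZE - 1),
                min(max(s[1] + a[1], 0), SIZE - 1))
    return s

V = {}                         # cost-to-go estimates on visited states only
def value(s):
    return 0.0 if s == GOAL else V.get(s, 10.0)   # default for unvisited states

for episode in range(500):
    s = (0, 0)
    for t in range(50):
        if s == GOAL:
            break
        # asynchronous Bellman update with sampled successor states
        V[s] = min(1.0 + sum(value(step(s, a)) for _ in range(5)) / 5.0
                   for a in actions)
        s = step(s, min(actions, key=lambda a: value(step(s, a))))
print("states explored:", len(V), "| estimated cost from start:",
      round(value((0, 0)), 2))
```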