The two-player zero-sum (ZS) game problem provides the solution to the bounded L2-gain problem and so is important for robust control. However, its solution depends on solving a design Hamilton-Jacobi-Isaacs (HJI) equation, which is generally intractable for nonlinear systems. In this paper, we present an online adaptive learning algorithm based on policy iteration to solve the continuous-time two-player ZS game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time an approximate local solution to the game HJI equation. This method finds, in real time, suitable approximations of the optimal value and the saddle point feedback control policy and disturbance policy, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic/disturbance structure that involves simultaneous continuous-time adaptation of critic, actor, and disturbance neural networks. We call this online gaming algorithm synchronous ZS game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for critic, actor, and disturbance networks. The convergence to the optimal saddle point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm in solving the HJI equation online for a linear system and a complex nonlinear system. Copyright (c) 2011 John Wiley & Sons, Ltd.
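The saddle-point structure the abstract describes can be seen in closed form in the scalar linear-quadratic special case, where the HJI equation collapses to a game algebraic Riccati equation. The sketch below uses our own toy coefficients (it is not the paper's online neural-network algorithm) to solve that equation and check the saddle-point gains:

```python
import numpy as np

# Scalar LQ special case of the ZS game (illustrative numbers, not from
# the paper): dx/dt = a*x + b*u + d*w, cost integrand
# q*x^2 + r*u^2 - gamma^2*w^2. For V(x) = P*x^2 the HJI equation reduces
# to the game algebraic Riccati equation:
#   2*a*P + q - (b^2/r)*P^2 + (d^2/gamma^2)*P^2 = 0
a, b, d, q, r, gamma = 1.0, 1.0, 1.0, 1.0, 1.0, 2.0

coef = d**2 / gamma**2 - b**2 / r            # quadratic coefficient
roots = np.roots([coef, 2 * a, q])
# keep the positive root that makes the closed loop stable
P = next(p.real for p in roots
         if p.real > 0 and a - (b**2 / r - d**2 / gamma**2) * p.real < 0)

k_u = (b / r) * P                            # saddle control:     u* = -k_u*x
k_w = (d / gamma**2) * P                     # worst disturbance:  w* = +k_w*x
a_cl = a - b * k_u + d * k_w                 # closed-loop drift
residual = 2 * a * P + q - (b**2 / r) * P**2 + (d**2 / gamma**2) * P**2
```

The residual being zero confirms the quadratic value function solves the (scalar) HJI equation, and the negative closed-loop drift confirms stability under the saddle-point policies.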
In this paper, a rate control scheme for downlink packet transmission in CDMA networks is proposed, based on both the queue lengths and the channel states of mobile users. We are interested in optimal rate allocation policies for throughput maximisation over time, and thus we formulate the problem as a discrete stochastic dynamic program. This dynamic program is exponentially complex in the number of users, which renders it impractical; we therefore use an approximate dynamic programming (DP) algorithm to obtain suboptimal rate allocation policies in real time. The numerical results reveal that the proposed algorithm significantly outperforms a number of different baseline greedy heuristics. (c) 2012 Elsevier B.V. All rights reserved.
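The queue-and-channel-dependent rate control described above can be made concrete on a deliberately tiny single-user model (every parameter below is an illustrative invention, not from the paper), where the dynamic program is small enough to solve exactly by tabular value iteration:

```python
import numpy as np

# Toy single-user model: state = (queue length q, channel h),
# action = transmit rate r, reward = packets served minus a power cost.
Q_MAX, RATES, GAMMA = 5, (0, 1, 2), 0.9
P_ARRIVAL, POWER_COST = 0.5, 0.3
CAP = (1, 2)                                 # channel 0 = bad, 1 = good
P_CH = np.array([[0.7, 0.3],
                 [0.4, 0.6]])                # channel transition matrix

V = np.zeros((Q_MAX + 1, 2))
for _ in range(1000):                        # value iteration to convergence
    V_new = np.empty_like(V)
    for q_len in range(Q_MAX + 1):
        for h in range(2):
            vals = []
            for r in RATES:
                served = min(q_len, r, CAP[h])
                ev = sum(p * (P_CH[h] @ V[min(Q_MAX, q_len - served + a)])
                         for a, p in ((0, 1 - P_ARRIVAL), (1, P_ARRIVAL)))
                vals.append(served - POWER_COST * r + GAMMA * ev)
            V_new[q_len, h] = max(vals)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

def best_rate(q_len, h):
    """Greedy rate with respect to the converged value function."""
    def score(r):
        served = min(q_len, r, CAP[h])
        ev = sum(p * (P_CH[h] @ V[min(Q_MAX, q_len - served + a)])
                 for a, p in ((0, 1 - P_ARRIVAL), (1, P_ARRIVAL)))
        return served - POWER_COST * r + GAMMA * ev
    return max(RATES, key=score)
```

With many users the joint state space is the product of such per-user spaces, which is the exponential blow-up that motivates the paper's approximate DP approach.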
An intelligent-optimal control scheme for unknown nonaffine nonlinear discrete-time systems with a discount factor in the cost function is developed in this paper. The iterative adaptive dynamic programming algorithm is introduced to solve the optimal control problem with convergence analysis. Then, the implementation of the iterative algorithm via the globalized dual heuristic programming technique is presented using three neural networks, which approximate at each iteration the cost function, the control law, and the unknown nonlinear system, respectively. In addition, two simulation examples are provided to verify the effectiveness of the developed optimal control approach. (C) 2012 Elsevier Ltd. All rights reserved.
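Globalized dual heuristic programming trains the critic against the cost function and its derivatives jointly. A minimal offline illustration of that dual-target fit, with a polynomial basis standing in for the paper's neural networks and a made-up target cost:

```python
import numpy as np

# GDHP-flavored critic fit (toy, offline): match value targets and their
# derivatives with one linear-in-parameters model. The target
# J(x) = x^2 + 0.5*x^4 plays the role of the cost function; the basis
# [x^2, x^4] stands in for a critic neural network.
xs = np.linspace(-2.0, 2.0, 21)
J = xs**2 + 0.5 * xs**4                      # value targets
dJ = 2 * xs + 2.0 * xs**3                    # derivative targets

phi = np.stack([xs**2, xs**4], axis=1)       # critic basis
dphi = np.stack([2 * xs, 4 * xs**3], axis=1) # its analytic derivative

# Stack both regression problems (GDHP blends the two losses; here they
# are weighted equally) and solve by least squares.
A = np.vstack([phi, dphi])
y = np.concatenate([J, dJ])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because the target lies exactly in the span of the basis, the recovered weights are [1.0, 0.5]; with a real system the derivative targets would come from the model network rather than a known formula.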
In this paper, a finite horizon iterative adaptive dynamic programming (ADP) algorithm is proposed to solve the optimal control problem for a class of discrete-time nonlinear systems with unfixed initial state. A new ε-optimal control algorithm based on the iterative ADP approach is proposed that makes the performance index function iteratively converge to the greatest lower bound of all performance indices within an error ε in finite time. The convergence analysis of the proposed ADP algorithm in terms of performance index function and control policy is conducted. The optimal number of control steps can also be obtained by the proposed ε-optimal control algorithm for the unfixed initial state. Neural networks are used to approximate the performance index function and to compute the optimal control policy, respectively, facilitating the implementation of the ε-optimal control algorithm. Finally, a simulation example is given to show the effectiveness of the proposed method. (C) 2012 Elsevier Ltd. All rights reserved.
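The ε-stopping idea, iterating the value update until successive performance indices are close enough that the current index is within ε of the greatest lower bound, can be sketched on a small finite MDP (our toy; the paper treats continuous-state nonlinear systems with neural approximators):

```python
import numpy as np

# Epsilon-stopped value iteration on a random finite MDP (synthetic data).
# Contraction bound used for stopping: if the successive difference is at
# most EPS*(1-GAMMA)/GAMMA, then ||V - V*||_inf <= EPS.
GAMMA, EPS = 0.9, 1e-3
rng = np.random.default_rng(0)
n_s, n_a = 6, 3
cost = rng.uniform(0.0, 1.0, (n_s, n_a))
nxt = rng.integers(0, n_s, (n_s, n_a))       # deterministic transitions

V = np.zeros(n_s)
steps = 0
while True:
    V_new = np.min(cost + GAMMA * V[nxt], axis=1)
    steps += 1
    if np.max(np.abs(V_new - V)) <= EPS * (1 - GAMMA) / GAMMA:
        V = V_new
        break
    V = V_new
```

The stopping rule guarantees termination in finite time (the differences shrink geometrically at rate GAMMA), mirroring the finite-time ε-optimality the abstract claims for its setting.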
In this paper, a neuro-optimal control scheme for a class of unknown discrete-time nonlinear systems with a discount factor in the cost function is developed. The iterative adaptive dynamic programming algorithm using the globalized dual heuristic programming technique is introduced to obtain the optimal controller, with convergence analysis in terms of cost function and control law. In order to carry out the iterative algorithm, a neural network is first constructed to identify the unknown controlled system. Then, based on the learned system model, two other neural networks are employed as parametric structures to facilitate the implementation of the iterative algorithm, approximating at each iteration the cost function and its derivatives and the control law, respectively. Finally, a simulation example is provided to verify the effectiveness of the proposed optimal control approach.
The considerable cost of maintaining large fleets has generated interest in cost minimization strategies. With many related decisions, numerous constraints, and significant sources of uncertainty (e.g. vehicle breakdowns), fleet managers face complex dynamic optimization problems. Existing methodologies frequently make simplifying assumptions or fail to converge quickly for large problems. This paper presents an approximate dynamic programming approach for making vehicle purchase, resale, and retrofit decisions in a fleet setting with stochastic vehicle breakdowns. Value iteration is informed by dual variables from linear programs, as well as other bounds on vehicle shadow prices. Sample problems are based on a government fleet seeking to comply with emissions regulation. The model predicts the expected cost of compliance, the rules the fleet manager will use in deciding how to comply, and the regulation's impact on the value of vehicles in the fleet. Stricter regulation lowers the value of some vehicle categories while raising the value of others. Such insights can help guide regulators, as well as the fleet managers they oversee. The methodologies developed could be applied more broadly to general multi-asset replacement problems, many of which have similar structures. (C) 2012 Elsevier Ltd. All rights reserved.
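The abstract's use of LP dual variables as shadow prices on vehicles can be illustrated with SciPy's HiGHS interface on a made-up two-vehicle-type capacity LP (the model and numbers are ours, not the paper's formulation; `ineqlin.marginals` is documented as the sensitivity of the optimal objective to the inequality right-hand sides):

```python
import numpy as np
from scipy.optimize import linprog

# Toy fleet LP: assign service to two vehicle types to meet demand at
# minimum operating cost, then read the demand constraint's shadow price.
op_cost = np.array([3.0, 5.0])               # cost per unit of service
capacity = np.array([10.0, 10.0])            # capacity per vehicle type
demand = 12.0

# minimize op_cost @ x  s.t.  -(x0 + x1) <= -demand,  0 <= x <= capacity
res = linprog(op_cost,
              A_ub=np.array([[-1.0, -1.0]]), b_ub=np.array([-demand]),
              bounds=[(0.0, cap) for cap in capacity], method="highs")

# dObjective/d(b_ub): how much relaxing demand by one unit saves.
shadow_price = res.ineqlin.marginals[0]
```

Here the cheap type is used to capacity and the marginal unit of demand falls on the expensive type, so the shadow price reflects the expensive type's cost; such marginal values are the kind of bound on vehicle worth that can inform value iteration.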
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD(λ), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain different from the one corresponding to the policy under evaluation. We establish for the discounted cost criterion that LSTD(λ) converges almost surely under mild, minimal conditions. We also analyze other properties of the iterates involved in the algorithm, including convergence in mean and boundedness. Our analysis draws on theories of both finite space Markov chains and weak Feller Markov chains on a topological space. Our results can be applied to other temporal difference algorithms and MDP models. As examples, we give a convergence analysis of a TD(λ) algorithm and extensions to MDP with compact state and action spaces, as well as a convergence proof of a new LSTD algorithm with state-dependent λ-parameters.
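For readers who want the mechanics, here is a minimal tabular LSTD(λ) estimator on a three-state chain of our own making (on-policy and discounted, so it does not exercise the paper's exploration-enhanced setting; with full-rank tabular features the fixed point is the true value function):

```python
import numpy as np

# Minimal LSTD(lambda) on a toy 3-state Markov chain (synthetic example).
rng = np.random.default_rng(1)
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
c = np.array([1.0, 0.0, 2.0])                # per-state cost
gamma, lam = 0.9, 0.7
phi = np.eye(3)                              # tabular features

A = np.zeros((3, 3))
b = np.zeros(3)
z = np.zeros(3)                              # eligibility trace
s = 0
for _ in range(150_000):
    s_next = rng.choice(3, p=P[s])
    z = gamma * lam * z + phi[s]             # trace update
    A += np.outer(z, phi[s] - gamma * phi[s_next])
    b += z * c[s]
    s = s_next

theta = np.linalg.solve(A, b)                # LSTD estimate of V
V_true = np.linalg.solve(np.eye(3) - gamma * P, c)
```

With a long enough trajectory, `theta` approaches `V_true` for any λ in [0, 1]; the choice of λ trades bias against variance when the features are not full rank.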
Seeking to reduce the potential impact of delays on radiation therapy cancer patients such as psychological distress, deterioration in quality of life and decreased cancer control and survival, and motivated by inefficiencies in the use of expensive resources, we undertook a study of scheduling practices at the British Columbia Cancer Agency (BCCA). As a result, we formulated and solved a discounted infinite-horizon Markov decision process for scheduling cancer treatments in radiation therapy units. The main purpose of this model is to identify good policies for allocating available treatment capacity to incoming demand, while reducing wait times in a cost-effective manner. We use an affine architecture to approximate the value function in our formulation and solve an equivalent linear programming model through column generation to obtain an approximate optimal policy for this problem. The benefits from the proposed method are evaluated by simulating its performance for a practical example based on data provided by the BCCA. (C) 2012 Elsevier B.V. All rights reserved.
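The approximate-LP route the abstract takes (a parametric value function with the Bellman equation replaced by an LP) can be sketched without column generation on a random small MDP. Any feasible point of the constraints V ≤ TV is a pointwise lower bound on the optimal value, which is what the check below relies on. All model data are synthetic:

```python
import numpy as np
from scipy.optimize import linprog

# Approximate linear programming with an affine value architecture
# V(s) = w0 + w1*s on a random small cost-minimization MDP.
rng = np.random.default_rng(2)
nS, nA, gamma = 5, 2, 0.9
c = rng.uniform(0.0, 1.0, (nS, nA))
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = dist over s'

phi = np.stack([np.ones(nS), np.arange(nS, dtype=float)], axis=1)

# ALP: maximize nu' Phi w  s.t.  (Phi w)(s) <= c(s,a) + gamma*E[(Phi w)(s')]
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        A_ub.append(phi[s] - gamma * (P[s, a] @ phi))
        b_ub.append(c[s, a])
nu = np.ones(nS) / nS                            # state-relevance weights
res = linprog(-(nu @ phi), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2, method="highs")
V_alp = phi @ res.x
```

In the paper's scheduling problem the constraint set is far too large to enumerate as above, which is where column generation comes in: constraints (columns of the dual) are added only as they become violated.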
We provide a practical methodology for solving the generalized joint replenishment (GJR) problem, based on a mathematical programming approach to approximate dynamic programming. We show how to automatically generate a value function approximation basis built upon piecewise-linear ridge functions by developing and exploiting a theoretical connection with the problem of finding optimal cyclic schedules. We provide a variant of the algorithm that is effective in practice, and we exploit the special structure of the GJR problem to provide a coherent, implementable framework.
Convergence is proven of the value-iteration-based algorithm to find the optimal controller in the case of general nonlinear systems that are non-affine in the input. That is, it is shown that the algorithm converges to the optimal control and the optimal value function. It is assumed that at each iteration the value and action update equations can be exactly solved. Then two standard neural networks (NN) are used: a critic NN is used to approximate the value function, while an action network is used to approximate the optimal control policy. (C) 2012 Jordan Journal of Mechanical and Industrial Engineering. All rights reserved.
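A minimal offline analogue of that scheme, with a least-squares polynomial critic in place of the critic NN, an explicit minimization over a control grid in place of the action network, and a toy non-affine-in-input scalar system of our own invention:

```python
import numpy as np

# Fitted value iteration with a parametric critic (toy sketch). Note:
# fitted VI with a projection step is not guaranteed to converge in
# general, so the loop is capped; this particular setup behaves well.
GAMMA = 0.9
ACTIONS = np.linspace(-1.0, 1.0, 5)

def f(x, u):                                 # toy dynamics, non-affine in u
    return 0.8 * np.sin(x) + 0.5 * np.tanh(u)

def cost(x, u):
    return x**2 + 0.1 * u**2

xs = np.linspace(-2.0, 2.0, 41)              # training grid

def feats(x):                                # critic basis: 1, x, x^2, x^4
    return np.stack([np.ones_like(x), x, x**2, x**4], axis=-1)

w = np.zeros(4)
for _ in range(200):
    # value update: V(x) <- min_u [cost(x,u) + GAMMA * V(f(x,u))]
    targets = np.min([cost(xs, u) + GAMMA * feats(f(xs, u)) @ w
                      for u in ACTIONS], axis=0)
    w_new, *_ = np.linalg.lstsq(feats(xs), targets, rcond=None)
    if np.max(np.abs(w_new - w)) < 1e-10:
        w = w_new
        break
    w = w_new
```

The exact-solvability assumption in the abstract corresponds to the inner `min` and the regression both being solved to optimality at every iteration.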