ISBN (print): 9783642210891
Using the neural-network-based iterative adaptive dynamic programming (ADP) algorithm, an optimal control scheme for a class of unknown discrete-time nonlinear systems with a discount factor in the cost function is proposed in this paper. The optimal controller is designed with convergence analysis in terms of the cost function and control law. In order to implement the algorithm via the globalized dual heuristic programming (GDHP) technique, a neural network is constructed first to identify the unknown nonlinear system, and then two other neural networks are used to approximate the cost function and the control law, respectively. An example is provided to verify the effectiveness of the present approach.
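A minimal sketch of the discounted iterative value update that this kind of scheme rests on, V_{i+1}(x) = min_u [U(x, u) + γ V_i(F(x, u))]. The scalar dynamics F, the utility U, and the polynomial least-squares critic standing in for the paper's three neural networks are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of iterative ADP with a discount factor in the cost.
# A polynomial least-squares fit plays the role of the critic network.
import numpy as np

gamma = 0.95                          # discount factor in the cost function
F = lambda x, u: 0.8 * np.sin(x) + u  # assumed (illustrative) system dynamics
U = lambda x, u: x**2 + u**2          # stage cost / utility

xs = np.linspace(-2.0, 2.0, 101)      # sampled states
us = np.linspace(-1.0, 1.0, 41)       # candidate controls
phi = lambda x: np.stack([np.ones_like(x), x, x**2, np.abs(x)**3], axis=-1)

w = np.zeros(4)                       # critic weights, V_0 = 0
for i in range(100):
    # Bellman backup V_{i+1}(x) = min_u [U(x,u) + gamma * V_i(F(x,u))]
    q = U(xs[:, None], us[None, :]) + gamma * (phi(F(xs[:, None], us[None, :])) @ w)
    targets = q.min(axis=1)
    w_new, *_ = np.linalg.lstsq(phi(xs), targets, rcond=None)
    if np.max(np.abs(w_new - w)) < 1e-8:
        break
    w = w_new

# Greedy control law induced by the converged critic
u_star = us[np.argmin(U(xs[:, None], us[None, :])
                      + gamma * (phi(F(xs[:, None], us[None, :])) @ w), axis=1)]
```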
ISBN (print): 9788993215038
In order to improve the performance of nonlinear model predictive control (NMPC) in the presence of disturbances or model uncertainties, an approximate dynamic programming (ADP) control scheme is proposed. Namely, Bellman's optimality principle is employed to determine the input based on an approximate value function constructed from historical operation data. In addition, support vector data description is applied in the state space to determine whether the ADP control is suitable for the current state. The proposed control strategy is illustrated on a CSTR example to show its effectiveness.
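A hedged sketch of the applicability check described here: fit a one-class boundary around the historical operating states and use the data-driven ADP input only when the current state lies inside it, otherwise fall back to the nominal NMPC input. `OneClassSVM` is used as a stand-in for SVDD, and the value function, model, stage cost, and `nmpc_input` are hypothetical placeholders:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, 2))          # historical states (placeholder data)
svdd = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(X_hist)

def adp_input(x, V, candidates, stage_cost, model):
    """One-step Bellman minimisation over candidate inputs."""
    costs = [stage_cost(x, u) + V(model(x, u)) for u in candidates]
    return candidates[int(np.argmin(costs))]

def control(x, V, candidates, stage_cost, model, nmpc_input):
    # Use the ADP input only where historical data supports the value approximation.
    if svdd.predict(x.reshape(1, -1))[0] == 1:
        return adp_input(x, V, candidates, stage_cost, model)
    return nmpc_input(x)
```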
Multi-stage decision problems under uncertainty are abundant in process industries. The Markov decision process (MDP) is a general mathematical formulation of such problems. Whereas stochastic programming and dynamic programming are the standard methods to solve MDPs, their unwieldy computational requirements limit their usefulness in real applications. Approximate dynamic programming (ADP) combines simulation and function approximation to alleviate the 'curse of dimensionality' associated with the traditional dynamic programming approach. In this paper, we present ADP as a viable way to solve MDPs for process control and scheduling problems. We bring forth some key issues for its successful application in these types of problems, including the choice of function approximator and the use of a penalty function to guard against over-extending the value function approximation in the value iteration. Application studies involving a number of well-known control and scheduling problems, including dual control, multiple controller scheduling, and resource-constrained project scheduling problems, point to the promising potential of ADP. (c) 2006 Elsevier Ltd. All rights reserved.
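A hedged sketch of the penalty idea mentioned in this abstract: the Bellman backup is augmented with a term that grows with the distance of the successor state from the simulated data, discouraging the minimisation from leaning on the value approximation where it was never trained. The helper names, toy distance penalty, and signatures are illustrative assumptions:

```python
import numpy as np

def penalty(x_next, data_states, rho=10.0):
    """Large when x_next is far from every visited state (data_states is 2-D)."""
    d = np.min(np.linalg.norm(data_states - x_next, axis=1))
    return rho * d

def bellman_backup(x, actions, V, step, stage_cost, data_states, gamma=0.98):
    """One penalised backup: min_u [cost + gamma*V(next) + penalty(next)]."""
    values = []
    for u in actions:
        x_next = step(x, u)
        values.append(stage_cost(x, u) + gamma * V(x_next)
                      + penalty(x_next, data_states))
    return min(values), actions[int(np.argmin(values))]
```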
We propose two approximate dynamic programming methods to optimize the distribution operations of a company manufacturing a certain product at multiple production plants and shipping it to different customer locations for sale. We begin by formulating the problem as a dynamic program. Our first approximate dynamic programming method uses a linear approximation of the value function and computes the parameters of this approximation by using the linear programming representation of the dynamic program. Our second method relaxes the constraints that link the decisions for different production plants. Consequently, the dynamic program decomposes by the production plants. Computational experiments show that the proposed methods are computationally attractive, and in particular, the second method performs significantly better than standard benchmarks. (C) 2006 Wiley Periodicals, Inc.
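A hedged sketch of the first method's idea: a linear value approximation V ≈ Φr whose weights r are obtained from the linear-programming representation of the dynamic program, max c'Φr subject to Φr ≤ g(·,a) + γ P_a Φ r for every action a. The tiny random MDP, the basis, and the state-relevance weights c are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
nS, nA, nK, gamma = 20, 3, 4, 0.95
P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a, s, s'] (random toy MDP)
g = rng.uniform(0.0, 1.0, size=(nA, nS))        # stage costs
Phi = rng.normal(size=(nS, nK))                 # basis functions
c = np.ones(nS) / nS                            # state-relevance weights

# linprog minimises, so use -c'Phi r; constraints (Phi - gamma*P_a Phi) r <= g_a
A_ub = np.vstack([Phi - gamma * P[a] @ Phi for a in range(nA)])
b_ub = np.concatenate([g[a] for a in range(nA)])
res = linprog(-(c @ Phi), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * nK)
r = res.x                                       # approximation weights
V_approx = Phi @ r                              # linear value approximation
```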
In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. DPP is an incremental algorithm that forces a gradual change in the policy update. This allows us to prove finite-iteration and asymptotic ℓ∞-norm performance-loss bounds in the presence of approximation/estimation error which depend on the average accumulated error, as opposed to the standard bounds, which are expressed in terms of the supremum of the errors. The dependency on the average error is important in problems with a limited number of samples per iteration, for which the average of the errors can be significantly smaller than their supremum. Based on these theoretical results, we prove that a sampling-based variant of DPP (DPP-RL) asymptotically converges to the optimal policy. Finally, we numerically illustrate the applicability of these results on some benchmark problems and compare the performance of the approximate variants of DPP with some existing reinforcement learning (RL) methods.
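A hedged sketch of an incremental action-preference update in the spirit of DPP: preferences Ψ are averaged with a Boltzmann (softmax) operator and changed gradually each iteration, so the induced policy π(a|s) ∝ exp(η Ψ(s,a)) moves slowly. The random MDP, the temperature η, and the exact form of the update are illustrative, not the paper's definitive algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma, eta = 10, 3, 0.95, 5.0
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] (toy MDP)
R = rng.uniform(size=(nS, nA))                  # rewards

def boltzmann_avg(psi, eta):
    """Softmax-weighted average of preferences over actions, per state."""
    w = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
    w /= w.sum(axis=1, keepdims=True)
    return (w * psi).sum(axis=1)                # shape (nS,)

psi = np.zeros((nS, nA))
for _ in range(500):
    m = boltzmann_avg(psi, eta)                 # soft state values
    psi = psi - m[:, None] + R + gamma * (P @ m)  # incremental preference update

# Policy induced by the final preferences
policy = np.exp(eta * (psi - psi.max(axis=1, keepdims=True)))
policy /= policy.sum(axis=1, keepdims=True)
```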
There are a number of sources of randomness that arise in military airlift operations. However, the cost of uncertainty can be difficult to estimate, and is easy to overestimate if we use simplistic decision rules. Using data from Canadian military airlift operations, we study the effect of uncertainty in customer demands, as well as aircraft failures, on the overall cost. The system is first analyzed using the types of myopic decision rules widely used in the research literature. The performance of the myopic policy is then compared to the results obtained using robust decisions that account for the uncertainty of future events. These are obtained by modeling the problem as a dynamic program and solving Bellman's equations using approximate dynamic programming. The experiments show that even approximate solutions to Bellman's equations produce decisions that reduce the cost of uncertainty.
Dynamic programming is an effective optimal control method for multi-stage decision-making problems. However, it cannot be used to solve complex problems because of the curse of dimensionality. After analyzing this limitation of dynamic programming, this article elaborates in detail the theory and method by which approximate dynamic programming overcomes it. A second-order training algorithm is also given to improve the convergence performance of the iteration and the stability performance of the network. This method was then applied to idle-speed fluctuation control of a four-cylinder diesel engine to verify its correctness and effectiveness. Although illustrated for an engine, the control framework should also be applicable to general-purpose nonlinear systems, and it does not need a model of the controlled object.
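A hedged sketch of what a "second-order training" step for a linear-in-features critic V(x) = φ(x)·w could look like; the abstract does not specify the algorithm, so a damped Gauss-Newton (Levenberg-Marquardt style) update on the squared critic error is used purely as an illustration, with placeholder features and targets:

```python
import numpy as np

def lm_step(w, phi, targets, mu=1e-2):
    """One damped Gauss-Newton step on 0.5 * ||phi @ w - targets||^2."""
    r = phi @ w - targets                  # residuals at the current weights
    J = phi                                # Jacobian of residuals w.r.t. w
    H = J.T @ J + mu * np.eye(w.size)      # damped approximate Hessian
    return w - np.linalg.solve(H, J.T @ r)
```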
We introduce a new algorithm based on linear programming for optimization of average-cost Markov decision processes (MDPs). The algorithm approximates the differential cost function of a perturbed MDP via a linear combination of basis functions. We establish a bound on the performance of the resulting policy that scales gracefully with the number of states without imposing the strong Lyapunov condition required by its counterpart in de Farias and Van Roy [de Farias, D. P., B. Van Roy. 2003. The linear programming approach to approximate dynamic programming. Oper. Res. 51(6) 850-865]. We investigate implications of this result in the context of a queueing control problem.
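A hedged sketch of the general idea: perturb each transition kernel with a restart distribution, then approximate the differential cost h ≈ Φr by a linear program built from the average-cost Bellman inequality with h replaced by Φr. The restart probability η, the restart distribution ν, the basis, and the toy MDP are illustrative assumptions, and the LP below is a generic approximate formulation rather than the paper's exact program:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
nS, nA, nK, eta = 15, 2, 4, 0.05
P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a, s, s'] (toy MDP)
g = rng.uniform(size=(nA, nS))                  # stage costs
nu = np.ones(nS) / nS                           # restart distribution
Phi = np.column_stack([np.ones(nS), rng.normal(size=(nS, nK - 1))])

# Perturbed kernels: with probability eta the state restarts according to nu.
P_pert = (1.0 - eta) * P + eta * nu[None, None, :]

# Decision variables z = (r, lam).  Maximise lam subject to
#   (Phi r)(s) + lam <= g(s, a) + (P_pert[a] Phi r)(s)   for all s, a.
A_ub = np.vstack([np.hstack([Phi - P_pert[a] @ Phi, np.ones((nS, 1))])
                  for a in range(nA)])
b_ub = np.concatenate(list(g))
c_obj = np.zeros(nK + 1); c_obj[-1] = -1.0      # linprog minimises, so -lam
res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (nK + 1))
r, lam = res.x[:-1], res.x[-1]
h_approx = Phi @ r                              # approximate differential cost
```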
This paper deals with the finite-horizon optimal tracking control for a class of discrete-time nonlinear systems using the iterative adaptive dynamic programming (ADP) algorithm. First, the optimal tracking problem is converted into designing a finite-horizon optimal regulator for the tracking error system. Then, with convergence analysis in terms of cost function and control law, the iterative ADP algorithm via the heuristic dynamic programming (HDP) technique is introduced to obtain the finite-horizon optimal tracking controller, which makes the cost function close to its optimal value within an ε-error bound. Next, three neural networks are used to implement the algorithm, which aims at approximating the cost function, the control law, and the error dynamics, respectively. At last, an example is included to demonstrate the effectiveness of the proposed approach.
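A hedged sketch of the conversion step: a tracking problem x_{k+1} = f(x_k) + u_k with reference r_k is rewritten as regulation of the error e_k = x_k - r_k, and a finite-horizon cost is minimised by a backward recursion. The plant, reference, horizon, and grid interpolation (standing in for the paper's three neural networks) are illustrative assumptions:

```python
import numpy as np

N = 20                                          # horizon length
f = lambda x: 0.9 * x + 0.1 * np.sin(x)         # assumed plant (illustrative)
r = np.cos(0.2 * np.arange(N + 1))              # reference trajectory
ue = lambda k: r[k + 1] - f(r[k])               # feedforward input keeping x on r;
                                                # the applied input is u_k = ue(k) + v_k

es = np.linspace(-2, 2, 201)                    # sampled tracking errors
vs = np.linspace(-1, 1, 81)                     # sampled error-feedback inputs
V = np.zeros((N + 1, es.size))                  # terminal cost V_N = 0

for k in range(N - 1, -1, -1):
    # error dynamics: e_{k+1} = f(e_k + r_k) - f(r_k) + v_k
    e_next = f(es[:, None] + r[k]) - f(r[k]) + vs[None, :]
    cost = es[:, None] ** 2 + vs[None, :] ** 2 \
         + np.interp(e_next, es, V[k + 1])      # interpolate V_{k+1} on the grid
    V[k] = cost.min(axis=1)                     # cost-to-go for stage k
```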
This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or, alternatively, closed-loop identification data), while circumventing the 'curse of dimensionality' of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. 'cost-to-go' data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of the function approximator is critical for convergence of the iteration and the quality of the final learned control policy, because an approximation error can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g., artificial neural networks) and nonparametric local averagers (e.g., k-nearest neighbor). In this paper, we assert, on the basis of some case studies and a theoretical result, that a certain type of local averager should be preferred over global approximators, as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable control policy on-line due to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function in each minimization to discourage the optimizer from finding a solution in the regions of the state space where the local data density is too low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose. (c) 2005 Elsevier Ltd. All rights reserved.
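A hedged sketch of the two ingredients discussed in this abstract: a k-nearest-neighbor local averager for the cost-to-go, and a kernel density estimate over the same data used to penalise candidate successor states in sparsely visited regions. The class name, bandwidths, and penalty weight are illustrative assumptions:

```python
import numpy as np

class KNNCostToGo:
    def __init__(self, states, costs, k=5, bandwidth=0.3, rho=50.0):
        self.X, self.J = np.asarray(states), np.asarray(costs)
        self.k, self.h, self.rho = k, bandwidth, rho

    def value(self, x):
        """Local average of the stored cost-to-go over the k nearest states."""
        d = np.linalg.norm(self.X - x, axis=1)
        idx = np.argsort(d)[: self.k]
        return self.J[idx].mean()

    def density(self, x):
        """Unnormalised Gaussian kernel density estimate at x."""
        d2 = np.sum((self.X - x) ** 2, axis=1)
        return np.mean(np.exp(-0.5 * d2 / self.h ** 2))

    def penalized_value(self, x):
        # Penalty grows as the local data density shrinks.
        return self.value(x) + self.rho / (self.density(x) + 1e-8)
```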