In this paper, an approximate dynamic programming (ADP) based strategy is applied to the dual adaptive control problem. The ADP strategy provides a computationally amenable way to build a significantly improved policy by solving the dynamic programming recursion on only those points of the hyper-state space sampled during closed-loop Monte Carlo simulations performed under known suboptimal control policies. The potential of the ADP approach for generating a significantly improved policy is illustrated on an ARX process with unknown/varying parameters. (C) 2009 Elsevier Ltd. All rights reserved.
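A minimal sketch of the sampled-state idea, assuming a toy scalar stochastic process, a hypothetical suboptimal feedback law, and a quadratic stage cost in place of the paper's ARX example: states visited in closed-loop Monte Carlo simulation are collected, and value iteration is then run only on those points, with successor states mapped to their nearest sampled neighbor.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, u):
    # hypothetical scalar stochastic process (stand-in for the ARX example)
    return 0.8 * x + u + 0.1 * rng.standard_normal()

def suboptimal_policy(x):
    return -0.5 * x          # a known, suboptimal feedback law

def stage_cost(x, u):
    return x ** 2 + 0.1 * u ** 2

# 1) Sample the (hyper-)state space with closed-loop Monte Carlo simulations
#    performed under the known suboptimal policy.
states = []
for _ in range(20):                      # 20 trajectories
    x = rng.uniform(-2.0, 2.0)
    for _ in range(20):                  # 20 steps each
        states.append(x)
        x = step(x, suboptimal_policy(x))
states = np.array(states)

# 2) Run value iteration only on the sampled points; each successor state is
#    mapped to its nearest sampled neighbor (a single noise draw stands in
#    for the expectation, which is enough for a sketch).
actions = np.linspace(-1.0, 1.0, 11)
gamma, V = 0.95, np.zeros(len(states))
for _ in range(30):
    V = np.array([min(stage_cost(x, u)
                      + gamma * V[np.argmin(np.abs(states - step(x, u)))]
                      for u in actions)
                  for x in states])
```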
This paper examines approximate dynamic programming algorithms for the single-vehicle routing problem with stochastic demands from a dynamic or reoptimization perspective. The methods extend the rollout algorithm by implementing different base sequences (i.e., a priori solutions), look-ahead policies, and pruning schemes. The paper also considers computing the cost-to-go with Monte Carlo simulation in addition to direct approaches. The best new method found is a two-step look-ahead rollout started with a stochastic base sequence; its routing cost is about 4.8% less than that of the one-step rollout algorithm started with a deterministic sequence. Results also show that Monte Carlo cost-to-go estimation reduces computation time by 65% in large instances with little or no loss in solution quality. Moreover, the paper compares results to the perfect-information case obtained by solving exact a posteriori solutions for sampled vehicle routing problems; the confidence interval for the overall mean difference is (3.56%, 4.11%). (C) 2008 Elsevier B.V. All rights reserved.
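A minimal sketch of a one-step rollout with Monte Carlo cost-to-go estimation on a toy single-vehicle problem with stochastic demands. The distances, the demand distribution, and the sorted-order base sequence are illustrative assumptions, not the instances or base sequences studied in the paper.

```python
import random
random.seed(0)

N, CAP = 5, 10                              # customers 1..N, vehicle capacity
dist = [[abs(i - j) for j in range(N + 1)]  # toy metric; depot is node 0
        for i in range(N + 1)]

def simulate(route, pos, load):
    """Follow a fixed route; detour to the depot on a stockout."""
    cost = 0.0
    for c in route:
        demand = random.randint(1, 4)       # demand revealed only on arrival
        cost += dist[pos][c]
        if demand > load:                   # restocking detour to the depot
            cost += 2 * dist[c][0]
            load = CAP
        load -= demand
        pos = c
    return cost + dist[pos][0]              # return to the depot

def rollout_step(pos, load, unvisited, samples=200):
    """One-step rollout: pick the next customer by Monte Carlo cost-to-go."""
    best, best_cost = None, float("inf")
    for c in sorted(unvisited):
        base = [c] + sorted(unvisited - {c})   # base sequence: a priori order
        est = sum(simulate(base, pos, load) for _ in range(samples)) / samples
        if est < best_cost:
            best, best_cost = c, est
    return best

# Drive the vehicle with the rollout policy, realizing demands along the way.
pos, load, unvisited = 0, CAP, set(range(1, N + 1))
while unvisited:
    nxt = rollout_step(pos, load, unvisited)
    demand = random.randint(1, 4)
    if demand > load:
        load = CAP
    load -= demand
    pos = nxt
    unvisited.remove(nxt)
```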
We consider a multistage asset acquisition problem where assets are purchased now, at a price that varies randomly over time, to be used to satisfy a random demand at a particular point in time in the future. We provide a rare proof of convergence for an approximate dynamic programming algorithm using pure exploitation, where the states we visit depend on the decisions produced by solving the approximate problem. The resulting algorithm does not require knowing the probability distributions of prices or demands, nor does it require any assumptions about their functional form. The algorithm and its proof rely on the fact that the true value function is a family of piecewise linear concave functions.
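A minimal sketch in the spirit of the piecewise linear concave structure: the value of holding assets is represented by per-unit marginal values (slopes), each sampled marginal observation is smoothed in, and concavity is restored by a simple leveling projection. The asset limit, stepsize, and sample updates below are illustrative assumptions, not the paper's algorithm or proof.

```python
import numpy as np

MAX_ASSETS = 20
slopes = np.zeros(MAX_ASSETS)    # slopes[r] ~ marginal value of the (r+1)-th asset

def update_slope(r, observed_marginal, stepsize=0.1):
    """Smooth a sampled marginal value into slope r, then restore concavity."""
    slopes[r] = (1 - stepsize) * slopes[r] + stepsize * observed_marginal
    # Concavity means the slopes are non-increasing in r; level any violators
    # against the freshly updated slope (a simple projection step).
    slopes[:r] = np.maximum(slopes[:r], slopes[r])
    slopes[r + 1:] = np.minimum(slopes[r + 1:], slopes[r])

def value(r):
    """Concave piecewise linear value of holding r assets."""
    return float(np.sum(slopes[:r]))

# Pure exploitation: the asset level sampled next comes from maximizing the
# current approximation, so the states visited depend on the decisions made.
update_slope(5, observed_marginal=2.0)
update_slope(3, observed_marginal=3.5)
print([value(r) for r in (3, 5, 10)])
```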
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite-horizon self-learning optimal control algorithm, called the generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. Generalized policy iteration is a general scheme that interleaves the policy iteration and value iteration algorithms of ADP. The developed algorithm permits an arbitrary positive semidefinite function to initialize it, and it uses two iteration indices, one for policy improvement and one for policy evaluation. This is the first time that the convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems have been analyzed. Neural networks are used to implement the developed algorithm. Finally, numerical examples are presented to illustrate its performance.
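A minimal sketch of the two-index structure on a small random finite MDP: the outer index improves the policy, while the inner index performs a fixed number of evaluation sweeps (one sweep behaves like value iteration, many sweeps like full policy iteration). The random MDP is an illustrative assumption, not the paper's neural-network implementation for nonaffine nonlinear systems.

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.uniform(0, 1, size=(S, A))           # stage reward

V = np.zeros(S)                              # arbitrary initialization
for i in range(50):                          # outer index: policy improvement
    Q = R + gamma * P @ V                    # Q[s, a] under the current V
    policy = Q.argmax(axis=1)
    for j in range(5):                       # inner index: partial evaluation
        V = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                      for s in range(S)])
```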
We model a multiperiod, single-resource capacity reservation problem as a dynamic, stochastic, multiple knapsack problem and formulate it with stochastic dynamic programming. As the state space grows exponentially in the number of knapsacks and the decision set grows exponentially in the number of order arrivals per period, the recursion is computationally intractable for large-scale problems, including those with long horizons. Our goal is to ensure optimal, or near-optimal, decisions at time zero when maximizing the net present value of returns from accepted orders, but solving problems with short horizons introduces end-of-study effects that may prohibit finding good solutions at time zero. Thus, we propose an approximation approach which utilizes simulation and deterministic dynamic programming in order to allow for the solution of longer-horizon problems and ensure good time-zero decisions. Our computational results illustrate the effectiveness of the approximation scheme.
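A minimal sketch of the deterministic-DP flavor of such an approximation: stochastic future arrivals are replaced by expected (size, revenue) pairs, and a deterministic recursion over remaining capacity and time supplies discounted continuation values. The horizon, capacity, discount factor, and at-most-one-acceptance-per-period simplification are all illustrative assumptions.

```python
from functools import lru_cache

T, CAP = 12, 8                        # horizon (periods) and knapsack capacity
disc = 0.99                           # per-period discount factor (for NPV)
exp_orders = [(2, 5.0), (3, 6.5)]     # expected (size, revenue) arrivals

@lru_cache(maxsize=None)
def value(t, cap):
    """Deterministic DP: best discounted return from period t with cap left."""
    if t == T:
        return 0.0
    best = value(t + 1, cap)          # accept nothing this period
    for size, revenue in exp_orders:  # or accept one expected order
        if size <= cap:
            best = max(best, revenue + disc * value(t + 1, cap - size))
    return best

print(value(0, CAP))                  # time-zero value under the approximation
```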
ISBN (print): 9780769537528
Multi-stage decision problems under uncertainty are abundant in the process industries. The Markov decision process (MDP) is a general mathematical formulation of such problems. Whereas stochastic programming and dynamic programming are the standard methods to solve MDPs, their unwieldy computational requirements limit their usefulness in real applications. Approximate dynamic programming (ADP) combines simulation and function approximation to alleviate the "curse of dimensionality" associated with the traditional dynamic programming approach. In this paper, the method of ADP, which abates the curse of dimensionality by solving the DP within a carefully chosen, small subset of the state space, is introduced, and a survey of recent research directions within the field of ADP is provided.
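A minimal sketch of the simulation-plus-function-approximation combination this survey describes: sample a subset of the state space, back up Bellman targets on the samples, and fit a parametric value function by least squares (fitted value iteration). The polynomial basis, toy dynamics, and uniform sampling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def features(x):
    return np.array([1.0, x, x ** 2])       # small polynomial basis

def step(x, u):
    return 0.9 * x + u                       # toy deterministic dynamics

states = rng.uniform(-3, 3, size=200)        # the chosen subset of the state space
actions = np.linspace(-1, 1, 9)
theta, gamma = np.zeros(3), 0.95

for _ in range(50):                          # fitted value iteration sweeps
    targets = []
    for x in states:
        q = [x ** 2 + 0.1 * u ** 2 + gamma * features(step(x, u)) @ theta
             for u in actions]
        targets.append(min(q))
    Phi = np.array([features(x) for x in states])
    theta, *_ = np.linalg.lstsq(Phi, np.array(targets), rcond=None)
```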
As an important class of approximate dynamic programming, the direct heuristic dynamic programming (DHDP) approach is discussed in this paper. DHDP performs well due to its model-free online learning capability. While the classical DHDP is implemented with a gradient-based adaptation learning algorithm for neural networks, in this paper we present a design strategy for DHDP with a novel hybrid estimation of distribution algorithm for online learning and control; the proposed design optimization method achieves the weight training of the neural networks with faster convergence. The proposed approach can thus be viewed as an improvement over classical DHDP. A simulation is conducted on a practical system plant to test the online learning performance of our method, and the simulation results show the effectiveness of our approach.
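A minimal sketch of the core substitution: training neural-network weights with an estimation of distribution algorithm (EDA) instead of gradient descent. Weight vectors are sampled from a Gaussian, the elites are kept, and the Gaussian is refit. The tiny network, regression loss, and population sizes are illustrative assumptions, not the paper's hybrid EDA or its DHDP critic.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(64, 1))
Y = np.sin(3 * X)                             # toy target for the network

def forward(w, x):
    """1-4-1 tanh network; w packs all 13 weights."""
    W1, b1 = w[:4].reshape(1, 4), w[4:8]
    W2, b2 = w[8:12].reshape(4, 1), w[12]
    return np.tanh(x @ W1 + b1) @ W2 + b2

def loss(w):
    return float(np.mean((forward(w, X) - Y) ** 2))

mu, sigma = np.zeros(13), np.ones(13)         # Gaussian over weight vectors
for gen in range(100):                        # EDA generations
    pop = mu + sigma * rng.standard_normal((50, 13))
    elites = pop[np.argsort([loss(w) for w in pop])[:10]]
    mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
```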
ISBN (print): 9781424438723
In this paper, a novel online approximate dynamic programming (ADP) technique for completely unknown continuous-time linear systems is proposed to solve the infinite-horizon linear quadratic (LQ) optimal control problem. To relax the assumption of a known input coupling matrix, the conventional LQ optimal control problem is converted into a cheap control problem. The ADP agent then iteratively solves this cheap optimal control problem in an online fashion to obtain a near-optimal solution of the conventional LQ optimal control problem. In addition, we mathematically prove the approximation property of the cheap optimal control problem with respect to the conventional LQ optimal control problem. A numerical simulation for an ideal DC motor shows the applicability of the proposed ADP algorithm.
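A minimal sketch of the cheap-control LQ idea: solve the LQ problem with a very small input weight eps*I via Kleinman-style policy iteration, where each step solves a Lyapunov equation. The paper does this online without a model; here, purely for illustration, the system matrices are assumed known, and the plant and initial gain are made-up examples.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])    # illustrative system
B = np.array([[0.0], [1.0]])
Q, eps = np.eye(2), 1e-3                     # cheap control: R = eps * I

K = np.array([[0.0, 1.0]])                   # an initial stabilizing gain
for _ in range(20):                          # Kleinman policy iteration
    Ac = A - B @ K
    # Policy evaluation: solve Ac^T P + P Ac + Q + K^T (eps I) K = 0
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ (eps * np.eye(1)) @ K))
    # Policy improvement: K = R^{-1} B^T P
    K = (1.0 / eps) * (B.T @ P)
print(K)                                      # near-optimal cheap-control gain
```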
The research objective of the dissertation is to develop methods that address the curse of dimensionality in the field of approximate dynamic programming and enhance the scalability of these methods to large-scale problems. Many problems, including those faced in day-to-day life, involve sequential decision making in the presence of uncertainty. These problems can often be modeled as Markov decision processes using Bellman's optimality equation. Attempts to solve even reasonably complex problems through stochastic dynamic programming are faced with the curse of modeling and the curse of dimensionality. The curse of modeling has been addressed in the literature through the introduction of reinforcement learning strategies, a strand of approximate dynamic programming (ADP). In spite of considerable research efforts, the curse of dimensionality, which affects the scalability of ADP for large-scale applications, still remains a challenge. In this research, a value function approximation method based on the theory of diffusion wavelets is investigated to address the scalability of ADP methods. The first contribution of this dissertation is an advancement of the state of the art in stochastic dynamic programming methods that are solved using ADP approaches. An important intellectual merit is the diffusion wavelet based value function approximation method, which is integrated with ADP to address the curse of dimensionality; the innovation lies in an integration that exploits the structure of the problem to achieve computational feasibility. The ADP method with diffusion wavelet based value function approximation is tested on the problem of taxi-out time estimation of aircraft (the time between gate pushback and wheels-off) to establish a proof of concept for the research objective. The second contribution of this dissertation is the modeling of the taxi-out time estimation of flights as a stochastic dynamic programming problem with t...
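A crude, minimal stand-in for the diffusion-based value function approximation idea: build basis functions from powers of a diffusion operator on the state graph (the multiscale spirit of diffusion wavelets, without the full wavelet orthogonalization across scales) and fit the value function by least squares. The chain graph, scale choices, and target function are illustrative assumptions, not the dissertation's construction or its taxi-out application.

```python
import numpy as np

n = 30                                        # states on a chain graph
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0           # chain adjacency
T = W / W.sum(axis=1, keepdims=True)          # random-walk diffusion operator

# Multiscale dictionary: dyadic powers T, T^2, T^4, T^8, subsampled columns
scales = [np.linalg.matrix_power(T, 2 ** k) for k in range(4)]
basis = np.hstack([S[:, ::6] for S in scales])

V_true = np.sin(np.linspace(0, np.pi, n))     # stand-in target value function
coef, *_ = np.linalg.lstsq(basis, V_true, rcond=None)
V_hat = basis @ coef
print(float(np.max(np.abs(V_hat - V_true))))  # approximation error on the chain
```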
This paper proposes a novel finite-time optimal control method based on input-output data for unknown nonlinear systems using an adaptive dynamic programming (ADP) algorithm. In this method, a single-hidden-layer feed-forward network (SLFN) with extreme learning machine (ELM) is used to construct a data-based identifier of the unknown system dynamics. Based on the data-based identifier, the finite-time optimal control method is established by the ADP algorithm. Two other SLFNs with ELM are used in the ADP method to facilitate the implementation of the iterative algorithm; they approximate the performance index function and the optimal control law at each iteration, respectively. A simulation example is provided to demonstrate the effectiveness of the proposed control scheme.
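A minimal sketch of the ELM training step behind such an identifier: the hidden-layer weights of the SLFN are drawn at random and frozen, so only the output weights need to be solved, in closed form, by least squares. The toy identification data and network size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(200, 2))           # [x_k, u_k] input samples
y = 0.5 * X[:, 0] + np.sin(X[:, 1])             # stand-in for the unknown dynamics

H_UNITS = 40
W_in = rng.standard_normal((2, H_UNITS))        # random hidden weights, never trained
b = rng.standard_normal(H_UNITS)

H = np.tanh(X @ W_in + b)                       # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # output weights in one shot

def identifier(x, u):
    """SLFN identifier prediction of the next state."""
    h = np.tanh(np.array([x, u]) @ W_in + b)
    return float(h @ beta)

print(identifier(0.3, -0.2))
```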