Deterministic optimization has enjoyed a rich place in transportation and logistics, where it represents a mature field with established modeling and algorithmic strategies. By contrast, sequential stochastic optimiza...
详细信息
Deterministic optimization has enjoyed a rich place in transportation and logistics, where it represents a mature field with established modeling and algorithmic strategies. By contrast, sequential stochastic optimization models (dynamic programs) have been plagued by the lack of a common modeling framework, and by algorithmic strategies that just do not seem to scale to real-world problems in transportation. This paper is designed as a tutorial of the modeling and algorithmic framework of approximate dynamic programming;however, our perspective on approximate dynamic programming is relatively new, and the approach is new to the transportation research community. We present a simple yet precise modeling framework that makes it possible to integrate most algorithmic strategies into four fundamental classes of policies, the design of which represents approximate solutions to these dynamic programs. The paper then uses problems in transportation and logistics to indicate settings in which each of the four classes of policies represents a natural solution strategy, highlighting the fact that the design of effective policies for these complex problems will remain an exciting area of research for many years. Along the way, we provide a link between dynamicprogramming, stochastic programming and stochastic search.
We model a multiperiod, single resource capacity reservation problem as a dynamic, stochastic, multiple knapsack problem with stochastic dynamicprogramming. As the state space grows exponentially in the number of kna...
详细信息
We model a multiperiod, single resource capacity reservation problem as a dynamic, stochastic, multiple knapsack problem with stochastic dynamicprogramming. As the state space grows exponentially in the number of knapsacks and the decision set grows exponentially in the number of order arrivals per period, the recursion is computationally intractable for large-scale problems, including those with long horizons. Our goal is to ensure optimal, or near optimal, decisions at time zero when maximizing the net present value of returns from accepted orders, but solving problems with short horizons introduces end-of-study effects which may prohibit finding good solutions at time zero. Thus, we propose an approximation approach which utilizes simulation and deterministic dynamicprogramming in order to allow for the solution of longer horizon problems and ensure good time zero decisions. Our computational results illustrate the effectiveness of the approximation scheme.
A renewable power producer who trades on a day-ahead market sells electricity under supply and price uncertainty. Investments in energy storage mitigate the associated financial risks and allow for decoupling the timi...
详细信息
A renewable power producer who trades on a day-ahead market sells electricity under supply and price uncertainty. Investments in energy storage mitigate the associated financial risks and allow for decoupling the timing of supply and delivery. This paper introduces a model of the optimal bidding strategy for a hybrid system of renewable power generation and energy storage. We formulate the problem as a continuous-state Markov decision process and present a solution based on approximate dynamic programming. We propose an algorithm that combines approximate policy iteration with Least Squares Policy Evaluation (LSPE) which is used to estimate the weights of a polynomial value function approximation. We find that the approximate policies produce significantly better results for the continuous state space than an optimal discrete policy obtained by linear programming. A numerical analysis of the response surface of rewards on model parameters reveals that supply uncertainty, imbalance costs and a negative correlation of market price and supplies are the main drivers for investments in energy storage. Supply and price autocorrelation, on the other hand, have a negative effect on the value of storage.
Three novel approximate dynamic programming algorithms based on the temporal, spatial, and spatiotemporal decomposition are proposed for the economic dispatch problem (EDP) in a distribution energy system with complex...
详细信息
Three novel approximate dynamic programming algorithms based on the temporal, spatial, and spatiotemporal decomposition are proposed for the economic dispatch problem (EDP) in a distribution energy system with complex topology and many non-dispatchable renewable energy sources and energy storage systems (ESS). Computational efficiency of the proposed algorithms is compared and convergence to the optimal solution is shown in numeric experiments on the example of the two-day hourly EDP for the IEEE 33bw test network having 200+ consumers, 150+ energy storages, and 1000+ consuming devices.
We consider an approximate dynamic programming heuristic to support the selection of defense projects when projects have different values and are originated intermittently but fairly frequently. We show that a simple ...
详细信息
We consider an approximate dynamic programming heuristic to support the selection of defense projects when projects have different values and are originated intermittently but fairly frequently. We show that a simple policy reserving a positive fraction of the available budget for high-value projects not yet originated is superior to a greedy knapsack approach.
In this study, we develop a stochastic dynamic traffic-flow model subject to practical restrictions under the non-homogeneous Poisson vehicle arrival process. Using the cell transmission strategy, we establish traffic...
详细信息
In this study, we develop a stochastic dynamic traffic-flow model subject to practical restrictions under the non-homogeneous Poisson vehicle arrival process. Using the cell transmission strategy, we establish traffic dynamics between two intersections. We also discuss simulating the random demand of source links from an estimated intensity function. Additionally, we propose an algorithm to optimize time interval division for aggregated data, aiming to enhance estimation performance. We explore applying our traffic flow model to the adaptive traffic network management problem, which is formulated as a Markov decision process. Leveraging approximate dynamic programming with recursive least squares-temporal difference learning, we achieve adaptive optimal policies. To validate our approach, we conduct a series of numerical experiments with random demands. The results of non-homogeneous Poisson demand conducted using random numbers and a real-word dataset indicate high efficiency with the piecewise constant, I-SMOOTH, and MNO-PQRS estimators. Compared to the Webster and Max-pressure control systems, our proposed approximate dynamic programming-based model exhibits superior stability and applicability.
This research investigates the optimal control problem of heavy haul train for the minimization of longitudinal forces. As the heavy haul train is much heavier and longer than ordinary train, the in-train forces shoul...
详细信息
This research investigates the optimal control problem of heavy haul train for the minimization of longitudinal forces. As the heavy haul train is much heavier and longer than ordinary train, the in-train forces should be carefully manipulated to reduce the train's maintenance cost and, most importantly, to ensure operation safety. Specifically, the limitations of pneumatically controlled braking system increase the need for the optimal control strategy to accounting for future grades, speed restrictions and uncertain disturbances. In this article, the stochastic dynamicprogramming model is adopt to set up a rigorous mathematical formulation for heavy haul train control, and approximate dynamic programming algorithm with lookup table representation is introduced to find the optimal solution of the considered problem. By handling the existed uncertainties in a mathematical way, the post-decision state variable is utilized to represent the state of the heavy haul train after we have made a control decision but before any exogenous information has arrived. Finally, the computational results demonstrate the effectiveness and performance of the proposed model and algorithm.
approximate dynamic programming is an effective optimal control method. This article researches a data-driven approximate dynamic programming. The method is extended to a nonlinear multi-input multi-output form. Using...
详细信息
approximate dynamic programming is an effective optimal control method. This article researches a data-driven approximate dynamic programming. The method is extended to a nonlinear multi-input multi-output form. Using the data from a unique 4JB1-Tweifu accumulator pump system (WAPS) engine, the developed approximate dynamic programming controller is trained to achieve its optimal trade-off emission control between the nitrogen oxides and particulate matter. The convergent proof of this method is given. The second-order training algorithm is introduced to promote the robust and convergent performance. The control objective is to let the WAPS engine pass the China State-IV emission test under the New European Drive Cycle. The bench test shows that an excellent control transient performance and significant promotion have been achieved. This article presents a new approach for the engine control and calibration. In addition, it also adds another dimension to the existing literature on the data-driven nonlinear multi-input multi-output trade-off emission control of the WAPS engine.
In this paper, a new algorithm for realization of approximate dynamic programming (ADP) with Gaussian processes (GPs) for continuous-time (CT) nonlinear input-affine systems is proposed to infinite horizon optimal con...
详细信息
In this paper, a new algorithm for realization of approximate dynamic programming (ADP) with Gaussian processes (GPs) for continuous-time (CT) nonlinear input-affine systems is proposed to infinite horizon optimal control problems. The convergence for the ADP algorithm is proven based on the assumption of an exact approximation, where both the cost function and the control input converge to their optimal values, that is, the solution to the Hamilton-Jacobi-Bellman (HJB) equation. The approximation errors, however, are unavoidable in almost every case of applications. In order to tackle the problem, the proposed algorithm is derived with the proof of convergence, where the cost function and the control input, which are both approximated, converge to those of the ADP as the number of data points for GPs approaches infinity. A numerical simulation demonstrates the effectiveness of the proposed algorithm.
Inspired by the problem of best managing the invasive mosquito Aedes albopictus across the 17 Torres Straits islands of Australia, we aim at solving a Markov decision process on large Susceptible-Infected-Susceptible ...
详细信息
ISBN:
(纸本)9781450358163
Inspired by the problem of best managing the invasive mosquito Aedes albopictus across the 17 Torres Straits islands of Australia, we aim at solving a Markov decision process on large Susceptible-Infected-Susceptible (SIS) networks that are highly connected. While dynamicprogramming approaches can solve sequential decision-making problems on sparsely connected networks, these approaches are intractable for highly connected networks. Inspired by our case study, we focus on problems where the probability of nodes changing state is low and propose two approximate dynamic programming approaches. The first approach is a modified version of value iteration where only those future states that are similar to the current state are accounted for. The second approach models the state space as continuous instead of binary, with an on-line algorithm that takes advantage of Bellman's adapted equation. We evaluate the resulting policies through simulations and provide a priority order to manage the 17 infested Torres Strait islands. Both algorithms show promise, with the continuous state approach being able to scale up to high dimensionality (50 nodes). This work provides a successful example of how AI algorithms can be designed to tackle challenging computational sustainability problems.
暂无评论