Although increasing amounts of transaction data make it possible to characterize uncertainties surrounding customer service requests, few methods integrate predictive tools with prescriptive optimization procedures to meet growing demand for small-volume urban transport services. We incorporate temporal and spatial anticipation of service requests into approximate dynamic programming (ADP) procedures to yield dynamic routing policies for the single-vehicle routing problem with stochastic service requests, an important problem in city-based logistics. We contribute to the routing literature as well as to the field of ADP. We combine offline value function approximation (VFA) with online rollout algorithms resulting in a high-quality, computationally tractable policy. Our offline-online policy enhances the anticipation of the VFA policy, yielding spatial and temporal anticipation of requests and routing developments. Our combination of VFA with rollout algorithms demonstrates the potential benefit of using offline and online methods in tandem as a hybrid ADP procedure, making possible higher-quality policies with reduced computational requirements for real-time decision making. Finally, we identify a policy improvement guarantee applicable to VFA-based rollout algorithms, showing that base policies composed of deterministic decision rules lead to rollout policies with performance at least as strong as that of their base policy.
Authors: Bikker, Ingeborg A.; Mes, Martijn R. K.; Saure, Antoine; Boucherie, Richard J.
Univ Twente, Ctr Healthcare Operat Improvement & Res CHOIR, Drienerlolaan 5, NL-7500 AE Enschede, Netherlands
Sint Maartensklin, Dept Healthcare Logist, Hengstdal 3, NL-6574 NA Nijmegen, Netherlands
Univ Twente, Dept Appl Math, Stochast Operat Res, Drienerlolaan 5, NL-7500 AE Enschede, Netherlands
Univ Twente, Dept Ind Engn & Business Informat Syst IEBIS, Drienerlolaan 5, NL-7500 AE Enschede, Netherlands
Univ Ottawa, Telfer Sch Management, 55 Laurier Ave East, Ottawa, ON K1N 6N5, Canada
We study an online capacity planning problem in which arriving patients require a series of appointments at several departments, within a certain access time target. This research is motivated by a study of rehabilitation planning practices at the Sint Maartenskliniek hospital (the Netherlands). In practice, the prescribed treatments and activities are typically booked starting in the first available week, leaving no space for urgent patients who require a series of appointments at short notice. This leads to the rescheduling of appointments or long access times for urgent patients, which has a negative effect on the quality of care and on patient satisfaction. We propose an approach for allocating capacity to patients at the moment of their arrival, in such a way that the total number of requests booked within their corresponding access time targets is maximized. The model considers online decision making regarding multi-priority, multi-appointment, and multi-resource capacity allocation. We formulate this problem as a Markov decision process (MDP) that takes into account the current patient schedule and future arrivals. We develop an approximate dynamic programming (ADP) algorithm to obtain approximately optimal capacity allocation policies. We provide insights into the characteristics of the optimal policies and evaluate the performance of the resulting policies using simulation.
The stochastic economic dispatch problem of a power system with multiple wind farms and pumped-storage hydro stations is formulated as a specific stochastic dynamic programming (DP) model, i.e. a stochastic storage model, for which it is impossible to obtain an accurate solution due to the curse of dimensionality. Based on the approximate DP (ADP) method, the stochastic storage model can be transformed into a series of mixed-integer linear programming (MILP) models by describing the approximate value functions (AVFs) as convex piecewise linear functions in post-decision states. The AVFs are first initialised using the results of the deterministic model under a forecast scenario of wind farm output and then trained by scanning stochastic sampling scenarios consecutively with the successive projective approximation routine algorithm. To obtain a near-optimal day-ahead dispatch scheme, the forecast scenario is substituted into the MILP models expressed by the trained AVFs, which are solved forward through each time interval. The network constraints are incorporated by the while-loop detection of critical lines. Test results on an actual provincial power system and the modified IEEE 39-bus system, including a comparison among the ADP, DP, scenario-based and chance-constrained programming methods, demonstrate the feasibility and efficiency of the proposed model and algorithm.
This paper proposes a novel artificial neural network (ANN) based control method for a dc/dc buck converter. The ANN is trained to implement optimal control based on approximate dynamic programming (ADP). Special characteristics of the proposed ANN control include: 1) the inputs to the ANN contain error signals and integrals of the error signals, enabling the ANN to have PI control ability; 2) the ANN receives voltage feedback signals from the dc/dc converter, making the combined system equivalent to a recurrent neural network; 3) the ANN is trained to minimize a cost function over a long time horizon, giving the ANN a stronger predictive control ability than a conventional predictive controller; 4) the ANN is trained offline, preventing the instability of the network caused by weight adjustments of an on-line training algorithm. The ANN performance is evaluated through simulation and hardware experiments and compared with conventional control methods, which shows that the ANN controller has a strong ability to track rapidly changing reference commands, maintain stable output voltage for a variable load, and manage maximum duty-ratio and current constraints properly.
This paper examines approximate dynamic programming algorithms for the single-vehicle routing problem with stochastic demands from a dynamic or reoptimization perspective. The methods extend the rollout algorithm by implementing different base sequences (i.e. a priori solutions), look-ahead policies, and pruning schemes. The paper also considers computing the cost-to-go with Monte Carlo simulation in addition to direct approaches. The best new method found is a two-step lookahead rollout started with a stochastic base sequence. Its routing cost is about 4.8% less than that of the one-step rollout algorithm started with a deterministic sequence. Results also show that Monte Carlo cost-to-go estimation reduces computation time by 65% in large instances with little or no loss in solution quality. Moreover, the paper compares results to the perfect information case from solving exact a posteriori solutions for sampled vehicle routing problems. The confidence interval for the overall mean difference is (3.56%, 4.11%). (C) 2008 Elsevier B.V. All rights reserved.
This article proposes an efficient event-based time-space network flow model with side constraints for the crane scheduling problem in a coil warehouse where the crane should carry out a set of coil storage, retrieval and shuffling requests, and determine the sequence of handling these requests as well as the positions to which the coils are moved. The model is formulated based on a graph such that each node represents a location in the warehouse at the end of a specific scheduling stage, and each edge indicates a crane's move between two locations in a stage. Variable reduction strategies are presented to accelerate solving the model. In order to solve large-sized instances of the problem, an exact dynamic programming approach based on optimal assignments between coils and positions in a bipartite network with cuts is designed by exploiting the problem structure. Then an approximate dynamic programming (ADP) approach is developed, in which an affine value function approximation is defined as the estimation of the crane's traveling time for handling each coil, and updated via iterations by collecting information from the solutions of separate subproblems. Computational results show that the proposed model is tighter and can be solved much more quickly than a traditional model for a reduced crane scheduling problem in the literature and the standard time-space network flow model. In addition, the proposed algorithm can obtain high-quality solutions for large-sized instances in a few minutes and is more efficient in solving the problem than a commercial software package. (C) 2017 Elsevier B.V. All rights reserved.
Stochastic and dynamic vehicle routing problems are gaining increasing attention in the research community. In these problems, routing plans are dynamically updated based on realizations of stochastic information. Due to the complexity of the corresponding Markov decision processes (MDPs), the calculation of optimal policies for these problems is usually not possible and researchers draw on heuristic methods of approximate dynamic programming (ADP). These methods use simulation to approximate the value of a state and decision in the MDP. The simulations are conducted either offline or online. Offline methods such as value function approximations (VFAs) generally neglect the full detail of the state space due to aggregation. Online methods such as rollout algorithms (RAs) are often not able to capture the decision and transition space sufficiently due to runtime limitations. In this paper, we alleviate this tradeoff by combining two methods of ADP, an online RA and an offline VFA, in two ways. In addition to integrating the VFA as a base policy into the online RA to strengthen the RA's simulations, we also limit the RA's simulation horizon, estimating the remaining reward-to-go again via the VFA. For two stochastic dynamic routing problems from the literature, we show how this combination outperforms state-of-the-art solutions while simultaneously reducing the required time for online calculations.
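The two-way combination described in this abstract (a base policy used inside the rollout simulations, plus a truncated simulation horizon closed off with a value estimate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy time-budget problem, the reward probability of 0.9, and the names `decisions`, `transition`, `base_policy`, `vfa` and `rollout_decision` are hypothetical and are not taken from the paper, whose routing problems and trained VFA are far richer.

```python
import random

# Hypothetical toy problem (not from the paper): a time budget is spent on jobs.
DURATION = {"short": 1, "long": 3}
REWARD = {"short": 1.0, "long": 2.0}

def decisions(time_left):
    """Feasible decisions: jobs that still fit into the remaining time."""
    return [d for d in ("short", "long") if DURATION[d] <= time_left]

def transition(time_left, decision, rng):
    """Each job pays its reward with probability 0.9, otherwise nothing."""
    reward = REWARD[decision] if rng.random() < 0.9 else 0.0
    return time_left - DURATION[decision], reward

def base_policy(time_left):
    """Stand-in for a VFA-derived base policy: always take the short job."""
    return "short"

def vfa(time_left):
    """Stand-in for an offline-trained estimate of the remaining reward-to-go."""
    return 0.8 * time_left

def rollout_decision(state, horizon, n_sims, rng):
    """Return the decision with the highest average simulated value.

    Each simulation applies the candidate decision, follows the base policy
    for at most `horizon` further steps, and then adds the VFA estimate of
    the state where the simulation was cut off.
    """
    best_decision, best_value = None, float("-inf")
    for d in decisions(state):
        total = 0.0
        for _ in range(n_sims):
            s, value = transition(state, d, rng)
            for _ in range(horizon):
                if not decisions(s):
                    break  # no time left, the episode has ended
                s, r = transition(s, base_policy(s), rng)
                value += r
            total += value + vfa(s)
        mean_value = total / n_sims
        if mean_value > best_value:
            best_decision, best_value = d, mean_value
    return best_decision

chosen = rollout_decision(state=6, horizon=2, n_sims=200, rng=random.Random(0))
```

In this toy setting the short job earns more expected reward per unit of time, so the rollout should prefer it. Shrinking `horizon` lowers the online simulation cost and shifts more of the estimate onto `vfa`, which is the tradeoff the abstract describes.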
Optimal scheduling in an anti-lock brake system of ground vehicles is performed through approximate dynamic programming for reducing the stopping distance in severe braking. The proposed optimal scheduler explicitly incorporates the hybrid nature of the anti-lock brake system and provides a feedback solution with a negligible computational burden in control calculation. To this end, an iterative scheme, called the value iteration algorithm, is used to derive the infinite horizon solution to the underlying Hamilton-Jacobi-Bellman equation. Performance of the proposed method in control of the brake system is illustrated using both linear-in-parameter neural networks and multi-layer perceptrons. Simulation results demonstrate the potential of the method.
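As a rough illustration of the value iteration scheme named in this abstract, the following Python sketch runs the fixed-point iteration on a tiny tabular problem. It is only a sketch under stated assumptions: the paper works with neural-network approximators on a hybrid brake model, whereas the four-state line world, the discount factor of 0.9, and the names `value_iteration` and `step` below are hypothetical.

```python
def value_iteration(states, actions, step, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration for a deterministic, discounted cost-minimisation problem.

    `step(s, a)` returns `(next_state, stage_cost)`; all names are illustrative.
    """
    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        # Bellman backup: best one-step cost plus discounted cost-to-go.
        updated = {
            s: min(cost + gamma * values[nxt]
                   for nxt, cost in (step(s, a) for a in actions))
            for s in states
        }
        gap = max(abs(updated[s] - values[s]) for s in states)
        values = updated
        if gap < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {}
    for s in states:
        policy[s] = min(
            actions,
            key=lambda a: step(s, a)[1] + gamma * values[step(s, a)[0]],
        )
    return values, policy

def step(s, a):
    """Hypothetical line world: state 0 is a cost-free goal, every other move costs 1."""
    if s == 0:
        return 0, 0.0
    return min(max(s + a, 0), 3), 1.0

values, policy = value_iteration(states=[0, 1, 2, 3], actions=[-1, 1], step=step)
```

Once the values have converged, the resulting policy is a simple lookup, which mirrors the abstract's point that the feedback solution adds little computational burden at control time.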
In this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs; we call the overall objective a dynamic quantile-based risk measure (DQBRM). In particular, we consider optimizing dynamic risk measures where the one-step risk measures are QBRMs, a class of risk measures that includes the popular value at risk (VaR) and the conditional value at risk (CVaR). Although there is considerable theoretical development of risk-averse MDPs in the literature, the computational challenges have not been explored as thoroughly. We propose data-driven and simulation-based approximate dynamic programming (ADP) algorithms to solve the risk-averse sequential decision problem. We address the issue of inefficient sampling for risk applications in simulated settings and present a procedure, based on importance sampling, to direct samples toward the "risky region" as the ADP algorithm progresses. Finally, we show numerical results of our algorithms in the context of an application involving risk-averse bidding for energy storage.
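For readers unfamiliar with the two risk measures named in this abstract, the Python sketch below computes empirical VaR and CVaR from a list of equally weighted cost samples. It is not the paper's algorithm: the quantile convention (smallest sample covering a fraction alpha of the data), the use of the Rockafellar-Uryasev expression for CVaR, and the function names are assumptions chosen for illustration.

```python
import math

def value_at_risk(costs, alpha):
    """Empirical alpha-quantile: smallest sample c with at least a fraction alpha of samples <= c."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(costs)
    index = math.ceil(alpha * len(ordered)) - 1
    return ordered[max(index, 0)]

def conditional_value_at_risk(costs, alpha):
    """Empirical CVaR via VaR + E[(cost - VaR)^+] / (1 - alpha)."""
    var = value_at_risk(costs, alpha)
    excess = sum(max(c - var, 0.0) for c in costs)
    return var + excess / ((1.0 - alpha) * len(costs))

# Illustrative cost samples with one large outlier.
costs = [1.0, 2.0, 3.0, 4.0, 10.0]
var_50 = value_at_risk(costs, 0.5)
cvar_50 = conditional_value_at_risk(costs, 0.5)
```

Because CVaR averages over the tail beyond the quantile, it reacts to the outlier while VaR does not, which is one reason rare high-cost samples matter so much when estimating such measures by simulation.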
We propose an energy-efficient supervisory control method for the power management of parallel hybrid electric vehicles (HEVs) to improve the fuel economy and reduce exhaust gas emissions. Plug-in HEVs ((P)HEVs) have multiple power sources (e.g., an engine and motor) that should be cooperatively operated to meet the required instantaneous traction power for the desired vehicle speed while satisfying their physical limits. Because the efficiencies of the engine and motor vary with different operating speeds and torques, the main issue of energy-efficient power management is to allocate the power demand among the power sources so as to achieve maximum power conversion efficiency while satisfying the operating limits. For an efficient power allocation, an optimal control problem is formulated, and a global solution is found through deterministic dynamic programming (DP). Owing to the curse of dimensionality and uncertainties in real driving, DP solutions are not directly applicable in real time. To resolve the limitations of DP, we employ a non-parametric Bayesian function approximation technique using a Gaussian process (GP). The offline DP solutions obtained from a set of real vehicle driving test data were used to learn a state-dependent probabilistic value function through Gaussian process regression. For online implementations, a receding horizon control scheme was applied for the feedback control of the power management. In comparison with the existing charge sustaining strategy and charge depleting and charge sustaining mixed controllers, we recorded fuel efficiency improvements of over 4.8% and 7.3%, respectively, in a mixed urban-suburban route.