In this study, we illustrate a real-time approximate dynamic programming (RTADP) method for solving multistage capacity decision problems in a stochastic manufacturing environment, using an exemplary three-stage manufacturing system with recycle. The system is a moderate-size queuing network that experiences stochastic variations in demand and product yield. The dynamic capacity decision problem is formulated as a Markov decision process (MDP). The proposed RTADP method starts with a set of heuristics and learns a superior-quality solution by interacting with the stochastic system via simulation. The curse of dimensionality associated with DP methods is alleviated by adopting several notions, including an "evolving set of relevant states," for which the value function table is built and updated; an "adaptive action set" for keeping track of attractive action candidates; and a "nonparametric k nearest neighbor averager" for value function approximation. The performance of the learned solution is evaluated against (1) an "ideal" solution derived using a mixed integer programming (MIP) formulation, which assumes full knowledge of future realized values of the stochastic variables; (2) a myopic heuristic solution; and (3) a sample-path-based rolling-horizon MIP solution. The policy learned through the RTADP method turned out to be superior to policies (2) and (3). (C) 2010 Wiley Periodicals, Inc. Naval Research Logistics 57: 211-224, 2010
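As a rough illustration of the k nearest neighbor averager mentioned in this abstract, the sketch below estimates the value of a query state by averaging the stored values of its k closest visited states. The table layout, the Euclidean distance, the function name `knn_value`, and the toy numbers are all assumptions made for this example; the paper's actual state representation and distance measure are not given here.

```python
import math

def knn_value(query, table, k=3):
    """Average the stored values of the k states nearest to `query`.

    `table` maps state tuples to value estimates (a hypothetical layout).
    """
    if not table:
        raise ValueError("value table is empty")
    # Rank the visited states by Euclidean distance to the query state.
    nearest = sorted(table, key=lambda s: math.dist(s, query))[:k]
    return sum(table[s] for s in nearest) / len(nearest)

# Toy table: states are (queue length, capacity level) pairs (made-up values).
table = {(0, 1): 10.0, (1, 1): 12.0, (2, 1): 16.0, (5, 2): 40.0}
estimate = knn_value((1, 1), table, k=3)
```

Because the averager only reads states that have actually been visited, the table can grow with the "evolving set of relevant states" instead of covering the full state space.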
This paper studies the adaptive scheduling problem of multiple electronic support measures (multi-ESM) in a ground moving radar target tracking application. It is a sequential decision-making problem in an uncertain environment. For adaptive selection of appropriate ESMs, we generalize an approximate dynamic programming (ADP) framework to the dynamic case. We define the environment model and the agent model, respectively. To handle the challenge of partial observability, we apply the unscented Kalman filter (UKF) algorithm for belief state estimation. To reduce the computational burden, a simulation-based rollout approach with a redesigned base policy is proposed to approximate the long-term cumulative reward. Meanwhile, Monte Carlo sampling is incorporated into the rollout to estimate the expectation of the rewards. The experiments indicate that our method outperforms other strategies, particularly in larger-scale problems.
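The rollout idea described in this abstract can be sketched generically: each candidate action is scored by simulating it once and then following a base policy, with the expected cumulative reward estimated by Monte Carlo averaging. The toy dynamics `step`, the base policy `hold`, and all parameter values below are hypothetical stand-ins, not the paper's ESM model or its redesigned base policy.

```python
import random

def rollout_value(state, action, step, base_policy, horizon, n_samples, rng):
    """Monte Carlo estimate of the cumulative reward of taking `action`
    now and following `base_policy` for the remaining steps."""
    total = 0.0
    for _ in range(n_samples):
        s, a, ret = state, action, 0.0
        for _ in range(horizon):
            s, r = step(s, a, rng)   # sample one stochastic transition
            ret += r
            a = base_policy(s)       # later actions come from the base policy
        total += ret
    return total / n_samples

def rollout_action(state, actions, step, base_policy,
                   horizon=5, n_samples=200, seed=0):
    """Pick the action with the highest rollout estimate."""
    rng = random.Random(seed)
    return max(actions, key=lambda a: rollout_value(
        state, a, step, base_policy, horizon, n_samples, rng))

# Toy system (assumption): drive a noisy scalar state toward zero.
def step(s, a, rng):
    s_next = s + a + rng.gauss(0.0, 0.1)
    return s_next, -abs(s_next)

def hold(s):
    return 0  # base policy: apply no control

best = rollout_action(3.0, [-1, 0, 1], step, hold)
```

In this toy setting the rollout prefers the action that moves the state toward zero, even though the base policy on its own never acts.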
Through smartphone apps, drivers and passengers can dynamically enter and leave ride-hailing platforms. As a result, ride-pooling is challenging due to complex system dynamics and the different objectives of multiple stakeholders. In this paper, we study ride-pooling with no more than two passenger groups who can share rides in the same vehicle. We dynamically match available drivers to randomly arriving passengers and also decide pick-up and drop-off routes. The goal is to minimize a weighted sum of passengers' waiting time and trip delay time. A spatial-and-temporal decomposition heuristic is applied and each subproblem is solved using approximate dynamic programming (ADP), for which we show properties of the approximated value function at each stage. Our model is benchmarked against one that optimizes vehicle dispatch without ride-pooling and one that matches current drivers and passengers without demand forecasting. Using test instances generated from New York City taxi data during one peak hour, we conduct computational studies and sensitivity analysis to show (i) empirical convergence of ADP, (ii) the benefit of ride-pooling, and (iii) the value of future supply-demand information.
An approximate dynamic programming algorithm based on an actor-critic framework is proposed in this article. This algorithm takes the actor-critic framework as its basic framework, in which the critic and the actor are used to approximate the optimal value function and the control strategy, respectively. First, a linear basis function approximator is used to approximate the value function. Then a method of basis function construction based on system characteristics is introduced. Furthermore, since the injection concentration of ASP flooding has a fixed interval, the action weighting method is adopted to restrict and approximate the optimal control action. The value function parameter and the two strategy parameters are updated by the gradient descent method. Meanwhile, the eligibility trace is introduced to accelerate convergence. Finally, ASP flooding with four injection wells and nine production wells is used to test the effect of the proposed method.
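For readers unfamiliar with eligibility traces, the sketch below shows one textbook TD(lambda) update for a linear value function, which is the kind of critic update this abstract alludes to. It is not the paper's algorithm: the actor (strategy) update and the action weighting method are omitted, and the feature vectors, step size `alpha`, discount `gamma`, and trace decay `lam` are illustrative assumptions.

```python
def td_lambda_update(theta, trace, phi, phi_next, reward,
                     alpha=0.1, gamma=0.9, lam=0.8):
    """One TD(lambda) step for a linear value function V(s) = theta . phi(s)."""
    v = sum(t * f for t, f in zip(theta, phi))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = reward + gamma * v_next - v  # temporal-difference error
    # Accumulating trace: decay the old trace, then add the current features.
    trace = [gamma * lam * e + f for e, f in zip(trace, phi)]
    # Gradient-style update of every weight, scaled by its trace.
    theta = [t + alpha * delta * e for t, e in zip(theta, trace)]
    return theta, trace

theta, trace = [0.0, 0.0], [0.0, 0.0]
theta, trace = td_lambda_update(theta, trace, phi=[1.0, 0.0],
                                phi_next=[0.0, 1.0], reward=1.0)
```

The trace lets a single TD error adjust the weights of recently active features as well as the current ones, which is why it tends to speed up convergence.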
We consider the problem of temporal fair scheduling of queued data transmissions in wireless heterogeneous networks. We deal with both the throughput maximization problem and the delay minimization problem. Taking fairness constraints and the data arrival queues into consideration, we formulate the transmission scheduling problem as a Markov decision process (MDP) with fairness constraints. We study two categories of fairness constraints, namely temporal fairness and utilitarian fairness. We consider two criteria: infinite horizon expected total discounted reward and expected average reward. Applying the dynamic programming approach, we derive and prove explicit optimality equations for the above constrained MDPs, and give corresponding optimal fair scheduling policies based on those equations. A practical stochastic-approximation-type algorithm is applied to calculate the control parameters online in the policies. Furthermore, we develop a novel approximation method, temporal fair rollout, to achieve tractable computation. Numerical results show that the proposed scheme achieves significant performance improvement for both throughput maximization and delay minimization problems compared with other existing schemes.
Optimal control and reinforcement learning have an associated "value function" which must be suitably approximated. Value function approximation problems usually have different precision requirements in different regions of the state space. A uniform gridding wastes resources in regions in which the value function is smooth and, on the other hand, lacks resolution in zones with abrupt changes. The present work proposes an adaptive meshing methodology in order to adapt to these changing requirements without excessively increasing the number of parameters of the approximator. The proposal is based on simplicial meshes and Bellman error, with criteria to add and remove points from the mesh: modifications to proposals in earlier literature, including the volume of the affected simplices, are proposed, along with methods to manipulate the mesh triangulation.
In this article, online collaborative content caching in wireless networks is studied from a network economics point of view. The cache optimization problem is first modelled as a finite horizon Markov decision process that incorporates an auto-regressive model to forecast the evolution of the content demands. The complexity of the problem grows exponentially with the system parameters, and even though a good approximation to the cost-to-go can be found, the single-stage decision problem is still NP-hard. To deal with cache optimization in industrial-size networks, a novel methodology called rolling horizon is proposed that addresses the dimensionality of the problem by freezing the cache decisions for a small number of periods to construct a value function approximation. Then, to address the NP-hardness of the single-stage decision problem, two simplifications/reformulations are examined: (a) limiting the number of content replicas in the network and (b) limiting the allowed content replacements. The results show that the proposed approach can reduce the communication cost by over 84% compared to that of running least recently used updates on offline schemes in collaborative caching. The results also shed light on the trade-off between the efficiency of the caching policy and the time needed to run the online cache optimization algorithm.
We introduce a new algorithm based on linear programming for optimization of average-cost Markov decision processes (MDPs). The algorithm approximates the differential cost function of a perturbed MDP via a linear combination of basis functions. We establish a bound on the performance of the resulting policy that scales gracefully with the number of states without imposing the strong Lyapunov condition required by its counterpart in de Farias and Van Roy [de Farias, D. P., B. Van Roy. 2003. The linear programming approach to approximate dynamic programming. Oper. Res. 51(6) 850-865]. We investigate implications of this result in the context of a queueing control problem.
Approximate dynamic programming (ADP) relies, in the continuous-state case, on both a flexible class of models for the approximation of the value functions and a smart sampling of the state space for the numerical solution of the recursive Bellman equations. In this paper, low-discrepancy sequences, commonly employed for number-theoretic methods, are investigated as a sampling scheme in the ADP context when local models, such as the Nadaraya-Watson (NW) ones, are employed for the approximation of the value function. The analysis is carried out from both a theoretical and a practical point of view. In particular, it is shown that the combined use of low-discrepancy sequences and NW models enables the convergence of the ADP procedure. Then, the regular structure of the low-discrepancy sampling is exploited to derive a method for automatic selection of the bandwidth of NW models, which yields a significant saving in computational effort with respect to the standard cross-validation approach. Simulation results concerning an inventory management problem are presented to show the effectiveness of the proposed techniques. (C) 2013 Elsevier Ltd. All rights reserved.
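To make the two ingredients of this abstract concrete, the sketch below samples a one-dimensional state space with a van der Corput sequence (a standard low-discrepancy sequence) and fits a Nadaraya-Watson estimator with a Gaussian kernel. The target function `x ** 2`, the sample size, and the hand-picked bandwidth are assumptions for illustration only; the paper's automatic bandwidth selection is not reproduced here.

```python
import math

def van_der_corput(n, base=2):
    """n-th element of the base-`base` van der Corput sequence in [0, 1)."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q

def nadaraya_watson(x, xs, ys, bandwidth):
    """Gaussian-kernel weighted average of `ys` at the query point `x`."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Sample states with the low-discrepancy sequence, then evaluate a
# stand-in "value function" (here simply x squared) at those states.
xs = [van_der_corput(i) for i in range(1, 65)]
ys = [xi ** 2 for xi in xs]
approx = nadaraya_watson(0.5, xs, ys, bandwidth=0.05)
```

Because the sample points fill the interval evenly, the kernel average at any query point draws on a predictable number of neighbors, which is the kind of regularity the abstract says can be exploited when choosing the bandwidth.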
In this paper, we propose new integer optimization models for the lot-sizing and scheduling problem with sequence-dependent setups, based on the general lot-sizing and scheduling problem. To incorporate setup crossover and carryover, we first propose a standard model that straightforwardly adapts a formulation technique from the literature. Then, as the main contribution, we propose a novel optimization model that incorporates the notion of time flow. We derive a family of valid inequalities with which to compare the tightness of the models' linear programming relaxations. In addition, we provide an approximate dynamic programming algorithm that estimates the value of a state using its lower and upper bounds. Then, we conduct computational experiments to demonstrate the competitiveness of the proposed models and the solution algorithm. The test results show that the newly proposed time-flow model has considerable advantages compared with the standard model in terms of tightness and solvability. The proposed algorithm also shows computational benefits over the standard mixed integer programming solver.