Undiscounted Markov decision processes (UMDPs) can formulate optimal stochastic control problems that minimize the expected total cost per period for various systems. We propose new approximate dynamic programming (ADP) algorithms for large-scale UMDPs that can overcome the curses of dimensionality. These algorithms, called simulation-based modified policy iteration (SBMPI) algorithms, are extensions of the simulation-based modified policy iteration method (SBMPIM) (Ohno, 2011) for optimal control problems of multistage JIT-based production and distribution systems with stochastic demand and production capacity. The main new concepts of the SBMPI algorithms are that the simulation-based policy evaluation step of the SBMPIM is replaced by the partial policy evaluation step of the modified policy iteration method (MPIM), and that the algorithms start from the expected total cost per period and relative values estimated by simulating the system under a reasonable initial policy. For numerical comparison, the optimal control problem of a three-stage JIT-based production and distribution system with stochastic demand and production capacity is formulated as a UMDP. The demand distribution is changed from the shifted binomial distribution in Ohno (2011) to a Poisson distribution, and near-optimal policies for optimal control problems with 35,973,840 states are computed by the SBMPI algorithms and the SBMPIM. The computational results show that the SBMPI algorithms are at least 100 times faster than the SBMPIM in solving the numerical problems and are robust with respect to the initial policy. Numerical examples are solved to show the effectiveness of near-optimal control using the SBMPI algorithms compared with optimized pull systems whose optimal parameters are computed using the SBOS (simulation-based optimal solutions) from Ohno (2011). (C) 2015 Elsevier B.V. and Association of European Operational Research Societies (EURO) within the International Federation of Operational Research Societies (IFORS).
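The partial-evaluation idea is easiest to see in a small tabular setting. The sketch below is a minimal, illustrative modified policy iteration loop for an average-cost (undiscounted) MDP with explicit transition matrices; it is not the paper's SBMPI, which replaces exact expectations with simulation and initializes the gain and relative values from a simulated run of an initial policy. All names and parameters (P, c, m_sweeps, iters, ref) are hypothetical.

```python
import numpy as np

def modified_policy_iteration(P, c, m_sweeps=10, iters=100, ref=0):
    """Average-cost MPI sketch. P: (A, S, S) transitions, c: (A, S) costs."""
    num_actions, num_states = c.shape
    h = np.zeros(num_states)   # relative values; SBMPI would initialize these
    g = 0.0                    # (and the gain g) by simulating an initial policy
    idx = np.arange(num_states)
    policy = np.zeros(num_states, dtype=int)
    for _ in range(iters):
        # Policy improvement: greedy with respect to current relative values.
        q = c + P @ h          # q[a, s] = c(s, a) + sum_s' P(s'|s, a) h(s')
        policy = q.argmin(axis=0)
        # Partial policy evaluation (the MPIM step): m sweeps of the
        # fixed-policy operator, pinned at a reference state so that
        # h stays a *relative* value function (h[ref] == 0).
        for _ in range(m_sweeps):
            q_pi = c[policy, idx] + P[policy, idx] @ h
            g = q_pi[ref]      # running estimate of the cost per period
            h = q_pi - g
    return policy, g, h

# Tiny random example (2 actions, 4 states):
rng = np.random.default_rng(0)
P = rng.random((2, 4, 4)); P /= P.sum(axis=2, keepdims=True)
c = rng.random((2, 4))
policy, gain, h = modified_policy_iteration(P, c)
```

With exact expectations this is standard MPI for the average-cost criterion; the paper's contribution lies in making the evaluation and initialization practical at the scale of tens of millions of states.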
We consider a rental service with a fixed number of rental units distributed across multiple locations. The units are accessed by customers without prior reservation and on an on-demand basis. Customers decide how long to keep a unit and where to return it. Because of the randomness in demand and in returns, inventory must periodically be repositioned away from some locations and into others. In deciding how much inventory to reposition and where, the system manager balances potential lost sales against repositioning costs. Although the problem is increasingly common in applications involving on-demand rental services, not much is known about the nature of the optimal policy for systems with a general network structure or about effective approaches to solving the problem. In this paper, first, we show that the optimal policy in each period can be described in terms of a well-specified region over the state space. Within this region, it is optimal not to reposition any inventory, whereas, outside the region, it is optimal to reposition, but only such that the system moves to a new state on the boundary of the no-repositioning region. We also provide a simple check for whether a state lies in the no-repositioning region. Second, we leverage the features of the optimal policy, along with properties of the optimal cost function, to propose a provably convergent approximate dynamic programming algorithm that can tackle problems with a large number of dimensions.
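To make the structural result concrete, here is a toy two-location illustration, assuming the total number of units is fixed so the state reduces to the inventory x at one location and the no-repositioning region reduces to an interval [a, b]. The thresholds are hypothetical; the paper characterizes the region for general networks and computes policies via its ADP algorithm.

```python
def reposition(x: int, a: int, b: int) -> int:
    """Post-repositioning inventory at location 1 in a two-location toy model.

    Inside [a, b] it is optimal not to reposition; outside, move only as far
    as the nearest boundary point of the no-repositioning region.
    """
    if x < a:
        return a   # ship units into location 1, up to the boundary
    if x > b:
        return b   # ship units out of location 1, down to the boundary
    return x       # inside the region: do nothing

# Example with region [3, 7]: states 1 and 9 are projected onto the boundary.
assert reposition(1, 3, 7) == 3
assert reposition(5, 3, 7) == 5
assert reposition(9, 3, 7) == 7
```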