Undiscounted Markov decision processes (UMDPs) can formulate optimal stochastic control problems that minimize the expected total cost per period for various systems. We propose new approximate dynamic programming (ADP) algorithms for large-scale UMDPs that can overcome the curses of dimensionality. These algorithms, called simulation-based modified policy iteration (SBMPI) algorithms, are extensions of the simulation-based modified policy iteration method (SBMPIM) (Ohno, 2011) for optimal control problems of multistage JIT-based production and distribution systems with stochastic demand and production capacity. The main new concepts of the SBMPI algorithms are that the simulation-based policy evaluation step of the SBMPIM is replaced by the partial policy evaluation step of the modified policy iteration method (MPIM), and that the algorithms start from the expected total cost per period and relative values estimated by simulating the system under a reasonable initial policy. For numerical comparison, the optimal control problem of a three-stage JIT-based production and distribution system with stochastic demand and production capacity is formulated as a UMDP. The demand distribution is changed from the shifted binomial distribution in Ohno (2011) to a Poisson distribution, and near-optimal policies for optimal control problems with 35,973,840 states are computed by the SBMPI algorithms and the SBMPIM. The computational results show that the SBMPI algorithms are at least 100 times faster than the SBMPIM in solving the numerical problems and are robust with respect to the initial policy. Numerical examples are solved to show the effectiveness of near-optimal control using the SBMPI algorithms compared with optimized pull systems whose optimal parameters are computed using the SBOS (simulation-based optimal solutions) from Ohno (2011). (C) 2015 Elsevier B.V. and Association of European Operational Research Societies (EURO) within the International Federation of Operational Research Societies (IFORS).
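The partial-evaluation idea is easiest to see in a small tabular setting. The sketch below is a minimal, illustrative modified policy iteration loop for an average-cost (undiscounted) MDP with explicit transition matrices; it is not the paper's SBMPI, which replaces exact expectations with simulation and initializes the gain and relative values from a simulated run of an initial policy. All names and parameters (P, c, m_sweeps, iters, ref) are hypothetical.

```python
import numpy as np

def modified_policy_iteration(P, c, m_sweeps=10, iters=100, ref=0):
    """Average-cost MPI sketch. P: (A, S, S) transitions, c: (A, S) costs."""
    num_actions, num_states = c.shape
    h = np.zeros(num_states)   # relative values; SBMPI would initialize these
    g = 0.0                    # (and the gain g) by simulating an initial policy
    idx = np.arange(num_states)
    policy = np.zeros(num_states, dtype=int)
    for _ in range(iters):
        # Policy improvement: greedy with respect to current relative values.
        q = c + P @ h          # q[a, s] = c(s, a) + sum_s' P(s'|s, a) h(s')
        policy = q.argmin(axis=0)
        # Partial policy evaluation (the MPIM step): m sweeps of the
        # fixed-policy operator, pinned at a reference state so that
        # h stays a *relative* value function (h[ref] == 0).
        for _ in range(m_sweeps):
            q_pi = c[policy, idx] + P[policy, idx] @ h
            g = q_pi[ref]      # running estimate of the cost per period
            h = q_pi - g
    return policy, g, h

# Tiny random example (2 actions, 4 states):
rng = np.random.default_rng(0)
P = rng.random((2, 4, 4)); P /= P.sum(axis=2, keepdims=True)
c = rng.random((2, 4))
policy, gain, h = modified_policy_iteration(P, c)
```

With exact expectations this is standard MPI for the average-cost criterion; the paper's contribution lies in making the evaluation and initialization practical at the scale of tens of millions of states.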
We consider a rental service with a fixed number of rental units distributed across multiple locations. The units are accessed by customers without prior reservation and on an on-demand basis. Customers decide how long to keep a unit and where to return it. Because of the randomness in demand and in returns, inventory must periodically be repositioned away from some locations and into others. In deciding how much inventory to reposition and where, the system manager balances potential lost sales against repositioning costs. Although the problem is increasingly common in applications involving on-demand rental services, not much is known about the nature of the optimal policy for systems with a general network structure or about effective approaches to solving the problem. In this paper, first, we show that the optimal policy in each period can be described in terms of a well-specified region over the state space. Within this region, it is optimal not to reposition any inventory, whereas, outside the region, it is optimal to reposition, but only such that the system moves to a new state on the boundary of the no-repositioning region. We also provide a simple check for whether a state lies in the no-repositioning region. Second, we leverage the features of the optimal policy, along with properties of the optimal cost function, to propose a provably convergent approximate dynamic programming algorithm that can tackle problems with a large number of dimensions.
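To make the structural result concrete, here is a toy two-location illustration, assuming the total number of units is fixed so the state reduces to the inventory x at one location and the no-repositioning region reduces to an interval [a, b]. The thresholds are hypothetical; the paper characterizes the region for general networks and computes policies via its ADP algorithm.

```python
def reposition(x: int, a: int, b: int) -> int:
    """Post-repositioning inventory at location 1 in a two-location toy model.

    Inside [a, b] it is optimal not to reposition; outside, move only as far
    as the nearest boundary point of the no-repositioning region.
    """
    if x < a:
        return a   # ship units into location 1, up to the boundary
    if x > b:
        return b   # ship units out of location 1, down to the boundary
    return x       # inside the region: do nothing

# Example with region [3, 7]: states 1 and 9 are projected onto the boundary.
assert reposition(1, 3, 7) == 3
assert reposition(5, 3, 7) == 5
assert reposition(9, 3, 7) == 7
```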