Degree level: Doctoral
Advisor: Michael Jong Kim
Year conferred: 2019
Subjects: Bayesian dynamic programming; Bayesian learning; inventory management
Abstract: Classic inventory control problems typically assume that the demand distribution is known a priori. In reality, this assumption is not always satisfied. Motivated by this concern, we study the joint optimization of learning and control. We first consider the situation where the parameters of the demand distribution are not known a priori but must be learned from right-censored sales data. A Bayesian framework is adopted for demand learning, and the corresponding control problem is analyzed via Bayesian dynamic programming (BDP). Structural results for the optimal policy are established. In particular, we show that the BDP-optimal decisions can be expressed as the sum of a myopic-optimal decision and a non-negative exploration boost that is proportional to the posterior index of dispersion of the unknown mean demand. This structure makes explicit how statistical learning and inventory control are jointly optimized. Next, we study an optimal inventory control problem in the presence of model misspecification. In this problem, the decision maker accounts for the misspecification by solving a worst-case problem against an adversary, nature, who can alter the underlying demand distribution so as to minimize the decision maker's expected reward. We show that the decision maker's robust-optimal decisions are bounded above by the optimal solutions of the nominal model. This structural result explains the trade-off between optimization and risk aversion. In the last chapter, we attempt to combine elements of the Bayesian and robust approaches, which we refer to as robust Bayesian optimization. In particular, we are interested in how decision makers can remain robust to model uncertainty while learning at the same time.
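As an illustration of learning from right-censored sales data, the following is a minimal sketch and not the thesis's own model. It assumes, purely for illustration, that demand is exponential with an unknown rate and that the prior on that rate is a gamma distribution, a pairing that stays conjugate under censoring. The class name `GammaBelief` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief over the unknown rate of exponential demand.

    Assumed model (not taken from the abstract): demand D ~ Exp(theta),
    theta ~ Gamma(shape, rate), and observed sales are min(D, stock).
    """

    shape: float
    rate: float

    def update(self, sales: float, stock: float) -> "GammaBelief":
        """Return the posterior after one period of possibly censored sales."""
        # A stockout (sales reached the stock level) only tells us D >= stock,
        # so the likelihood is exp(-theta * stock) and the shape is unchanged.
        censored = sales >= stock
        new_shape = self.shape if censored else self.shape + 1.0
        # In both cases the observed sales quantity is added to the rate.
        return GammaBelief(new_shape, self.rate + sales)

    def mean_demand(self) -> float:
        """Posterior expectation of mean demand 1/theta (requires shape > 1)."""
        if self.shape <= 1.0:
            raise ValueError("posterior mean of 1/theta is infinite for shape <= 1")
        return self.rate / (self.shape - 1.0)


if __name__ == "__main__":
    belief = GammaBelief(shape=2.0, rate=10.0)
    belief = belief.update(sales=4.0, stock=10.0)   # demand fully observed
    belief = belief.update(sales=10.0, stock=10.0)  # stockout: demand censored
    print(belief, belief.mean_demand())
```

In this sketch a stockout period raises the estimate of mean demand without adding a full observation, which is one way to see why ordering more (and so censoring less) has informational value.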
We establish an analytical upper bound on the decision maker's optimal decisions, which can be expressed as a myopic-optimal decision plus an exploration boost minus a risk aversion adj