Author affiliations: Univ Iowa, Tippie Coll Business, Iowa City, IA 52242, USA; Univ Illinois, Coll Business Adm, Chicago, IL 60607, USA
Publication: MANAGEMENT SCIENCE
Year/Volume/Issue: 2020, Vol. 66, No. 4
Pages: 1544-1562
Subject classification: 12 [Management]; 120202 [Management - Business Administration (incl. Financial Management, Marketing, Human Resource Management)]; 0202 [Economics - Applied Economics]; 02 [Economics]; 1202 [Management - Business Administration]; 1201 [Management - Management Science and Engineering (degrees awardable in Management or Engineering)]
Keywords: approximate linear programming; approximate dynamic programming; stochastic gradient descent; inventory control; energy storage
Abstract: Approximate linear programs (ALPs) are well-known models for computing value function approximations (VFAs) of intractable Markov decision processes (MDPs). VFAs from ALPs have desirable theoretical properties, define an operating policy, and provide a lower bound on the optimal policy cost. However, solving ALPs near-optimally remains challenging, for example, when approximating MDPs with nonlinear cost functions and transition dynamics or when rich basis functions are required to obtain a good VFA. We address this tension between theory and solvability by proposing a convex saddle-point reformulation of an ALP that includes as primal and dual variables, respectively, a vector of basis function weights and a constraint violation density function over the state-action space. To solve this reformulation, we develop a proximal stochastic mirror descent (PSMD) method that learns regions of high ALP constraint violation via its dual update. We establish that PSMD returns a near-optimal ALP solution and a lower bound on the optimal policy cost in a finite number of iterations with high probability. We numerically compare PSMD with several benchmarks on inventory control and energy storage applications. We find that the PSMD lower bound is tighter than a perfect-information bound. In contrast, the constraint-sampling approach to solving ALPs may not provide a lower bound, and applying row generation to tackle ALPs is not computationally viable. PSMD policies outperform problem-specific heuristics and are comparable to or better than the policies obtained using constraint sampling. Overall, our ALP reformulation and solution approach broaden the applicability of approximate linear programming.
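For orientation, a generic discounted-cost ALP and its Lagrangian saddle-point view can be sketched as follows. This is a standard textbook form, not necessarily the authors' exact reformulation; the notation (basis matrix \Phi, weight vector \beta, state-relevance distribution \nu, cost c, discount factor \gamma) is assumed for illustration:

\[
\max_{\beta}\ \mathbb{E}_{s \sim \nu}\bigl[\Phi(s)\beta\bigr]
\quad \text{s.t.} \quad
\Phi(s)\beta \;\le\; c(s,a) + \gamma\,\mathbb{E}\bigl[\Phi(s')\beta \mid s,a\bigr] \quad \forall (s,a),
\]

whose Lagrangian, with a nonnegative density \lambda(s,a) over the state-action space as the dual variable, yields the saddle point

\[
\max_{\beta}\ \min_{\lambda \ge 0}\
\mathbb{E}_{s \sim \nu}\bigl[\Phi(s)\beta\bigr]
\;-\; \int \lambda(s,a)\,\Bigl(\Phi(s)\beta - c(s,a) - \gamma\,\mathbb{E}\bigl[\Phi(s')\beta \mid s,a\bigr]\Bigr)\,\mathrm{d}(s,a).
\]

A stochastic primal-dual scheme in the spirit of PSMD would then alternate a proximal gradient step on \beta with a mirror-descent-style update that shifts mass of the density \lambda toward state-action pairs exhibiting large constraint violation.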