Author:
Horiguchi, M
Chiba Univ, Grad Sch Sci & Technol, Div Math Sci & Phys, Inage Ku, Chiba 2608522, Japan
Details
In this paper, an optimization problem for stopped Markov decision processes with vector-valued terminal reward and multiple running cost constraints is considered. Applying the idea of occupation measures and using the scalarization technique for vector maximization problems, we obtain an equivalent mathematical programming problem and show the existence of a Pareto optimal pair of stationary policy and stopping time that requires randomization in at most k states, where k is the number of constraints. Moreover, Lagrange multiplier approaches are considered. Saddle-point statements are given, and their results are applied to obtain a related parametric mathematical programming problem, by which the original problem is solved. Numerical examples are given.
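As a rough illustration of the scalarization idea mentioned in the abstract, the sketch below maximizes a strictly positive weighted sum of a vector-valued reward over a small finite set of candidates and checks that the maximizer is Pareto optimal. This is not the paper's formulation: the candidate reward vectors, the weights, and the function names (`dominates`, `pareto_front`, `scalarize_argmax`) are hypothetical, and the finite list only stands in for the set of policy and stopping-time pairs.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b in every component and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Vector]) -> List[Vector]:
    """Return the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize_argmax(points: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Maximize the weighted sum of components; weights must be strictly positive."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    return max(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))


# Hypothetical two-dimensional terminal rewards of five candidate pairs.
rewards: List[Vector] = [(1.0, 4.0), (2.0, 3.0), (3.0, 1.0), (1.5, 2.0), (2.0, 2.0)]

front = pareto_front(rewards)
best = scalarize_argmax(rewards, (0.6, 0.4))  # assumed weights, chosen for illustration
print("Pareto front:", front)
print("Scalarized maximizer:", best)
```

With strictly positive weights, any maximizer of the weighted sum cannot be dominated, which is why the scalarized problem yields a Pareto optimal solution. Handling the running cost constraints would require a constrained program rather than this unconstrained search.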