ISBN: (print) 9781424496365
We study finite-state, finite-action, discounted infinite-horizon Markov decision processes with uncertain, correlated transition matrices in deterministic policy spaces. To efficiently implement an approximate robust policy iteration algorithm for computing a robust optimal or near-optimal policy, a reliable and tight set estimate of the parameters of the transition matrix is needed in advance. However, the sample of observed state transitions may be small, and prior information on the parameter space may be incomplete or unavailable. In such cases, the commonly used maximum a posteriori (MAP) model may not provide a reliable optimal set estimate of the parameters. In this paper, exploiting the advantages of the Dempster-Shafer theory of evidence over Bayesian theory, we propose a belief function model based on minimizing the cardinality of a set estimate. The new model gives a more reliable optimal solution for covering the true parameters than the MAP model. It degenerates to the MAP model when prior information on the parameter space is complete, or when prior information is unavailable but the sample of observed state transitions is large enough. Moreover, we introduce the concept of principal components to characterize observation samples large enough that both models yield the same reliable and tight results. The computational complexity of the new model is also discussed.
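To make the setting concrete, the following is a minimal sketch of why the set estimate matters; it is not the paper's algorithm. It assumes the set estimate has already been reduced to a small finite list of candidate transition models, and it replaces approximate robust policy iteration with brute-force enumeration of deterministic policies, scoring each by its worst-case discounted value. All names (`policy_value`, `robust_best_policy`, `models`, `mu`) and the two-state example are hypothetical.

```python
import itertools
import numpy as np

def policy_value(P, r, policy, gamma):
    """Exact discounted value of a deterministic policy under one model.

    P has shape (A, S, S), r has shape (S, A), policy is a length-S tuple.
    """
    S = r.shape[0]
    idx = np.arange(S)
    P_pi = P[list(policy), idx, :]   # row s is P[policy[s], s, :]
    r_pi = r[idx, list(policy)]      # reward collected in each state
    # Solve (I - gamma * P_pi) V = r_pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def robust_best_policy(models, r, gamma, mu):
    """Max-min over deterministic policies and candidate models.

    Assumption: `models` is a finite stand-in for the set estimate. Each
    entry is a whole transition model, so correlations across states and
    actions are preserved.
    """
    S, A = r.shape
    best_policy, best_value = None, -np.inf
    for policy in itertools.product(range(A), repeat=S):
        worst = min(float(mu @ policy_value(P, r, policy, gamma))
                    for P in models)
        if worst > best_value:
            best_policy, best_value = policy, worst
    return best_policy, best_value

# Hypothetical example: state 0 is "good", state 1 is absorbing with reward 0.
# Action 0 is safe (reward 1, stays put); action 1 pays 2 but, under the
# second candidate model, drops the process into state 1.
r = np.array([[1.0, 2.0],
              [0.0, 0.0]])
stay = np.eye(2)
model_a = np.stack([stay, stay])
model_b = np.stack([stay, np.array([[0.0, 1.0], [0.0, 1.0]])])
mu = np.array([1.0, 0.0])            # start in state 0

best_policy, best_value = robust_best_policy([model_a, model_b], r, 0.9, mu)
print(best_policy, best_value)
```

Under these assumptions the worst case over the two candidate models favors the safe action in state 0. A looser set estimate with more pessimistic candidates can only lower the worst-case value, which is why a tight yet reliable estimate is desirable. The enumeration is exponential in the number of states and is meant for illustration only.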