This work proposes a methodology to generate risk averse policies for Markov Decision Processes(MDPs). This methodology is based on modifying the one stage reward or cost to weigh the trade-off between expected perfor...
详细信息
This work proposes a methodology to generate risk averse policies for Markov Decision Processes(MDPs). This methodology is based on modifying the one stage reward or cost to weigh the trade-off between expected performance and downside risk represented by (CVαR α ). The modified stage-wise utility function is used within dynamicprogramming to generate a set of policies representing different levels of the trade-off. The approach is demonstrated in a shortest path optimal control problem and a project management problem modeled as constrained MDP. To address a more complex management problem, we utilize the realtime Approximate dynamicprogramming algorithm.
暂无评论