When intelligent agents act in a stochastic environment, their policies are typically optimized under the principle of maximizing expected rewards. In most cases, reward maximization is the single objective used to solve the agents' decision problems. This sometimes yields agent behaviors (the optimal policies for the decision problems) that are not legible. In other words, it is difficult for users (other agents, or even humans) to understand the agents' intentions while they execute the optimal policies. Hence, it becomes pertinent to account for legibility in agents' decision problems. The key challenge lies in formulating a proper legibility function for those problems. Relying on domain experts' input tends to be subjective and inconsistent when specifying legibility values, and the manual approach quickly becomes infeasible in complex problem domains. In this article, we aim to learn such a legibility function in parallel with a (conventional) reward function. We adopt inverse reinforcement learning (IRL) techniques to automate the construction of a legibility function in agents' decision problems. We first demonstrate the effectiveness of the IRL technique when legibility is the sole consideration in a decision problem. The problem becomes more complicated when both the reward and legibility functions must be learned. We therefore develop a multi-objective inverse reinforcement learning method that learns the two functions simultaneously while keeping them in good balance. We vary the problem domains in the performance study and provide empirical results in support of our methods.
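To make the multi-objective setup concrete, the sketch below illustrates one plausible way such a method could be organized; it is not the authors' implementation. It assumes a small tabular MDP, linear function approximation over two hypothetical feature maps (`phi_r` for the task reward and `phi_l` for legibility), maximum-entropy-style IRL updates, and a hypothetical scalarization weight `alpha` that balances the two objectives.

```python
import numpy as np

def expert_feature_counts(trajs, phi, gamma):
    """Discounted empirical feature expectations of demonstrated state trajectories."""
    mu = np.zeros(phi.shape[1])
    for traj in trajs:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi[s]
    return mu / len(trajs)

def soft_policy(P, r, gamma, iters=200):
    """Soft (max-entropy) value iteration; P is (S, A, S), r is a state reward (S,)."""
    S = P.shape[0]
    V = np.zeros(S)
    for _ in range(iters):
        Q = r[:, None] + gamma * np.einsum('sat,t->sa', P, V)
        V = np.logaddexp.reduce(Q, axis=1)
    pi = np.exp(Q - V[:, None])
    return pi / pi.sum(axis=1, keepdims=True)

def policy_feature_counts(P, pi, phi, start, gamma, horizon=50):
    """Discounted feature expectations under policy pi from a start distribution."""
    d = start.copy()
    mu = np.zeros(phi.shape[1])
    for t in range(horizon):
        mu += (gamma ** t) * d @ phi
        d = np.einsum('s,sa,sat->t', d, pi, P)   # next-state distribution
    return mu

def multi_objective_irl(P, phi_r, phi_l, demos, start,
                        alpha=0.5, gamma=0.95, lr=0.1, epochs=100):
    """Jointly fit reward weights theta_r and legibility weights theta_l by
    matching expert feature expectations; alpha trades off the two objectives."""
    theta_r = np.zeros(phi_r.shape[1])
    theta_l = np.zeros(phi_l.shape[1])
    mu_r_exp = expert_feature_counts(demos, phi_r, gamma)
    mu_l_exp = expert_feature_counts(demos, phi_l, gamma)
    for _ in range(epochs):
        # The behavior policy is induced by a scalarized combination of both functions.
        combined = alpha * (phi_r @ theta_r) + (1 - alpha) * (phi_l @ theta_l)
        pi = soft_policy(P, combined, gamma)
        theta_r += lr * alpha * (mu_r_exp - policy_feature_counts(P, pi, phi_r, start, gamma))
        theta_l += lr * (1 - alpha) * (mu_l_exp - policy_feature_counts(P, pi, phi_l, start, gamma))
    return theta_r, theta_l
```

Under these assumptions, setting `alpha = 1` recovers conventional reward-only IRL, `alpha = 0` recovers the legibility-only case, and intermediate values correspond to learning both functions in balance.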