In this paper, a distributed adaptive dynamic programming (ADP) framework based on value iteration is proposed for multi-player differential games. In the game setting, players have no access to the other players' system parameters or control laws. Each player adopts an on-policy value iteration algorithm as the basic learning framework. To cope with this incomplete information structure, each player collects a window of system trajectory data to compensate for the missing information. The policy-updating step is implemented as a nonlinear optimization problem that searches for the proximal admissible policy. Theoretical analysis shows that, under the proximal policy-searching rule, the approximated policies converge to a neighborhood of the equilibrium policies. The efficacy of the method is illustrated by three examples, which also demonstrate that the proposed method accelerates learning compared with a centralized learning framework.
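To make the value-iteration backbone concrete, the sketch below runs value iteration on a scalar discrete-time linear-quadratic problem — the single-player special case of the learning framework, not the paper's multi-player algorithm. The dynamics and cost parameters (`a`, `b`, `q`, `r`) are illustrative assumptions, not values from the paper.

```python
# Value iteration for a scalar discrete-time LQR problem: the value function
# is V(x) = p * x^2, and each iteration applies the Riccati recursion
#   p <- q + a^2 p - (a b p)^2 / (r + b^2 p),
# after which the greedy policy is u = -k x with k = a b p / (r + b^2 p).
def value_iteration(a, b, q, r, iters=200):
    p = 0.0  # initial value-function weight (zero value function)
    for _ in range(iters):
        p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)
    k = a * b * p / (r + b**2 * p)  # greedy feedback gain
    return p, k

# Illustrative unstable open-loop system (a > 1); all numbers are assumptions.
p, k = value_iteration(a=1.1, b=1.0, q=1.0, r=1.0)
# After convergence, the closed loop a - b*k is stable: |a - b*k| < 1.
```

In the paper's distributed setting, each player would run such an update on its own value estimate, using collected trajectory data in place of the unknown parameters of the other players.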