ISBN: 9781479945528 (print)
This paper addresses the case of dual learning in the pursuit-evasion (PE) differential game and examines how fast the players can learn their default control strategies. The players should learn their default control strategies simultaneously by interacting with each other. Each player's learning process depends on the rewards received from its environment. The learning process is implemented using a two-stage learning algorithm that combines the particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with the Q-Learning fuzzy inference system (QFIS) algorithm. The PSO algorithm is used as a global optimizer to autonomously tune the parameters of a fuzzy logic controller, whereas the QFIS algorithm is used as a local optimizer. The two-stage learning algorithm is compared through simulation with the default control strategy, the PSO-based FLC algorithm, and the QFIS algorithm. Simulation results show that the players are able to learn their default control strategies. They also show that the two-stage learning algorithm outperforms the PSO-based FLC algorithm and the QFIS algorithm with respect to learning time.
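The global-then-local structure described above can be illustrated with a minimal sketch. It is not the paper's implementation: `toy_cost` is a hypothetical stand-in for a simulated game cost, the parameter vector stands in for the tunable parameters of a fuzzy logic controller, and the second stage uses a simple random-perturbation search in place of QFIS, which is not reproduced here. All function names and constants (`w`, `c1`, `c2`, `step`) are assumptions chosen for illustration.

```python
import random


def toy_cost(params):
    # Hypothetical stand-in for a simulated game cost (e.g. capture time).
    # Its minimum value is 0 at params = (1.0, -2.0).
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def pso(cost, dim, bounds, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Stage 1: standard global-best PSO over a box-bounded search space."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


def local_refine(cost, start, step=0.1, iters=200, seed=1):
    """Stage 2 placeholder: random-perturbation search (NOT QFIS)."""
    rng = random.Random(seed)
    best, best_cost = start[:], cost(start)
    for _ in range(iters):
        cand = [v + rng.gauss(0.0, step) for v in best]
        c = cost(cand)
        if c < best_cost:  # accept only improvements
            best, best_cost = cand, c
    return best, best_cost


def two_stage(cost, dim, bounds):
    # Coarse global search first, then fine local tuning from its result.
    coarse, coarse_cost = pso(cost, dim, bounds)
    fine, fine_cost = local_refine(cost, coarse)
    return coarse_cost, fine, fine_cost


if __name__ == "__main__":
    coarse_cost, fine, fine_cost = two_stage(toy_cost, 2, (-5.0, 5.0))
    print("cost after stage 1:", coarse_cost)
    print("cost after stage 2:", fine_cost, "at", fine)
```

Because the second stage only accepts improvements, its final cost can never exceed the cost returned by the first stage. In the paper's setting the cost would come from simulating the game rather than from a closed-form function.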