
An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization

Authors: Dossa, Rousslan Fernand Julien; Huang, Shengyi; Ontanon, Santiago; Matsubara, Takashi

Affiliations: Kobe Univ, Grad Sch Syst Informat, Kobe, Hyogo 6578501, Japan; Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104, USA; Osaka Univ, Grad Sch Engn Sci, Osaka 5608531, Japan

Published in: IEEE Access

Year/Volume: 2021, Vol. 9

Page(s): 117981


Funding: Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT)

Keywords: Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning

Abstract: Code-level optimizations, which are low-level optimization techniques used in the implementation of algorithms, have generally been considered tangential and often do not appear in the published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest that these optimizations are critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization known as early stopping, implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This optimization technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) divergence between the target policy and current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are: 1) the performance of PPO is sensitive to the number of update iterations per epoch (K); 2) early stopping optimizations (KLE-Stop and KLE-Rollback) mitigate such sensitivity by dynamically adjusting the actual number of update iterations within an epoch; 3) early stopping optimizations can serve as a convenient alternative to tuning K.
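The abstract describes KLE-Stop as halting the remaining policy update iterations within an epoch once the mean KL divergence between the current policy and the policy that collected the data grows too large. Below is a minimal PyTorch-style sketch of how such a check might sit inside a PPO update loop. The helper names (`policy.log_prob`, the `batch` fields) are illustrative assumptions, and the `1.5 * target_kl` threshold factor follows the openai/spinningup convention; this is not the paper's exact implementation.

```python
import torch

def ppo_policy_update(policy, optimizer, batch, update_iters=80,
                      target_kl=0.01, clip_ratio=0.2):
    """Sketch of a PPO policy update with a KLE-Stop style early-stopping check.

    `policy.log_prob(obs, act)` and the `batch` dictionary fields are
    hypothetical placeholders, not the paper's or spinningup's exact API.
    """
    obs, act, adv = batch["obs"], batch["act"], batch["adv"]
    logp_old = batch["logp"]  # log-probs under the data-collecting policy

    for i in range(update_iters):  # K update iterations per epoch
        logp = policy.log_prob(obs, act)
        ratio = torch.exp(logp - logp_old)

        # PPO clipped surrogate objective
        clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio) * adv
        loss = -torch.min(ratio * adv, clipped).mean()

        # KLE-Stop: estimate the mean KL between the old and current policy;
        # if it exceeds the threshold, skip the remaining iterations this epoch.
        approx_kl = (logp_old - logp).mean().item()
        if approx_kl > 1.5 * target_kl:
            break  # early stopping for this epoch

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Per the abstract, KLE-Rollback is the more conservative variant: rather than merely stopping further updates, it additionally undoes the update that pushed the KL divergence past the threshold.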
