Author Affiliations: McGill Univ, Sch Comp Sci, Montreal, PQ H3A 0E9, Canada; Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213, USA; Mitsubishi Elect Res Labs, Cambridge, MA 02139, USA; Natl Lab Sci Comp LNCC, BR-25651075 Petropolis, Brazil
Publication: IEEE TRANSACTIONS ON AUTOMATIC CONTROL (IEEE Trans Autom Control)
Year/Volume/Issue: 2015, Vol. 60, No. 11
Pages: 2989-2993
Subject Classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0811 [Engineering - Control Science and Engineering]
Funding: Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords: approximate dynamic programming; approximate policy iteration; classification; finite-sample analysis; reinforcement learning
Abstract: Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit the regularities of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework that can exploit regularities of both. We establish theoretical guarantees on the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms and can handle nonparametric representations of policies. Our bounds on the estimation error of the performance loss are tighter than existing results.
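Illustrative note: the abstract refers to the classification-based approximate policy iteration (CAPI) family of methods. The Python sketch below shows only the generic classification-based policy iteration loop (rollout-based action-value estimates at sampled states, followed by fitting a classifier to the greedy action labels); it is not the authors' CAPI algorithm. The environment interface env.sample(state, action), the rollout horizon, and the decision-tree policy representation are all illustrative assumptions, not details from the paper.

    # Minimal sketch of generic classification-based approximate policy iteration.
    # Assumptions (not from the source): a generative model `env.sample(s, a)` that
    # returns (next_state, reward); states given as 1-D feature vectors; a
    # scikit-learn decision tree as the (nonparametric) policy representation.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def rollout_q(env, state, action, policy, gamma=0.99, horizon=50, n_rollouts=5):
        """Monte Carlo estimate of Q^pi(state, action) via truncated rollouts."""
        returns = []
        for _ in range(n_rollouts):
            s, a, total, discount = state, action, 0.0, 1.0
            for _ in range(horizon):
                s, r = env.sample(s, a)      # hypothetical generative-model call
                total += discount * r
                discount *= gamma
                a = policy(s)                # follow the current policy thereafter
            returns.append(total)
        return float(np.mean(returns))

    def classification_api(env, states, actions, n_iterations=10, gamma=0.99):
        """Each iteration: evaluate the current policy at sampled states, then
        perform policy improvement by fitting a classifier to the greedy actions."""
        policy = lambda s: actions[0]        # arbitrary initial policy
        for _ in range(n_iterations):
            X, y = [], []
            for s in states:
                q_values = [rollout_q(env, s, a, policy, gamma) for a in actions]
                X.append(s)
                y.append(int(np.argmax(q_values)))   # greedy action as class label
            clf = DecisionTreeClassifier().fit(np.array(X), np.array(y))
            policy = lambda s, clf=clf: actions[int(clf.predict(np.array([s]))[0])]
        return policy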