
Approximate Modified Policy Iteration and its Application to the Game of Tetris

Authors: Scherrer, Bruno; Ghavamzadeh, Mohammad; Gabillon, Victor; Lesner, Boris; Geist, Matthieu

Author Affiliations: INRIA Nancy Grand Est, Team Maia, 615 Rue Jardin Bot, F-54600 Vandoeuvre-lès-Nancy, France; Adobe Res, San Jose, CA 95110, USA; INRIA Lille, San Jose, CA 95110, USA; INRIA Lille Nord Europe, Team SequeL, F-59650 Villeneuve d'Ascq, France; IMS MaLIS Res Grp, Cent Supelec, F-57070 Metz, France; GeorgiaTech CNRS, UMI 2958, F-57070 Metz, France

Publication: JOURNAL OF MACHINE LEARNING RESEARCH

Year/Volume/Issue: 2015, Vol. 16, Issue 1

Pages: 1629-1676

Subject Classification: 08 [Engineering]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (degrees may be awarded in engineering or science)]

Keywords: approximate dynamic programming; reinforcement learning; Markov decision processes; finite-sample analysis; performance bounds; game of Tetris

Abstract: Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. We develop the finite-sample analysis of these algorithms, which highlights the influence of their parameters. For the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation. We illustrate and evaluate the behavior of these new algorithms on the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin and competes with the current state-of-the-art methods while using fewer samples.
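
The main parameter the abstract refers to is MPI's interpolation parameter m: each iteration computes a policy greedy with respect to the current value estimate and then applies that policy's Bellman operator m times, so m = 1 recovers value iteration and m -> infinity recovers policy iteration. Below is a minimal tabular sketch of this exact scheme (the scheme that AMPI approximates); the toy 2-state MDP, its numbers, and all variable names are illustrative assumptions, not material from the paper.

import numpy as np

n_states, n_actions, gamma, m = 2, 2, 0.9, 5  # m = 1: value iteration; large m: policy iteration

# Made-up toy MDP: P[a, s, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.1, 0.9],
               [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

v = np.zeros(n_states)
for k in range(100):
    # Greedy step: pi_{k+1} is greedy with respect to v_k.
    q = R + gamma * np.einsum('ast,t->sa', P, v)   # q[s, a]
    pi = q.argmax(axis=1)
    # Partial evaluation step: v_{k+1} = (T_pi)^m v_k, i.e. apply the
    # Bellman operator of pi_{k+1} exactly m times.
    P_pi = P[pi, np.arange(n_states)]   # transition matrix under pi
    r_pi = R[np.arange(n_states), pi]   # reward vector under pi
    for _ in range(m):
        v = r_pi + gamma * P_pi @ v

print("greedy policy:", pi, "value estimate:", v)

In the approximate setting studied in the paper, the greedy and evaluation steps above are replaced by classifiers and regressors trained on samples, which is where the estimation/approximation trade-off controlled by m arises.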
