Author affiliation: ITA Software, Cambridge, MA 02139, USA
Publication: Machine Learning
Year/Volume/Issue: 2002, Vol. 49, No. 2-3
Pages: 233-246
Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees may be awarded in engineering or science)]
Funding: National Aeronautics and Space Administration (NASA); Carnegie Mellon University (CMU)
Keywords: reinforcement learning; temporal difference learning; value function approximation; linear least-squares methods
Abstract: TD(lambda) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(lambda) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and lambda = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1-3), 33-57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from lambda = 0 to arbitrary values of lambda; at the extreme of lambda = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
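The abstract describes LSTD(lambda) as replacing stepsize-tuned incremental updates with accumulated sufficient statistics that are solved as a linear system. Below is a minimal sketch of that idea in Python, assuming a user-supplied feature map `phi` and episodes of observed transitions; the function name `lstd_lambda`, its arguments, and the small ridge term are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def lstd_lambda(episodes, phi, n_features, gamma=1.0, lam=0.0, eps=1e-3):
    """Sketch of LSTD(lambda) for linear policy evaluation.

    episodes: iterable of episodes, each a list of (s, r, s_next, done) tuples
              generated by the fixed policy being evaluated
    phi:      feature map, s -> np.ndarray of shape (n_features,)
    Returns theta such that V(s) is approximated by phi(s) @ theta.
    """
    A = eps * np.eye(n_features)   # small regularizer keeps A invertible (assumption)
    b = np.zeros(n_features)
    for episode in episodes:
        z = np.zeros(n_features)   # eligibility trace, reset each episode
        for (s, r, s_next, done) in episode:
            f = phi(s)
            f_next = np.zeros(n_features) if done else phi(s_next)
            z = gamma * lam * z + f                 # decay trace, add current features
            A += np.outer(z, f - gamma * f_next)    # accumulate statistics
            b += z * r
    # One linear solve replaces the stepsize-driven TD(lambda) iteration.
    return np.linalg.solve(A, b)
```

With lam = 0 this reduces to the LSTD algorithm of Bradtke and Barto; with lam = 1 the accumulated statistics correspond to a regression against observed returns, matching the paper's characterization of that extreme.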