
Technical update: Least-squares temporal difference learning


Author: Boyan, J. A.

Affiliation: ITA Software, Cambridge, MA 02139, USA

Published in: Machine Learning

Year/Volume/Issue: 2002, Vol. 49, No. 2-3

Pages: 233-246


Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awarded in Engineering or Science)]

Funding: National Aeronautics and Space Administration (NASA); Carnegie Mellon University (CMU)

Keywords: reinforcement learning; temporal difference learning; value function approximation; linear least-squares methods

Abstract: TD(lambda) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(lambda) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and lambda = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1-3), 33-57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from lambda = 0 to arbitrary values of lambda; at the extreme of lambda = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
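To make the abstract concrete, here is a minimal sketch of LSTD(lambda) for linear value-function approximation, written from the standard formulation rather than from the paper itself. The function name `lstd_lambda`, the trajectory format, and the ridge term `reg` are illustrative assumptions; the core idea is that the algorithm accumulates the statistics A and b over observed transitions using an eligibility trace z, then solves the linear system A w = b once, with no stepsize parameter.

```python
import numpy as np

def lstd_lambda(episodes, phi, gamma, lam, n_features, reg=1e-6):
    """Batch LSTD(lambda) sketch for linear value estimation.

    episodes: list of trajectories, each a list of (s, r, s_next, done) tuples
    phi:      feature map, state -> length-n_features numpy array
    gamma:    discount factor; lam: the lambda of the eligibility trace
    Returns the weight vector w such that V(s) ~= phi(s) . w.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for episode in episodes:
        z = np.zeros(n_features)  # eligibility trace, reset each episode
        for (s, r, s_next, done) in episode:
            f = phi(s)
            f_next = np.zeros(n_features) if done else phi(s_next)
            z = gamma * lam * z + f            # decay and accumulate trace
            A += np.outer(z, f - gamma * f_next)
            b += z * r
    # Small ridge term guards against a singular A (an assumption, not
    # part of the original formulation).
    return np.linalg.solve(A + reg * np.eye(n_features), b)
```

With lambda = 0 the trace z is just phi(s) and this reduces to Bradtke and Barto's original LSTD; with lambda = 1 and tabular or full-rank features it behaves like least-squares regression onto observed returns, matching the paper's characterization of the two extremes.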
