Temporal Difference Learning with Piecewise Linear Basis

Authors: Xingguo Chen, Yang Gao, Shunguo Fan

Affiliation: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Publication: Chinese Journal of Electronics

Year, Volume, Issue: 2025, Vol. 23, No. 1

Pages: 49-54

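Subject Classification: 0809 [Engineering - Electronic Science and Technology (degrees in engineering or science may be conferred)]; 08 [Engineering]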

Funding: National Science Foundation of China; Program for New Century Excellent Talents in University

Subjects: Temporal difference learning; Euclidean distance; Aerospace electronics; Extraterrestrial measurements; Approximation algorithms; Probability distribution; Complexity theory; Function approximation

Abstract: The temporal difference (TD) learning family learns a least-squares solution of an approximate linear value function (LVF) in order to handle large-scale and/or continuous reinforcement learning problems. However, because the features in the LVF have limited representational ability, the prediction error of the learned LVF is bounded by the residual between the optimal value function and the projected optimal value function. In this paper, temporal difference learning with piecewise linear basis (PLB-TD) is proposed to further decrease the error bounds. PLB-TD has two steps: (1) build the piecewise linear basis for problems of different dimensions; (2) learn the parameters with a well-known member of the TD learning family (linear TD, GTD, GTD2, or TDC), whose complexity is $O(n)$. The error bounds are proved to decrease to zero as the size of the piecewise linear basis goes to infinity. Empirical results demonstrate the effectiveness of the proposed algorithm.
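The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the paper constructs its basis, so the code assumes a one-dimensional state and standard "hat" (linear interpolation) features on a fixed grid of knots, paired with a plain linear TD(0) update. The names `hat_features` and `td0_update`, the knot grid, and the step size and discount values are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def hat_features(x, knots):
    """Piecewise linear (hat) features of a scalar state x.

    Assumption: `knots` is a sorted 1-D array with at least two entries.
    At most two features are nonzero, and they sum to 1.
    """
    x = float(np.clip(x, knots[0], knots[-1]))
    phi = np.zeros(len(knots))
    # Index of the right-hand knot of the interval that contains x.
    j = int(np.searchsorted(knots, x, side="right"))
    j = min(max(j, 1), len(knots) - 1)
    left, right = knots[j - 1], knots[j]
    w = (x - left) / (right - left)
    phi[j - 1] = 1.0 - w
    phi[j] = w
    return phi


def td0_update(theta, phi, reward, phi_next, alpha=0.1, gamma=0.9):
    """One linear TD(0) step; the cost is O(n) in the number of features."""
    delta = reward + gamma * (theta @ phi_next) - theta @ phi
    return theta + alpha * delta * phi


# Usage: one update for a transition from state 0.3 to state 0.5 with reward 1.
knots = np.linspace(0.0, 1.0, 5)
theta = np.zeros(len(knots))
theta = td0_update(theta, hat_features(0.3, knots), 1.0, hat_features(0.5, knots))
```

Refining the knot grid enlarges the basis, which is the regime in which the abstract states that the error bound shrinks to zero. The gradient variants mentioned in the abstract (GTD, GTD2, TDC) would replace `td0_update` while reusing the same features.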
