Author affiliation: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Publication: Chinese Journal of Electronics
Year, volume, issue: 2025, Vol. 23, No. 1
Pages: 49-54
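Subject classification: 0809 [Engineering: Electronic Science and Technology (engineering or science degrees may be awarded)]; 08 [Engineering]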
Funding: National Science Foundation of China; Program for New Century Excellent Talents in University
Topics: Temporal difference learning; Euclidean distance; Aerospace electronics; Extraterrestrial measurements; Approximation algorithms; Probability distribution; Complexity theory; Function approximation
Abstract: The temporal difference (TD) learning family learns a least-squares solution of an approximate linear value function (LVF) in order to handle large-scale and/or continuous reinforcement learning problems. However, because the representational ability of the features in the LVF is limited, the prediction error of the learned LVF is bounded by the residual between the optimal value function and its projection onto the feature space. This paper proposes temporal difference learning with a piecewise linear basis (PLB-TD) to further decrease the error bounds. PLB-TD has two steps: (1) build the piecewise linear basis for problems of different dimensions; (2) learn the parameters with well-known members of the TD learning family (linear TD, GTD, GTD2, or TDC), whose complexity is $O(n)$. The error bounds are proved to decrease to zero as the size of the piecewise basis goes to infinity. Empirical results demonstrate the effectiveness of the proposed algorithm.
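The two steps named in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the paper's construction: the abstract does not give the basis definition, so the hat-function basis on a uniform grid, the function names `hat_features`, `td0_update`, and `run_demo`, and the random-walk task are all assumptions made for illustration. Only plain linear TD(0) is shown; GTD, GTD2, and TDC are omitted.

```python
import bisect
import random


def hat_features(x, knots):
    """Assumed 1-D piecewise linear (hat) basis on sorted knots.

    Returns a vector with at most two nonzero entries that sum to 1,
    so the value estimate linearly interpolates between knot weights.
    """
    phi = [0.0] * len(knots)
    if x <= knots[0]:
        phi[0] = 1.0
        return phi
    if x >= knots[-1]:
        phi[-1] = 1.0
        return phi
    # Index i such that knots[i] <= x < knots[i + 1].
    i = bisect.bisect_right(knots, x) - 1
    w = (x - knots[i]) / (knots[i + 1] - knots[i])
    phi[i] = 1.0 - w
    phi[i + 1] = w
    return phi


def td0_update(theta, phi, reward, phi_next, alpha, gamma):
    """One linear TD(0) step, O(n) in the number of features.

    Updates theta in place and returns the TD error.
    """
    v = sum(t * p for t, p in zip(theta, phi))
    v_next = sum(t * p for t, p in zip(theta, phi_next))
    delta = reward + gamma * v_next - v
    for i, p in enumerate(phi):
        theta[i] += alpha * delta * p
    return delta


def run_demo(num_knots=11, episodes=2000, alpha=0.1, gamma=1.0, seed=0):
    """Hypothetical task: random walk on [0, 1], reward 1 on exiting right."""
    rng = random.Random(seed)
    knots = [i / (num_knots - 1) for i in range(num_knots)]
    theta = [0.0] * num_knots
    terminal = [0.0] * num_knots  # terminal states have value 0
    for _ in range(episodes):
        x = 0.5
        while True:
            x_next = x + rng.choice((-0.1, 0.1))
            done = x_next <= 0.0 or x_next >= 1.0
            reward = 1.0 if x_next >= 1.0 else 0.0
            phi_next = terminal if done else hat_features(x_next, knots)
            td0_update(theta, hat_features(x, knots), reward, phi_next, alpha, gamma)
            if done:
                break
            x = x_next
    return knots, theta
```

Calling `run_demo()` returns the knot positions and the learned weight at each knot. Increasing `num_knots` refines the grid, which is the sense in which a larger piecewise basis can shrink the approximation residual in this toy setting.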