
Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis

Authors: Wang, Guoyong; Fu, Tiange; Zheng, Ruijuan; Zhao, Xuhui; Zhu, Junlong; Zhang, Mingchuan

Author affiliations: Luoyang Inst Sci & Technol, Sch Informat Engn, Luoyang 471023, Peoples R China; Longmen Lab, Luoyang 471023, Peoples R China; Henan Univ Sci & Technol, Sch Informat Engn, Luoyang 471023, Peoples R China

Published in: Complex & Intelligent Systems (Complex Intell. Syst.)

Year/Volume/Issue: 2025, Vol. 11, No. 2

Pages: 1-19


Funding: National Natural Science Foundation of China [62176113, 62172142]; Science and Technology Development Plan of Henan Province of China [231100220600, 231111222600]; Scientific and Technological Innovation Teams of Colleges and Universities in Henan Province of China [24IRTSTHN022]

Keywords: Adaptive methods; Non-asymptotic convergence; Nonlinear function approximation; Reinforcement learning; Temporal-difference learning

Abstract: Although deep reinforcement learning has achieved notable practical success, its theoretical foundations have only recently begun to be explored. Moreover, the convergence rate of existing neural temporal-difference (TD) learning algorithms is limited, largely because of their high sensitivity to the choice of stepsize. To mitigate this issue, we propose an adaptive neural TD algorithm (AdaBNTD), inspired by the strong performance of adaptive gradient methods in training deep neural networks. We also derive non-asymptotic bounds for AdaBNTD under Markovian observations. In particular, AdaBNTD converges to the global optimum of the mean squared projected Bellman error (MSPBE) at a rate of $\mathcal{O}(1/\sqrt{K})$, where $K$ denotes the iteration count. The effectiveness of AdaBNTD is also verified on several reinforcement learning benchmark domains.
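This record does not give AdaBNTD's actual update rule, so the following is only a minimal NumPy sketch of the general idea the abstract describes: semi-gradient TD(0) with a small neural value network, where the stepsize is made adaptive per parameter in the style of AdaGrad. Every name here (init_net, adaptive_td_step, the network shape, and all hyperparameters) is a hypothetical illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_net(d_state, width):
    """One-hidden-layer ReLU value network: V(s) = w2 @ relu(W1 @ s)."""
    return {"W1": rng.normal(0, 1 / np.sqrt(d_state), (width, d_state)),
            "w2": rng.normal(0, 1 / np.sqrt(width), width)}

def value_and_grad(params, s):
    """Return V(s) and the gradient of V(s) w.r.t. each parameter block."""
    h = np.maximum(params["W1"] @ s, 0.0)        # hidden ReLU activations
    v = params["w2"] @ h                         # scalar value estimate
    g_w2 = h                                     # dV/dw2
    g_W1 = np.outer(params["w2"] * (h > 0), s)   # dV/dW1 through the ReLU
    return v, {"W1": g_W1, "w2": g_w2}

def adaptive_td_step(params, accum, s, r, s_next,
                     gamma=0.99, eta=0.1, eps=1e-8):
    """One semi-gradient TD(0) update with AdaGrad-style per-parameter scaling."""
    v, grad = value_and_grad(params, s)
    v_next, _ = value_and_grad(params, s_next)
    delta = r + gamma * v_next - v               # TD error (target held fixed)
    for k in grad:
        g = -delta * grad[k]                     # semi-gradient of 0.5 * delta^2
        accum[k] += g ** 2                       # accumulated squared gradients
        params[k] -= eta * g / (np.sqrt(accum[k]) + eps)
    return delta

# Toy usage on random transitions over a 4-dimensional state space.
params = init_net(d_state=4, width=16)
accum = {k: np.zeros_like(p) for k, p in params.items()}
for _ in range(1000):
    s, s_next = rng.normal(size=4), rng.normal(size=4)
    adaptive_td_step(params, accum, s, r=rng.normal(), s_next=s_next)
```

The per-parameter scaling by accumulated squared gradients is what removes the sensitivity to a single global stepsize; the paper's actual algorithm and its non-asymptotic analysis under Markovian sampling differ in detail from this sketch.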
