Author Affiliations: Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Department of Computing Science, University of Alberta, Edmonton T6G 2E8, Canada; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Publication: Machine Intelligence Research
Year/Volume/Issue: 2025, Vol. 22, No. 2
Pages: 267-288
Subject Classification: 12 [Management] 1201 [Management - Management Science and Engineering (degrees conferrable in management or engineering)] 081104 [Engineering - Pattern Recognition and Intelligent Systems] 08 [Engineering] 0835 [Engineering - Software Engineering] 0811 [Engineering - Control Science and Engineering] 0812 [Engineering - Computer Science and Technology (degrees conferrable in engineering or science)]
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0116405) and the Strategic Priority Research Program of the Chinese Academy of Sciences, China (No. XDA27030300).
Keywords: Hierarchical reinforcement learning, representation learning, latent landmark graph, contrastive learning, exploration and exploitation.
Abstract: Goal-conditioned hierarchical reinforcement learning (GCHRL) decomposes the desired goal into subgoals and conducts exploration and exploitation in the subgoal space. Its effectiveness heavily relies on subgoal representation and selection. However, existing works do not consider distinct information across hierarchical time scales when learning subgoal representations and lack a subgoal selection strategy that balances exploration and exploitation. In this paper, we propose a novel method for efficient exploration-exploitation balance in HIerarchical reinforcement learning by dynamically constructing Latent Landmark graphs (HILL). HILL transforms the reward maximization problem of GCHRL into shortest path planning on graphs. To effectively consider the hierarchical time-scale information, HILL adopts a contrastive representation learning objective to learn informative latent representations. Based on these representations, HILL dynamically constructs latent landmark graphs and selects subgoals using two measures to balance exploration and exploitation. We implement two variants: HILL-hf generates graphs periodically, while HILL-lf generates graphs lazily. Experimental results on continuous control tasks with sparse rewards demonstrate that both variants outperform state-of-the-art baselines in sample efficiency and asymptotic performance, with HILL-lf further reducing training time by 40% compared to HILL-hf.
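
Note: The abstract describes the method only at a high level. The sketch below is a minimal, illustrative Python example of the general idea of graph-based subgoal selection (building a landmark graph in latent space, scoring landmarks with an exploitation term plus an exploration bonus, and planning a shortest path toward the best-scored landmark). It is not the authors' implementation: the function names (build_landmark_graph, select_subgoal), the distance-threshold edge rule, and the visit-count exploration bonus are assumptions made purely for illustration.

import heapq
import numpy as np

def build_landmark_graph(landmarks, edge_threshold=2.0):
    # Connect landmark pairs whose latent distance is below a threshold.
    # Returns an adjacency dict {i: [(j, dist), ...]}.  (Illustrative rule only.)
    n = len(landmarks)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d = float(np.linalg.norm(landmarks[i] - landmarks[j]))
            if d <= edge_threshold:
                adj[i].append((j, d))
                adj[j].append((i, d))
    return adj

def shortest_path(adj, start, goal):
    # Plain Dijkstra over the landmark graph; returns the landmark index path.
    dist, prev, visited = {start: 0.0}, {}, set()
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def select_subgoal(landmarks, visit_counts, current_idx, goal_idx, beta=0.5):
    # Score each landmark by an exploitation term (closeness to the goal in latent
    # space) plus an exploration bonus (inverse visit count), then plan a shortest
    # path from the current landmark toward the best-scored one and return the
    # first waypoint as the subgoal.  'beta' trades off the two terms (assumed form).
    adj = build_landmark_graph(landmarks)
    goal = landmarks[goal_idx]
    scores = []
    for i, z in enumerate(landmarks):
        exploit = -float(np.linalg.norm(z - goal))       # closer to goal is better
        explore = 1.0 / np.sqrt(1.0 + visit_counts[i])   # rarely visited is better
        scores.append(exploit + beta * explore)
    target = int(np.argmax(scores))
    path = shortest_path(adj, current_idx, target)
    if path is None or len(path) < 2:
        return landmarks[target]
    return landmarks[path[1]]

# Toy usage: five random 2-D latent landmarks.
rng = np.random.default_rng(0)
landmarks = rng.normal(size=(5, 2))
subgoal = select_subgoal(landmarks, visit_counts=[3, 1, 0, 5, 2],
                         current_idx=0, goal_idx=4)
print(subgoal)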