An Advantage-based Optimization Method for Reinforcement Learning in Large Action Space

Authors: Lin, Hai; Huang, Cheng; Chen, Zhihong

Affiliations: Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan 430072, Hubei, China; School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China

Publication: arXiv

Year: 2024


Subject: Optimization algorithms

Abstract: Reinforcement learning tasks in real-world scenarios often involve large, high-dimensional action spaces, leading to challenges such as convergence difficulties, instability, and high computational complexity. It is widely acknowledged that traditional value-based reinforcement learning algorithms struggle to address these issues effectively. A prevalent approach involves generating independent sub-actions within each dimension of the action space. However, this method introduces bias, hindering the learning of optimal policies. In this paper, we propose an advantage-based optimization method and an algorithm named Advantage Branching Dueling Q-network (ABQ). ABQ incorporates a baseline mechanism to tune the action value of each dimension, leveraging the advantage relationship across different sub-actions. With this approach, the learned policy can be optimized for each dimension. Empirical results demonstrate that ABQ outperforms BDQ, achieving 3%, 171%, and 84% more cumulative rewards in the HalfCheetah, Ant, and Humanoid environments, respectively. Furthermore, ABQ exhibits competitive performance when compared against two continuous-action benchmark algorithms, DDPG and TD3. Copyright © 2024, The Authors. All rights reserved.
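
The abstract describes ABQ as a branching dueling Q-network with a per-dimension advantage baseline. As a rough illustration of the architecture family it builds on, the sketch below implements the standard BDQ-style aggregation Q_d(s, a_d) = V(s) + A_d(s, a_d) - mean_{a'} A_d(s, a'), with one advantage head per action dimension; the class name, layer sizes, and the HalfCheetah-like dimensions (17-d state, 6 action branches discretized into 11 sub-actions each) are illustrative assumptions, and ABQ's exact baseline mechanism is defined only in the paper itself.

```python
# Minimal sketch of a branching dueling Q-network (BDQ-style), the
# architecture family the abstract says ABQ refines. All names and
# sizes here are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class BranchingDuelingQNet(nn.Module):
    """Shared state encoder, a single state-value head V(s), and one
    advantage head A_d(s, a_d) per action dimension (branch).

    Per-branch Q-values use the standard mean-advantage baseline:
        Q_d(s, a_d) = V(s) + A_d(s, a_d) - mean_{a'} A_d(s, a'),
    which keeps the V/A decomposition identifiable. ABQ replaces this
    baseline with its advantage-based mechanism (see the paper).
    """

    def __init__(self, state_dim: int, num_branches: int,
                 actions_per_branch: int, hidden: int = 128):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)  # V(s)
        self.advantages = nn.ModuleList(
            [nn.Linear(hidden, actions_per_branch) for _ in range(num_branches)]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.shared(state)
        v = self.value(h)  # (batch, 1)
        # Stack per-branch advantages: (batch, num_branches, actions_per_branch).
        adv = torch.stack([head(h) for head in self.advantages], dim=1)
        # Subtract the per-branch mean advantage as the baseline.
        q = v.unsqueeze(1) + adv - adv.mean(dim=-1, keepdim=True)
        return q  # (batch, num_branches, actions_per_branch)


# Usage: greedy sub-action per dimension for a discretized control task
# (17-d observation and 6 action dimensions, as in HalfCheetah).
net = BranchingDuelingQNet(state_dim=17, num_branches=6, actions_per_branch=11)
state = torch.randn(1, 17)
greedy_sub_actions = net(state).argmax(dim=-1)  # one discrete sub-action per branch
```

The branching layout is what makes large action spaces tractable here: the network outputs num_branches × actions_per_branch values instead of actions_per_branch ** num_branches, at the cost of the cross-dimension bias the abstract says ABQ's baseline is designed to reduce.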
