Affiliation: Anhui Polytechnic University, College of Electrical Engineering, Wuhu, People's Republic of China
Publication: Frontiers in Neurorobotics (Front. Neurorobotics)
Year/Volume/Issue: 2025, Vol. 19
Pages: 1512953 (article number)
Core indexing:
Funding: Natural Science Research Project of Anhui Province Universities [2022AH050977]
Subjects: deep reinforcement learning; mobile robot; path planning; BiLSTM; Dueling Network
Abstract: To address the slow network convergence, unstable reward convergence, and low path planning efficiency of traditional deep reinforcement learning algorithms, this paper proposes BiLSTM-D3QN (Bidirectional Long Short-Term Memory Dueling Double Deep Q-Network), a path planning algorithm built on the DDQN (Double Deep Q-Network) decision model. First, a Bidirectional Long Short-Term Memory network (BiLSTM) is introduced to give the network memory, which improves the stability of decision making and makes the reward converge more stably. Second, a Dueling Network is introduced to further mitigate overestimation of Q-values by the neural network and to allow the network to be updated quickly. Third, an adaptive reprioritized experience replay based on a frequency penalty function is proposed, which extracts important and fresh data from the experience pool to accelerate the convergence of the neural network. Finally, an adaptive action selection mechanism is introduced to further optimize action exploration. Simulation experiments show that in simple environments the BiLSTM-D3QN path planning algorithm outperforms traditional deep reinforcement learning algorithms in network convergence speed, planning efficiency, reward convergence stability, and success rate. In complex environments, compared with the improved ERDDQN (Experience Replay Double Deep Q-Network) algorithm, BiLSTM-D3QN produces paths that are 20 m shorter with 7 fewer turning points, reduces planning time by 0.54 s, and achieves a 10.4% higher success rate. These results demonstrate the superiority of BiLSTM-D3QN in network convergence speed and path planning performance.
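The "D3QN" part of the name combines two standard ideas that the abstract builds on: a dueling head that splits Q-values into a state value V(s) and action advantages A(s, a), and the Double DQN target, in which the online network selects the next action and the target network evaluates it. The sketch below shows only these two textbook computations in plain Python. It is not the authors' implementation: the BiLSTM encoder, the frequency-penalized reprioritized replay, and the adaptive action selection are not modeled, and the function names `dueling_q` and `double_dqn_target` are hypothetical.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage keeps V and A identifiable.
    (Illustrative helper, not taken from the paper.)
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Standard Double DQN target for one transition.

    The online network's Q-values pick the greedy next action; the target
    network's Q-values score that action. Terminal transitions do not
    bootstrap. (Illustrative helper, not taken from the paper.)
    """
    if done:
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    online_next = dueling_q(1.0, [0.5, -0.5, 0.0])    # [1.5, 0.5, 1.0]
    target_next = dueling_q(2.0, [0.25, 0.75, -1.0])  # [2.25, 2.75, 1.0]
    print(double_dqn_target(1.0, False, online_next, target_next, gamma=0.5))
```

With r = 1.0 and γ = 0.5, the online network picks action 0, so the target is 1.0 + 0.5 × 2.25 = 2.125. Taking the target network's own maximum (2.75), as plain DQN does, would give 2.375. Decoupling selection from evaluation in this way is what reduces the upward bias in the Q-value estimate.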