Path planning is an important component of an Unmanned Aerial Vehicle (UAV) mission and a key guarantee for the mission's successful completion. Traditional path planning algorithms have limitations and deficiencies in complex dynamic environments. Aimed at dynamic, complex obstacle environments, this paper proposes an improved TD3 algorithm that enables the UAV to complete autonomous path planning through online learning and continuous trial and error. The algorithm replaces the experience pool of TD3 with prioritized experience replay, so that the agent can distinguish the importance of experience samples, improving the sampling efficiency of the algorithm and reducing training time. An averaged TD3 is proposed: the mean of Q1 and Q2 is taken when the target value is updated, which mitigates overestimation of the Q value while avoiding underestimation, so that the improved algorithm has better stability and can adapt to various complex obstacle environments. A new reward function is designed so that every UAV action receives reward feedback, addressing the sparse-reward problem in deep reinforcement learning. Experimental results show that this method trains the UAV to reach the target safely and quickly in a multi-obstacle environment. Compared with DDPG, SAC and the traditional TD3, the proposed algorithm achieves a higher path planning success rate and a lower collision rate than the three comparison algorithms, demonstrating better path planning performance.
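The averaged target described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: standard TD3 takes the minimum of the two critic estimates when forming the bootstrap target, whereas the averaged variant takes their mean; the function and variable names are illustrative assumptions.

```python
import numpy as np

def td3_target(reward, done, q1_next, q2_next, gamma=0.99, averaged=True):
    """Compute the bootstrap target for the critic update.

    Standard TD3 uses min(Q1', Q2') to curb overestimation, which can
    underestimate instead; the averaged variant uses (Q1' + Q2') / 2.
    (Illustrative sketch, not the paper's code.)
    """
    if averaged:
        q_next = (q1_next + q2_next) / 2.0   # averaged TD3 variant
    else:
        q_next = np.minimum(q1_next, q2_next)  # standard TD3
    return reward + gamma * (1.0 - done) * q_next

# With twin critic estimates 10.0 and 6.0 for the next state:
avg_t = td3_target(1.0, 0.0, 10.0, 6.0)                  # 1 + 0.99 * 8 = 8.92
min_t = td3_target(1.0, 0.0, 10.0, 6.0, averaged=False)  # 1 + 0.99 * 6 = 6.94
```

The averaged target lies between the pessimistic min-based target and a single-critic target, which is the mechanism the abstract credits for balancing over- and underestimation of the Q value.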