The three-dimensional (3D) alignment design of the underground logistics system (ULS) is a key factor in determining the rationality of its underground infrastructure layout. However, existing research mostly focuses on two-dimensional horizontal alignment optimization while neglecting alignment design at the vertical spatial scale, which makes it difficult for research results to effectively support project implementation. Furthermore, traditional operations research methods struggle to handle the massive underground space data processing tasks required for the detailed design of ULS alignment. Therefore, this study considers the dual attributes of ULS as both underground infrastructure and logistics infrastructure, proposing an innovative deep reinforcement learning (DRL) method to achieve 3D alignment planning. First, a DRL model was developed considering the key design factors of underground infrastructure alignment, such as construction cost, space suitability, and underground space resources. Second, given the large and complex optimization search space of the problem, a curriculum learning-based proximal policy optimization (CL-PPO) algorithm was proposed to solve the model efficiently. Finally, based on the Suzhou ULS case, simulations of alignment optimization results under different planning orientations were conducted to demonstrate the effectiveness of the model and algorithm. Results show that CL-PPO has significant advantages over PPO in computational efficiency and global optimization capability. Additionally, planning orientations have a significant impact on the ULS alignment layout and project construction cost. This optimization method not only enriches ULS infrastructure planning theory but also provides spatial layout guidance for the utilization of urban underground space.
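The curriculum learning idea behind CL-PPO can be illustrated with a minimal sketch: training starts on an easier version of the search problem and is promoted to harder stages once recent performance is good enough. The stage definitions, promotion threshold, and window size below are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical curriculum schedule: begin the alignment search on a small,
# coarse candidate space and widen it as the agent's success rate improves.
class CurriculumSchedule:
    def __init__(self, stages, promote_at=0.8, window=100):
        self.stages = stages          # task difficulties, easiest first
        self.promote_at = promote_at  # success rate required to advance
        self.window = window          # episodes per evaluation window
        self.stage = 0
        self.results = []

    def record(self, success):
        """Log one episode outcome; promote when the window's rate is high enough."""
        self.results.append(1.0 if success else 0.0)
        if len(self.results) >= self.window:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_at and self.stage < len(self.stages) - 1:
                self.stage += 1
            self.results = []

    def current_task(self):
        return self.stages[self.stage]
```

A PPO trainer would call `current_task()` each episode and `record()` afterward, so the curriculum advances only when the policy has mastered the current stage.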
ISBN (electronic): 9789887581581
ISBN (print): 9798350366907
Door opening, one of the most common actions in daily life, has become an important direction for robotic arm applications. Different door handles open in different ways; to enable the robotic arm to complete the corresponding door-opening operation according to the handle category, the proximal policy optimization (PPO) algorithm is used for door opening. Opening a door comprises a multi-segment process: approaching the handle, operating the handle, and pushing the door open. A sparse reward that focuses only on the final result of opening the door prolongs the robotic arm's training time or even prevents convergence. To address this problem, this paper proposes a segmented adaptive reward. First, considering the segmented nature of the door-opening task, a segmented reward is designed and segmented training rules are formulated to gradually guide the robotic arm and improve the overall training effect. At the same time, the reward incorporates an adaptive weight adjustment mechanism, which adjusts the weights according to the attention the current stage pays to different sub-tasks and then matches the segmented training to accelerate training. In a simulation environment, experimental results show that the door-opening success rate of our algorithm is 61.04% higher than that of the original PPO algorithm, and it can accomplish the round-handle opening task that the original algorithm cannot solve.
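The segmented adaptive reward described above can be sketched as follows. The phase names and the weight-update rule are assumptions for illustration; the paper's exact formulation is not given in the abstract.

```python
# Minimal sketch of a segmented reward with adaptive phase weights for the
# three door-opening phases named in the abstract.
PHASES = ("approach", "operate", "push")

def segmented_reward(phase, progress, weights):
    """Reward only the currently active phase, scaled by its adaptive weight."""
    return weights[phase] * progress

def adapt_weights(weights, phase, lr=0.1):
    """Shift weight toward the phase currently under training, then renormalize."""
    w = dict(weights)
    w[phase] += lr
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}
```

During training, the active phase's weight grows while the others shrink proportionally, so the dense per-phase signal replaces the sparse open/closed outcome reward.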
Authors: Wu, Yuejia; Zhou, Jiantao (Inner Mongolia Univ, Coll Comp Sci; Minist Educ Natl & Local Joint Engn Lab; Inner Mongolia Engn Lab Cloud Comp & Serv Software; Inner Mongolia Key Lab Social Comp & Data Proc; Hohhot, Peoples R China)
ISBN (print): 9783030821364; 9783030821357
Knowledge Graphs (KGs) are often incomplete and sparse. Knowledge graph reasoning aims to complete the KG by predicting missing paths between entities. Reinforcement learning (RL) based methods are among the state-of-the-art approaches to this task. However, existing RL-based methods have problems such as unstable training and poorly designed reward functions. Although the DIVINE framework, a novel plug-and-play framework based on generative adversarial imitation learning, improved existing RL-based algorithms without extra reward engineering, its rate of policy update is slow. This paper proposes the EN-DIVINE framework, which uses proximal policy optimization to perform gradient descent when the discriminator parameters take policy steps, improving the framework's training speed. Experimental results show that our work provides a measurable improvement over the DIVINE framework.
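The core of the generative adversarial imitation learning setup that DIVINE builds on can be sketched in one function: the discriminator's score on a reasoning path is converted into a reward for the policy, which a PPO-style update then maximizes. The exact reward shape used by DIVINE/EN-DIVINE is not stated in the abstract, so the standard GAIL-style form below is an assumption.

```python
import math

# GAIL-style imitation reward (assumed form): the reward grows as the
# discriminator becomes more convinced the sampled path is expert-like.
def imitation_reward(disc_score, eps=1e-8):
    """disc_score in (0, 1): discriminator's probability that the path is expert-like."""
    return -math.log(max(1.0 - disc_score, eps))
```

The policy never sees a hand-engineered reward; it only sees `imitation_reward`, which is why no extra reward engineering is needed.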
In recent years, imperfect information games have become an important touchstone for testing the level of artificial intelligence. There are many imperfect information game scenarios in the real world, such as economic transactions, military games, and automatic driving. Therefore, the study of imperfect information game problems has very important practical significance. Guandan is a type of imperfect information card game with four players divided into two teams. The massive hidden information in the Guandan game leads to a high-dimensional game state space. Reinforcement learning algorithms search strategies efficiently in computer games, but they cannot converge under the conditions of imperfect information and the high-dimensional state space caused by Guandan. To address these problems, this paper introduces the proximal policy optimization (PPO) algorithm, based on deep reinforcement learning, to handle imperfect information, the high-dimensional state space, and the action space. It enables the agent to perceive high-dimensional information and make decisions according to the acquired information. The experimental results show that the decision model based on the proximal policy optimization algorithm exceeds the intelligence level of the policy gradient algorithm and the A2C algorithm, which proves that the system has a self-learning ability to improve its Guandan playing level.
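Since all four abstracts rely on PPO, its defining ingredient is worth stating concretely: the clipped surrogate objective, which limits how far a single update can move the policy. The sketch below is a minimal numpy version under standard assumptions (log-probabilities as inputs, clip range 0.2); it is not any one paper's implementation.

```python
import numpy as np

# PPO's clipped surrogate objective: the probability ratio between the new and
# old policies is clipped to [1 - eps, 1 + eps], and the pessimistic (minimum)
# term is kept, so large destabilizing policy steps earn no extra credit.
def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """Return the negative clipped surrogate; minimizing it improves the policy safely."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))
```

With identical old and new policies the ratio is 1 and the loss reduces to the negative mean advantage; once the ratio drifts past 1 ± eps, the gradient through the clipped term vanishes, which is what stabilizes training in high-dimensional settings like Guandan.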