When applying deep reinforcement learning to decision-making in real physical environments, improving sample efficiency while ensuring training stability is an urgent problem. To address it, several on-policy algorithms have been proposed and have achieved state-of-the-art performance. However, these on-policy algorithms, such as the proximal policy optimization (PPO) algorithm, suffer from extremely low sample efficiency. In this study, we propose a novel policy optimization method for robotic action control, an improved proximal policy optimization algorithm based on sample adaptive reuse and dual-clipping (SARD-PPO), which combines the training stability of on-policy methods with the sample efficiency of off-policy methods. First, we analyze the clipping mechanism of the PPO algorithm, devise a more constrained clipping mechanism based on the relationship between clipping and the objective constraints, and develop a policy-updating method that reuses old samples from the prior policy in a more principled way. Second, we ensure training stability through element-level dual-clipping, together with adaptive adjustment and reuse of the entire policy trajectory. Experimental results on six tasks in the MuJoCo benchmark indicate that SARD-PPO significantly improves policy performance while balancing training stability and sample efficiency, outperforming the baseline PPO algorithm and other state-of-the-art policy gradient methods that use on- and off-policy samples. (c) 2022 Elsevier B.V. All rights reserved.
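The dual-clipping idea the abstract describes can be illustrated with a per-sample loss. This is a minimal sketch, not the paper's actual SARD-PPO implementation; `eps` and `dual_c` are assumed hyper-parameter names, and the element-level trajectory-reuse machinery is omitted.

```python
def dual_clip_ppo_loss(ratio, advantage, eps=0.2, dual_c=3.0):
    """Per-sample dual-clipped PPO surrogate loss (to be minimized).

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A(s, a).
    eps is the standard PPO clip range; dual_c (> 1) is a lower bound applied
    when the advantage is negative, capping how strongly the objective can
    push the ratio down when stale (reused) samples produce large ratios.
    """
    # Standard PPO clipped surrogate (negated: this is a loss, not an objective).
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    surr = min(ratio * advantage, clipped_ratio * advantage)
    if advantage >= 0:
        return -surr
    # Dual clip: bound the surrogate from below for negative advantages.
    return -max(surr, dual_c * advantage)
```

With a positive advantage the loss reduces to the familiar clipped surrogate; with a negative advantage and a large ratio, the extra `max` keeps a single bad reused sample from dominating the update.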
ISBN:
(Print) 9798400714405
With the rapid development of information technology, intelligent education has gradually become an important direction of modern education reform. Classroom interaction, a key factor in improving students' learning and the quality of education, urgently needs new technical means of optimization. Traditional classroom interaction strategies often suffer from low student participation and low interaction quality, so optimizing classroom interaction through intelligent means has become an important topic in educational research. In this paper, we propose an interaction optimization strategy based on Deep Reinforcement Learning (DRL), which aims to improve the quality of classroom interactions through intelligent algorithms, thus increasing students' engagement and satisfaction in the classroom. The Proximal Policy Optimization (PPO) algorithm is used to model variables such as student behaviour, teacher strategy, and classroom state for specific classroom interaction scenarios, and a system framework is constructed that can adaptively adjust the interaction strategy. The experimental data come from 300 students at two universities in Guangdong Province, covering interactive behaviours in several practical training courses.
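Modelling student behaviour, teacher strategy, and classroom state as an MDP might look like the toy environment below. Everything here, including the state variables, action names, and dynamics, is an illustrative assumption, not the paper's actual model.

```python
import random

class ClassroomInteractionEnv:
    """Toy MDP sketch: state = (engagement, fatigue); actions are
    hypothetical teacher interaction strategies; reward tracks engagement."""
    ACTIONS = ["lecture", "question", "group_work", "quiz"]

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.engagement, self.fatigue = 0.5, 0.0
        return (self.engagement, self.fatigue)

    def step(self, action):
        # Each strategy gives a different (made-up) engagement boost.
        boost = {"lecture": 0.0, "question": 0.10,
                 "group_work": 0.15, "quiz": 0.05}[action]
        self.engagement = min(1.0, max(0.0, self.engagement + boost
                                       - 0.1 * self.fatigue
                                       + self.rng.uniform(-0.02, 0.02)))
        self.fatigue = min(1.0, self.fatigue + 0.05)   # fatigue accumulates
        reward = self.engagement
        done = self.fatigue >= 1.0                      # end of the session
        return (self.engagement, self.fatigue), reward, done
```

A PPO agent would then be trained against `step`/`reset` in the usual way, with the policy output selecting among `ACTIONS`.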
Due to the strong coupling characteristics of the once-through steam generator (OTSG), controlling its outlet pressure is difficult. A control system using the Proximal Policy Optimization (PPO) algorithm is designed to control the outlet steam pressure of the OTSG. The controller has two layers: the upper layer is an agent that uses the PPO algorithm to optimize the PID parameters in real time for better control performance, and the bottom layer is the PID controller, which receives commands from the upper layer to directly regulate the feed-water valve of the OTSG. In training the controller agent, deep neural networks are adopted as approximators for the critic and actor networks, yielding good generalization performance. Simulation results show that, compared with a PID controller, the method has both good tracking ability and good anti-interference ability. (c) 2022 Elsevier Ltd. All rights reserved.
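The double-layer structure can be sketched as follows: the upper layer (a trained PPO policy, not shown) emits a gain triple, and the lower layer is an ordinary PID update producing the valve command. The class and function names are illustrative, not from the paper.

```python
class PID:
    """Discrete-time PID controller (lower layer)."""
    def __init__(self, kp, ki, kd, dt=0.1):
        self.set_gains(kp, ki, kd)
        self.dt, self.integral, self.prev_err = dt, 0.0, 0.0

    def set_gains(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd

    def update(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def control_step(agent_action, pid, setpoint, pressure):
    """Upper layer: agent_action = (kp, ki, kd) proposed by the PPO policy.
    Lower layer: the retuned PID computes the feed-water valve command."""
    pid.set_gains(*agent_action)
    return pid.update(setpoint, pressure)
```

Each control cycle, the agent observes the plant state, proposes gains, and the PID translates the pressure error into a valve command.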
ISBN:
(Print) 9781665412957
The application of reinforcement learning algorithms to motion planning has been a research hotspot in robotics in recent years. However, training reinforcement learning agents from scratch suffers from low training efficiency and difficulty in convergence. In this paper, a robot motion planning method based on residual reinforcement learning is proposed. The method divides the agent's motion-planning policy into an initial policy and a residual policy. The initial policy is a neural-network motion planner that guides the training of the residual policy, and the residual policy is learned with the Proximal Policy Optimization (PPO) algorithm. A motion planning experiment in a simulation environment shows that the method can successfully perform motion planning, and a comparison between PPO and the proposed algorithm demonstrates that the proposed algorithm has better motion planning performance.
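The residual decomposition amounts to summing the planner's action with a learned correction. A minimal sketch, assuming the two policies map a state to same-length action vectors and that the residual is scaled; the names and the `scale` factor are illustrative.

```python
def residual_action(state, initial_policy, residual_policy, scale=0.1):
    """Compose the executed action: the initial planner's action plus a
    scaled correction from the learned residual policy."""
    base = initial_policy(state)      # neural-network motion planner
    delta = residual_policy(state)    # PPO-trained residual policy
    return [b + scale * d for b, d in zip(base, delta)]
```

Because `base` already reaches reasonable states early in training, the residual policy only has to learn small corrections, which is what makes this setup converge faster than training PPO from scratch.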
Aiming at the problem of Multi-Agent Path Planning (MAPP), current algorithms suffer from large data dimensions and complex computation. In this paper, the A* algorithm and the Proximal Policy Optimization (PPO) algorithm are combined into a hybrid A Star Proximal Policy Optimization (ASPPO) algorithm, and a reward function is designed so that agents have a certain probability of choosing the A* algorithm when they are close to the target point. The algorithm enables an agent to use raw sensor data to move from the starting point to its target location in a complex, unknown environment, realizing end-to-end path planning. Through problem modeling, algorithm design, and experimental simulation, the ASPPO algorithm is compared with the MAPP-RL algorithm and the Maximum Reward Frequency Q-learning (MRFQ) algorithm. The results show that the algorithm is superior to the other two in terms of success rate, time consumption, distance, and average speed in different scenarios. In addition, the algorithm has good mobility in solving the MAPP problem and can meet the needs of different scenarios.
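The "probability of choosing A* near the target" mechanism can be sketched as an action selector. This is an assumed reading of the abstract, not the paper's implementation; `near_radius` and `p_astar` are hypothetical hyper-parameters.

```python
import math
import random

def hybrid_action(pos, goal, astar_action, ppo_action,
                  near_radius=2.0, p_astar=0.8, rng=random):
    """ASPPO-style selector sketch: near the goal, fall back to the
    A* action with probability p_astar; otherwise use the PPO action."""
    dist = math.dist(pos, goal)
    if dist <= near_radius and rng.random() < p_astar:
        return astar_action   # deterministic, goal-directed move
    return ppo_action         # learned, sensor-driven move
```

Far from the goal the learned policy handles unknown obstacles; near the goal the deterministic A* step removes end-game dithering, which is consistent with the reward shaping the abstract describes.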
ISBN:
(Digital) 9789881563903
ISBN:
(Print) 9789881563903
In this paper, a PID gain adjustment scheme based on a reinforcement learning algorithm is proposed, and its validity is demonstrated by application to the control of a quadrotor. Specifically, the PPO algorithm is used to adjust the PID controller gains. The procedure and details of the scheme are presented. Experiments show that the control strategy can quickly make the controlled system converge and stabilize, and that, compared with a traditional PID controller, the scheme performs well in terms of control stability, anti-interference stability, and aircraft altitude stability.