When applying deep reinforcement learning to decision-making in real physical environments, improving sample efficiency while ensuring training stability is an urgent problem. To address it, several on-policy algorithms have been proposed and have achieved state-of-the-art performance. However, these on-policy algorithms, such as the proximal policy optimization (PPO) algorithm, suffer from extremely low sample efficiency. In this study, we propose a novel policy optimization method for robotic action control, an improved proximal policy optimization algorithm based on sample adaptive reuse and dual-clipping (SARD-PPO), which combines the training stability of on-policy methods with the sample efficiency of off-policy methods. First, we analyze the clipping mechanism of the PPO algorithm, devise a more constrained clipping mechanism based on the relationship between clipping and the objective constraints, and develop a policy-updating method that reuses old samples from the prior policy in a more principled way. Second, we ensure training stability through element-level dual-clipping, together with adaptive adjustment and reuse of the entire policy trajectory. Experimental results on six tasks in the MuJoCo benchmark indicate that SARD-PPO significantly improves policy performance while balancing training stability and sample efficiency, outperforming the baseline PPO algorithm and other state-of-the-art policy gradient methods that use on- and off-policy samples. (c) 2022 Elsevier B.V. All rights reserved.
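The dual-clipping idea the abstract describes can be illustrated with a per-sample loss. This is a minimal sketch, not the paper's actual SARD-PPO implementation; `eps` and `dual_c` are assumed hyper-parameter names, and the element-level trajectory-reuse machinery is omitted.

```python
def dual_clip_ppo_loss(ratio, advantage, eps=0.2, dual_c=3.0):
    """Per-sample dual-clipped PPO surrogate loss (to be minimized).

    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A(s, a).
    eps is the standard PPO clip range; dual_c (> 1) is a lower bound applied
    when the advantage is negative, capping how strongly the objective can
    push the ratio down when stale (reused) samples produce large ratios.
    """
    # Standard PPO clipped surrogate (negated: this is a loss, not an objective).
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    surr = min(ratio * advantage, clipped_ratio * advantage)
    if advantage >= 0:
        return -surr
    # Dual clip: bound the surrogate from below for negative advantages.
    return -max(surr, dual_c * advantage)
```

With a positive advantage the loss reduces to the familiar clipped surrogate; with a negative advantage and a large ratio, the extra `max` keeps a single bad reused sample from dominating the update.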
ISBN:
(Print) 9798400714405
With the rapid development of information technology, intelligent education has gradually become an important direction of modern education reform. Classroom interaction, a key factor in improving students' learning and the quality of education, urgently needs new technical means of optimization. Traditional classroom interaction strategies often suffer from low student participation and low interaction quality, so optimizing classroom interaction through intelligent means has become an important topic in educational research. In this paper, we propose an interaction optimization strategy based on Deep Reinforcement Learning (DRL), which aims to improve the quality of classroom interactions through intelligent algorithms, thus increasing students' engagement and satisfaction in the classroom. The Proximal Policy Optimization (PPO) algorithm is used to model variables such as student behaviour, teacher strategy, and classroom state for specific classroom interaction scenarios, and a system framework is constructed that can adaptively adjust the interaction strategy. The experimental data come from 300 students at two universities in Guangdong Province, covering interactive behaviours in several practical training courses.
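Modelling student behaviour, teacher strategy, and classroom state as an MDP might look like the toy environment below. Everything here, including the state variables, action names, and dynamics, is an illustrative assumption, not the paper's actual model.

```python
import random

class ClassroomInteractionEnv:
    """Toy MDP sketch: state = (engagement, fatigue); actions are
    hypothetical teacher interaction strategies; reward tracks engagement."""
    ACTIONS = ["lecture", "question", "group_work", "quiz"]

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.engagement, self.fatigue = 0.5, 0.0
        return (self.engagement, self.fatigue)

    def step(self, action):
        # Each strategy gives a different (made-up) engagement boost.
        boost = {"lecture": 0.0, "question": 0.10,
                 "group_work": 0.15, "quiz": 0.05}[action]
        self.engagement = min(1.0, max(0.0, self.engagement + boost
                                       - 0.1 * self.fatigue
                                       + self.rng.uniform(-0.02, 0.02)))
        self.fatigue = min(1.0, self.fatigue + 0.05)   # fatigue accumulates
        reward = self.engagement
        done = self.fatigue >= 1.0                      # end of the session
        return (self.engagement, self.fatigue), reward, done
```

A PPO agent would then be trained against `step`/`reset` in the usual way, with the policy output selecting among `ACTIONS`.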
Due to the strong coupling characteristics of the once-through steam generator (OTSG), controlling its outlet pressure is difficult. A control system using the Proximal Policy Optimization (PPO) algorithm is designed to control the outlet steam pressure of the OTSG. The controller has two layers: the upper layer is an agent that uses the PPO algorithm to optimize the PID parameters in real time for better control performance, and the bottom layer is the PID controller, which receives commands from the upper layer to directly regulate the feed-water valve of the OTSG. In training the controller agent, deep neural networks are adopted as approximators for the critic and actor networks, yielding good generalization performance. Simulation results show that, compared with a PID controller, the method has both good tracking ability and good anti-interference ability. (c) 2022 Elsevier Ltd. All rights reserved.
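The double-layer structure can be sketched as follows: the upper layer (a trained PPO policy, not shown) emits a gain triple, and the lower layer is an ordinary PID update producing the valve command. The class and function names are illustrative, not from the paper.

```python
class PID:
    """Discrete-time PID controller (lower layer)."""
    def __init__(self, kp, ki, kd, dt=0.1):
        self.set_gains(kp, ki, kd)
        self.dt, self.integral, self.prev_err = dt, 0.0, 0.0

    def set_gains(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd

    def update(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def control_step(agent_action, pid, setpoint, pressure):
    """Upper layer: agent_action = (kp, ki, kd) proposed by the PPO policy.
    Lower layer: the retuned PID computes the feed-water valve command."""
    pid.set_gains(*agent_action)
    return pid.update(setpoint, pressure)
```

Each control cycle, the agent observes the plant state, proposes gains, and the PID translates the pressure error into a valve command.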
ISBN:
(Print) 9781665412957
The application of reinforcement learning algorithms to motion planning has been a research hotspot in robotics in recent years. However, training reinforcement learning agents from scratch suffers from low training efficiency and difficulty in convergence. In this paper, a robot motion planning method based on residual reinforcement learning is proposed. The method divides the agent's motion-planning policy into an initial policy and a residual policy. The initial policy is a neural-network motion planner that guides the training of the residual policy, and the residual policy is learned with the Proximal Policy Optimization (PPO) algorithm. A motion planning experiment in a simulation environment shows that the method can successfully perform motion planning, and a comparison between PPO and the proposed algorithm demonstrates that the proposed algorithm has better motion planning performance.
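The residual decomposition amounts to summing the planner's action with a learned correction. A minimal sketch, assuming the two policies map a state to same-length action vectors and that the residual is scaled; the names and the `scale` factor are illustrative.

```python
def residual_action(state, initial_policy, residual_policy, scale=0.1):
    """Compose the executed action: the initial planner's action plus a
    scaled correction from the learned residual policy."""
    base = initial_policy(state)      # neural-network motion planner
    delta = residual_policy(state)    # PPO-trained residual policy
    return [b + scale * d for b, d in zip(base, delta)]
```

Because `base` already reaches reasonable states early in training, the residual policy only has to learn small corrections, which is what makes this setup converge faster than training PPO from scratch.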
Aiming at the problem of Multi-Agent Path Planning (MAPP), current algorithms suffer from large data dimensions and complex computation. In this paper, the A* algorithm and the Proximal Policy Optimization (PPO) algorithm are combined into a hybrid A Star Proximal Policy Optimization (ASPPO) algorithm, and a reward function is designed so that agents have a certain probability of choosing the A* algorithm when they are close to the target point. The algorithm enables an agent to use raw sensor data to move from the starting point to its target location in a complex, unknown environment, realizing end-to-end path planning. Through problem modeling, algorithm design, and experimental simulation, the ASPPO algorithm is compared with the MAPP-RL algorithm and the Maximum Reward Frequency Q-learning (MRFQ) algorithm. The results show that the algorithm is superior to the other two in terms of success rate, time consumption, distance, and average speed in different scenarios. In addition, the algorithm has good mobility in solving the MAPP problem and can meet the needs of different scenarios.
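The "probability of choosing A* near the target" mechanism can be sketched as an action selector. This is an assumed reading of the abstract, not the paper's implementation; `near_radius` and `p_astar` are hypothetical hyper-parameters.

```python
import math
import random

def hybrid_action(pos, goal, astar_action, ppo_action,
                  near_radius=2.0, p_astar=0.8, rng=random):
    """ASPPO-style selector sketch: near the goal, fall back to the
    A* action with probability p_astar; otherwise use the PPO action."""
    dist = math.dist(pos, goal)
    if dist <= near_radius and rng.random() < p_astar:
        return astar_action   # deterministic, goal-directed move
    return ppo_action         # learned, sensor-driven move
```

Far from the goal the learned policy handles unknown obstacles; near the goal the deterministic A* step removes end-game dithering, which is consistent with the reward shaping the abstract describes.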
ISBN:
(Digital) 9789881563903
ISBN:
(Print) 9789881563903
In this paper, a PID gain adjustment scheme based on a reinforcement learning algorithm is proposed, and its validity is demonstrated by application to the control of a quadrotor. Specifically, the PPO algorithm is used to adjust the PID controller gains. The procedure and details of the scheme are presented. Experiments show that the control strategy can quickly make the controlled system converge and stabilize, and that, compared with a traditional PID controller, the scheme performs well in terms of control stability, anti-interference stability, and aircraft altitude stability.