This study investigates whether the presence of both quality- and value-based online reviews helps firms make decisions. To adapt to a complex real-world environment, we construct two simulated environments with high and low initial consumer-perceived quality and employ the proximal policy optimization (PPO) algorithm to derive optimal pricing strategies. The simulation results show that retailers can gain higher revenue by considering quality-based reviews only when consumers' initial perceived quality is low. In addition, retailers must choose an appropriate promotion method based on the social learning speed of the consumer group. When social learning is slow, retailers should invest more in promotion to improve consumers' initial perceived quality and thus increase revenue. Compared with the Advantage Actor-Critic algorithm, the PPO algorithm exhibits better performance, provides a new approach for complex, continuous revenue management problems, and can be applied to a wider range of areas.
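The PPO algorithm referenced here optimizes a clipped surrogate objective that keeps each policy update close to the previous policy. A minimal NumPy sketch of that objective (illustrative only, not the paper's implementation; the function name and the default clip ratio eps=0.2 are assumptions):

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate objective of PPO.

    ratio = pi_new(a|s) / pi_old(a|s), computed in log space for stability.
    Clipping the ratio to [1 - eps, 1 + eps] limits how far a single
    update can move the policy away from the old one.
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Elementwise minimum gives a pessimistic bound on the improvement;
    # the mean over the batch is the quantity to maximize.
    return np.mean(np.minimum(unclipped, clipped))
```

With identical old and new log-probabilities the ratio is 1 and the objective reduces to the mean advantage; when the ratio exceeds 1 + eps on a positive-advantage sample, the clipped branch caps the contribution.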
In most industrial sectors, large coal-fired boilers are a source of carbon and pollutant emissions, so it is important to carry out combustion adjustment and optimize the energy-saving operation of coal-fired boilers. Traditional combustion adjustment relies on human intervention, but manual adjustment struggles to optimize NOx emissions and thermal efficiency synergistically at the same time, leaving substantial room for improvement in boiler combustion optimization. Artificial intelligence can tap this potential from boiler operation data. Boiler combustion optimization methods based on supervised learning models combined with optimization algorithms currently show good optimization effects and high application value, but problems remain: coupling a dynamic model with an optimization algorithm is difficult, and the optimization time is long. This paper adopts feature classification and multi-model coupling to build a static-dynamic composite prediction model of boiler performance indicators. A dynamic prediction model of boiler thermal efficiency and nitrogen oxides (NOx) is established using long short-term memory (LSTM) and one-dimensional convolutional neural networks (1D_CNN). The model is categorized into static and dynamic models based on the input features; the dynamic model is coupled with a BP neural network to establish the static-dynamic composite prediction model, which is further coupled with the proximal policy optimization (PPO) reinforcement learning algorithm to establish a boiler in-place optimization strategy. Validated on 5619 test cases, the strategy achieves co-optimization of NOx and thermal efficiency in 63.5% of cases, with thermal efficiency gains of 0-0.61% and NOx reductions of 0-65 mg/m3. Meanwhile, comparing the optimization effect of the PPO algorithm with that of the genetic algorithm (GA) shows that the PPO strategy has a more signifi…
Deep Reinforcement Learning (DRL) has shown great potential in addressing complex decision-making challenges, especially within high-dimensional and dynamic environments. However, DRL faces limitations, such as low sa...
ISBN (print): 9781665478960
For the scenario of random patching in industrial settings, this paper proposes an algorithm based on a distributed framework of proximal policy optimization (PPO) with Generalized Advantage Estimation (GAE). The visual input is captured by a camera and treated as the state. A distributed actor-critic approach is established to improve sampling efficiency, and the sampled data are stored in an experience pool. Both punishment and reward strategies are considered in the proposed method. The improved PPO algorithm is verified in PyBullet, and we found that it greatly improves performance in terms of convergence steps and actual reward.
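The GAE component combined with PPO here estimates advantages from temporal-difference residuals, swept backwards over a rollout. A minimal sketch under the standard GAE formulation (function name and default gamma/lam values are illustrative, not taken from the paper):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized Advantage Estimation.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    computed backwards over the rollout; last_value bootstraps the
    value of the state after the final step.
    """
    values = np.append(np.asarray(values, dtype=float), last_value)
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting lam=0 recovers one-step TD residuals, while gamma=lam=1 recovers full Monte Carlo returns minus the value baseline, illustrating GAE's bias-variance trade-off.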
For the scenario where the overall layout is known and the obstacle distribution information is unknown, a dynamic path planning algorithm combining the A* algorithm and the proximal policy optimization (PPO) algorithm is proposed. Simulation experiments show that in all six test environments, the proposed algorithm finds paths that are on average about 2.04% to 5.86% shorter than those of state-of-the-art algorithms in the literature, and reduces the number of training epochs before stabilization from tens of thousands to about 4000.
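The A* component of this hybrid planner can be illustrated with a standard grid-based implementation. A self-contained sketch assuming a 4-connected unit-cost grid and a Manhattan-distance heuristic (details not specified in the abstract):

```python
import heapq

def astar(grid, start, goal):
    """A* search on a 4-connected grid; grid[r][c] == 1 marks an obstacle.
    Manhattan distance is an admissible heuristic for unit-cost moves,
    so the first expansion of the goal yields a shortest path."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]]):
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None  # goal unreachable
```

In a hybrid scheme like the one described, such a global A* path could serve as guidance while the learned PPO policy handles locally observed, unknown obstacles.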
Chaos phenomena can be observed extensively in many real-world scenarios, and suppressing these undesired behaviors is usually challenging. Unlike traditional linear and nonlinear control methods, this study introduces a deep reinforcement learning (DRL)-based scheme to regulate a chaotic food web system (FWS). Specifically, we utilize the proximal policy optimization (PPO) algorithm to train the agent model, which does not require prior knowledge of the chaotic FWS. Experimental results demonstrate that the developed DRL-based control scheme can effectively guide the FWS toward a predetermined stable state. Furthermore, this investigation considers the influence of environmental noise on the chaotic FWS, and we obtain the important result that incorporating noise during the training process can enhance the controller's robustness and the system's adaptability.
ISBN (print): 9798350334722
A novel proximal policy optimization (PPO) algorithm is proposed to solve the motion control problem for an underactuated unmanned surface vehicle (USV). To address the zero-gradient problem of the algorithm during training, a Jensen-Shannon (JS) divergence term and a clipped objective function are introduced to reduce the differences between the old and new policies, achieving more stable and faster navigation control of the USV. In addition, a boundary-protected hierarchical reward function is designed to enhance the decision network for USV angle and speed control by evaluating the output decisions of the PPO. Simulation results show that the proposed method can effectively implement motion control of the USV and improve the convergence rate of the algorithm.
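The Jensen-Shannon divergence used here as a policy-difference measure is symmetric and bounded by ln 2, unlike the KL divergence more commonly paired with policy-gradient methods. A minimal sketch for discrete distributions (the smoothing constant eps is an implementation assumption to avoid log-of-zero):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m),  m = (p + q) / 2.
    Symmetric in p and q, and bounded above by ln(2) (in nats).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    # eps-smoothed KL keeps the log finite when a bin has zero mass.
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give zero divergence, and fully disjoint distributions attain the ln 2 upper bound; this boundedness is one reason JS can give usable gradients where KL diverges.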
Tool wear is critically important for the optimization of cutting parameters. However, the progressive nature of tool wear presents challenges to traditional meta-heuristic cutting parameter optimization methods. To address this issue, we propose an innovative deep reinforcement learning-driven adaptive optimization method for cutting parameters that takes tool wear into account. More specifically, we use a Markov decision process to model the optimization of cutting parameters. First, an innovative deep transfer learning algorithm is used to monitor tool wear. As tool wear progresses, a proximal policy optimization method built on a transformer with a multi-head attention mechanism interacts with the machining environment through trial and error and accumulates experience in selecting cutting parameters through the reward function. The deep reinforcement learning model can quickly discern the best cutting parameters based on real-time tool wear values. The experimental results show that the proposed method outperforms other algorithms.
This paper addresses bidding strategy optimization in a real-time multi-participant electricity market with short-term load dynamics. To avoid the sub-optimal solutions and the dependence on complete information of traditional mathematical programming methods, an electricity market bidding strategy optimization algorithm based on deep reinforcement learning (DRL) is developed. While conventional reinforcement learning algorithms (e.g., Q-learning and deep Q-learning) can only handle simple problems in discrete state spaces, the proximal policy optimization (PPO) algorithm is adopted because it can optimize the bidding strategy in continuous action and state spaces. To substantiate this perspective, the paper conducts a two-part experimental study. First, experiments that consider a fixed demand load of market participants show that the developed method reaches the Nash equilibrium just like bi-level optimization, and higher profits can be achieved by adjusting hyperparameters. Then, more complex experiments that consider a time-varying demand load verify that the DRL-based electricity market bidding strategy performs better than bi-level optimization-based methods and increases the profits of generators.