This study investigates whether the presence of both quality- and value-based online reviews helps firms make decisions. To adapt to a complex real-world environment, we construct two simulated environments with high and low initial consumer-perceived quality and employ the proximal policy optimization (PPO) algorithm to derive optimal pricing strategies. The simulation results show that retailers can gain higher revenue by considering quality-based reviews only when consumers' initial perceived quality is low. In addition, retailers must choose an appropriate promotion method based on the social learning speed of the consumer group. When social learning is slow, retailers should invest more in promotion to improve consumers' initial perceived quality and thus increase revenue. Compared with the Advantage Actor-Critic algorithm, the PPO algorithm exhibits better performance, provides a new approach for complex, continuous revenue management problems, and can be applied to a wider range of areas.
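The PPO algorithm referenced here optimizes a clipped surrogate objective that keeps each policy update close to the previous policy. A minimal NumPy sketch of that objective (illustrative only, not the paper's implementation; the function name and the default clip ratio eps=0.2 are assumptions):

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate objective of PPO.

    ratio = pi_new(a|s) / pi_old(a|s), computed in log space for stability.
    Clipping the ratio to [1 - eps, 1 + eps] limits how far a single
    update can move the policy away from the old one.
    """
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Elementwise minimum gives a pessimistic bound on the improvement;
    # the mean over the batch is the quantity to maximize.
    return np.mean(np.minimum(unclipped, clipped))
```

With identical old and new log-probabilities the ratio is 1 and the objective reduces to the mean advantage; when the ratio exceeds 1 + eps on a positive-advantage sample, the clipped branch caps the contribution.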
In most industrial sectors, large coal-fired boilers are a source of carbon and pollutant emissions, so it is important to carry out combustion adjustment and optimize the energy-saving operation of coal-fired boilers. Traditional combustion adjustment relies on human intervention, but manual adjustment struggles to optimize NOx emissions and thermal efficiency synergistically at the same time, leaving substantial room for improvement in boiler combustion optimization. Artificial intelligence can tap this potential from boiler operation data. Boiler combustion optimization methods based on supervised learning models combined with optimization algorithms currently show good optimization effects and high application value, but problems remain: coupling a dynamic model with an optimization algorithm is difficult, and the optimization time is long. This paper adopts feature classification and multi-model coupling to build a static-dynamic composite prediction model of boiler performance indicators. A dynamic prediction model of boiler thermal efficiency and nitrogen oxides (NOx) is established using long short-term memory (LSTM) and one-dimensional convolutional neural networks (1D_CNN). The model is categorized into static and dynamic models based on the input features; the dynamic model is coupled with a BP neural network to establish the static-dynamic composite prediction model, which is further coupled with the proximal policy optimization (PPO) reinforcement learning algorithm to establish a boiler in-place optimization strategy. Validated on 5619 test cases, the strategy achieves co-optimization of NOx and thermal efficiency in 63.5% of cases, with thermal efficiency gains of 0-0.61% and NOx reductions of 0-65 mg/m3. Meanwhile, comparing the optimization effect of the PPO algorithm with that of the genetic algorithm (GA) shows that the PPO strategy has a more signifi…
Deep Reinforcement Learning (DRL) has shown great potential in addressing complex decision-making challenges, especially within high-dimensional and dynamic environments. However, DRL faces limitations, such as low sa...
ISBN (print): 9781665478960
For the scenario of random patching in industrial settings, this paper proposes an algorithm based on a distributed framework of proximal policy optimization (PPO) with Generalized Advantage Estimation (GAE). The visual input is captured by a camera and treated as the state. A distributed actor-critic approach is established to improve sampling efficiency, and the sampled data are stored in an experience pool. Both punishment and reward strategies are considered in the proposed method. The improved PPO algorithm is verified in PyBullet, and we found that it greatly improves performance in terms of convergence steps and actual reward.
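The GAE component combined with PPO here estimates advantages from temporal-difference residuals, swept backwards over a rollout. A minimal sketch under the standard GAE formulation (function name and default gamma/lam values are illustrative, not taken from the paper):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized Advantage Estimation.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    computed backwards over the rollout; last_value bootstraps the
    value of the state after the final step.
    """
    values = np.append(np.asarray(values, dtype=float), last_value)
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting lam=0 recovers one-step TD residuals, while gamma=lam=1 recovers full Monte Carlo returns minus the value baseline, illustrating GAE's bias-variance trade-off.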
For the scenario where the overall layout is known and the obstacle distribution information is unknown, a dynamic path planning algorithm combining the A* algorithm and the proximal policy optimization (PPO) algorithm is proposed. Simulation experiments show that in all six test environments, the proposed algorithm finds paths that are on average about 2.04% to 5.86% shorter than those of state-of-the-art algorithms in the literature, and reduces the number of training epochs before stabilization from tens of thousands to about 4000.
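The A* component of this hybrid planner can be illustrated with a standard grid-based implementation. A self-contained sketch assuming a 4-connected unit-cost grid and a Manhattan-distance heuristic (details not specified in the abstract):

```python
import heapq

def astar(grid, start, goal):
    """A* search on a 4-connected grid; grid[r][c] == 1 marks an obstacle.
    Manhattan distance is an admissible heuristic for unit-cost moves,
    so the first expansion of the goal yields a shortest path."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dr, node[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]]):
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None  # goal unreachable
```

In a hybrid scheme like the one described, such a global A* path could serve as guidance while the learned PPO policy handles locally observed, unknown obstacles.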
Chaos phenomena can be observed extensively in many real-world scenarios, and suppressing these undesired behaviors is usually challenging. Unlike traditional linear and nonlinear control methods, this study introduces a deep reinforcement learning (DRL)-based scheme to regulate a chaotic food web system (FWS). Specifically, we utilize the proximal policy optimization (PPO) algorithm to train the agent model, which does not require prior knowledge of the chaotic FWS. Experimental results demonstrate that the developed DRL-based control scheme can effectively guide the FWS toward a predetermined stable state. Furthermore, this investigation considers the influence of environmental noise on the chaotic FWS, and we obtain the important result that incorporating noise during the training process can enhance the controller's robustness and the system's adaptability.
ISBN (print): 9798350334722
A novel proximal policy optimization (PPO) algorithm is proposed to solve the motion control problem for an underactuated unmanned surface vehicle (USV). To address the zero-gradient problem of the algorithm during training, a Jensen-Shannon (JS) divergence term and a clipped objective function are introduced to reduce the differences between the old and new policies, achieving more stable and faster navigation control of the USV. In addition, a boundary-protected hierarchical reward function is designed to enhance the decision network for USV angle and speed control by evaluating the output decisions of the PPO. Simulation results show that the proposed method can effectively implement motion control of the USV and improve the convergence rate of the algorithm.
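The Jensen-Shannon divergence used here as a policy-difference measure is symmetric and bounded by ln 2, unlike the KL divergence more commonly paired with policy-gradient methods. A minimal sketch for discrete distributions (the smoothing constant eps is an implementation assumption to avoid log-of-zero):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m),  m = (p + q) / 2.
    Symmetric in p and q, and bounded above by ln(2) (in nats).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    # eps-smoothed KL keeps the log finite when a bin has zero mass.
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give zero divergence, and fully disjoint distributions attain the ln 2 upper bound; this boundedness is one reason JS can give usable gradients where KL diverges.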
Tool wear is critically important for the optimization of cutting parameters. However, the progressive nature of tool wear presents challenges to traditional meta-heuristic cutting parameter optimization methods. To address this issue, we propose an innovative deep reinforcement learning-driven adaptive optimization method for cutting parameters that takes tool wear into account. More specifically, we use a Markov decision process to model the optimization of cutting parameters. First, an innovative deep transfer learning algorithm is used to monitor tool wear. As tool wear progresses, a proximal policy optimization method built on a transformer with a multi-head attention mechanism interacts with the machining environment through trial and error and accumulates experience in selecting cutting parameters through the reward function. The deep reinforcement learning model can quickly discern the best cutting parameters based on real-time tool wear values. The experimental results show that the proposed method outperforms other algorithms.
This paper addresses bidding strategy optimization in a real-time multi-participant electricity market with short-term load dynamics. To avoid the sub-optimal solutions and the dependence on complete information of traditional mathematical programming methods, an electricity market bidding strategy optimization algorithm based on deep reinforcement learning (DRL) is developed. While conventional reinforcement learning algorithms (e.g., Q-learning and deep Q-learning) can only handle simple problems in discrete state spaces, the proximal policy optimization (PPO) algorithm is adopted because it can optimize the bidding strategy in continuous action and state spaces. To substantiate this perspective, the paper conducts a two-part experimental study. First, experiments that consider a fixed demand load of market participants show that the developed method reaches the Nash equilibrium just like bi-level optimization, and higher profits can be achieved by adjusting hyperparameters. Then, more complex experiments that consider a time-varying demand load verify that the DRL-based electricity market bidding strategy performs better than bi-level optimization-based methods and increases the profits of generators.