To understand the indoor energy-saving strategy optimization of the DDPG algorithm in air-conditioning and heating systems, this study proposes an indoor energy-saving strategy optimization method based on deep reinforcement learning and the DDPG algorithm. In response to the lack of intelligent methods in the field of indoor building energy conservation in China, this article first analyzes the factors affecting the energy consumption of refrigeration units, determines the direction of energy conservation, and sets the energy-saving control parameters: chilled water outlet temperature, chilled water pump flow rate, cooling water inlet temperature, and cooling water pump flow rate. Secondly, based on the actual situation, constraint conditions for each control parameter are formulated, and the optimization objective is set as minimizing the energy consumption of the refrigeration unit. Then, since the energy-saving parameters are all continuous-valued, the Enhanced Deep Deterministic Policy Gradient (E-DDPG) algorithm is selected to solve for the optimal control-parameter values in each load interval. The experimental results show that the algorithm converges from the 600th scenario onward, indicating that the actions taken by the algorithm from that point can minimize the total energy consumption of the refrigeration unit. Specifically, the ranges of control parameters obtained are: chilled water outlet temperature To = [7.6, 8.7] °C, chilled water pump flow Ti = [25.4, 26.5] m³/h, cooling water inlet temperature Vo = [74.5, 88.4] °C, cooling water pump flow Vi = [90.1, 106.3] m³/h. Combined with a deep reinforcement learning load prediction method to obtain the next load, the system control parameters are adjusted to the optimal setting in advance. The deep reinforcement learning air-conditioning load prediction method has high accuracy in air-conditioning load prediction, thereby achieving energy conservation.
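As a rough illustration of the continuous-action control this entry describes, the sketch below shows the core DDPG ingredients: a deterministic actor whose output is rescaled into a constraint box over the four control parameters, Gaussian exploration noise clipped to those constraints, and Polyak (soft) target-network updates. The bounds, toy network, and state are illustrative placeholders, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constraint box for the four control parameters (illustrative,
# loosely echoing the ranges reported in the abstract): chilled water outlet
# temp (°C), chilled water pump flow (m³/h), cooling water inlet temp (°C),
# cooling water pump flow (m³/h).
LOW = np.array([7.0, 25.0, 70.0, 85.0])
HIGH = np.array([9.0, 27.0, 90.0, 110.0])

def actor(state, W, b):
    """Toy deterministic policy: tanh output rescaled into [LOW, HIGH]."""
    u = np.tanh(W @ state + b)                  # in (-1, 1)
    return LOW + (u + 1.0) * 0.5 * (HIGH - LOW)

def explore(action, sigma=0.1):
    """DDPG-style exploration: add Gaussian noise, clip to the constraints."""
    noisy = action + sigma * (HIGH - LOW) * rng.standard_normal(action.shape)
    return np.clip(noisy, LOW, HIGH)

def soft_update(target, online, tau=0.005):
    """Polyak averaging of target-network weights, as in DDPG."""
    return (1.0 - tau) * target + tau * online

state = np.array([0.6, 0.3])   # e.g. normalised load and outdoor temperature
W = rng.standard_normal((4, 2))
b = np.zeros(4)
a = explore(actor(state, W, b))
assert np.all(a >= LOW) and np.all(a <= HIGH)  # actions respect the constraints
```

In full DDPG the critic would score `(state, a)` pairs and the actor would be updated along the critic's gradient; the clipping step above is what keeps every explored action inside the formulated constraint conditions.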
An autonomous optimal trajectory planning method based on the deep deterministic policy gradient (DDPG) algorithm of reinforcement learning (RL) for hypersonic vehicles (HVs) is proposed in this paper. First, the trajectory planning problem is converted into a Markov Decision Process (MDP), and the amplitude of the bank angle is designated as the control input. The reward function of the MDP is set to minimize the trajectory terminal position errors while satisfying hard constraints. Deep neural networks (DNNs) are used to approximate the policy function and action-value function in the DDPG framework. The actor network then computes the control input directly from the flight states. Using a limited exploration strategy, the policy network is considered fully trained once the reward value converges to its maximum. Simulation results show that the policy network trained with the DDPG algorithm accomplishes 3-dimensional (3D) trajectory planning during the HV glide phase with high terminal precision and stable convergence. Additionally, the single-step computation time of the policy network is near real time, which suggests great potential as an autonomous online trajectory planner. Monte Carlo experiments prove the strong robustness of the autonomous trajectory planner under aerodynamic disturbances.
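A reward of the shape this abstract describes, penalising terminal position error while enforcing hard constraints, might be sketched as follows. The target, penalty weights, and per-step shaping are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Desired terminal position (e.g. normalised longitude/latitude); illustrative.
TARGET = np.array([0.0, 0.0])

def reward(position, constraints_violated, terminal=True):
    """Hypothetical MDP reward: small per-step signal, terminal error penalty,
    and a large penalty when hard path constraints (e.g. heat flux, dynamic
    pressure bounds) are violated."""
    if not terminal:
        return -1.0 if constraints_violated else 0.0
    r = -float(np.linalg.norm(position - TARGET))  # minimise terminal error
    if constraints_violated:
        r -= 100.0                                 # hard-constraint penalty
    return r

# A trajectory ending near the target without violations scores higher than
# one that misses the target and breaks a constraint.
good = reward(np.array([0.01, 0.02]), False)
bad = reward(np.array([0.5, 0.4]), True)
assert good > bad
```

With a reward of this form, maximising return is equivalent to driving the terminal error toward zero while steering clear of the constraint boundaries, which is what lets the actor network output the bank-angle command directly from flight states.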
ISBN:
(Print) 9781728197241
The purpose of portfolio management is to select a variety of financial products to form a portfolio and then manage that portfolio so as to diversify risk and improve returns. In this paper, the Deep Deterministic Policy Gradient (DDPG) algorithm with neural networks is used, and new states, actions, and reward functions are proposed. The empirical analysis shows that this paper's method outperforms investing with a Q-learning algorithm, the equally-weighted method, investing all funds in risk-free assets, and investing all funds in stocks.
Traditional load frequency control systems suffer from the long response lag of thermal power units, low ramp rates, and poor disturbance rejection. By introducing energy storage into secondary frequency regulation together with a deep reinforcement learning technique, a new load frequency control strategy is proposed. Firstly, the rules for the two operating modes of the energy storage, i.e., adaptive frequency regulation and energy storage self-recovery, are designed. Then, a deep reinforcement learning load frequency controller is designed to dynamically adjust the outputs of the energy storage system and the conventional unit. To improve the exploration efficiency of the deep reinforcement learning algorithm, a random network distillation technique is used, and a multi-objective reward function containing an external reward and an additional internal reward is designed. Finally, simulation results show that, compared with the traditional load frequency control strategy, the proposed control strategy achieves better frequency regulation performance.
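The random network distillation (RND) exploration bonus mentioned in this entry can be sketched as follows: a fixed, randomly initialised target network embeds each state, a trainable predictor learns to imitate it on visited states, and the prediction error serves as the internal (intrinsic) reward, large for novel states and small for familiar ones. The linear networks and training loop here are a minimal illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

T = rng.standard_normal((8, 4))   # frozen random target network
P = np.zeros((8, 4))              # trainable predictor network

def intrinsic_reward(state):
    """RND internal reward: squared prediction error against the frozen target."""
    return float(np.sum((T @ state - P @ state) ** 2))

def train_predictor(states, lr=0.01, epochs=200):
    """Fit the predictor to the target on visited states (plain SGD)."""
    global P
    for _ in range(epochs):
        for s in states:
            err = P @ s - T @ s          # gradient of 0.5*||(P-T)s||^2 is outer(err, s)
            P = P - lr * np.outer(err, s)

# States visited so far all lie in the plane of the first two coordinates.
familiar = [np.array([rng.standard_normal(), rng.standard_normal(), 0.0, 0.0])
            for _ in range(20)]
train_predictor(familiar)

novel = np.array([0.0, 0.0, 1.0, 1.0])   # a direction never seen in training
r_familiar = max(intrinsic_reward(s) for s in familiar)
r_novel = intrinsic_reward(novel)
assert r_novel > r_familiar              # novelty earns a larger internal reward
```

In the control setting described above, this bonus would be added to the external frequency-regulation reward, encouraging the agent to visit under-explored operating states during training.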
The fixed service-charge pricing model adopted by traditional electric vehicle aggregators (EVAs) struggles to guide demand-side resources to respond to power market price signals. A real-time pricing strategy, by contrast, can flexibly reflect market supply and demand, shift the charging load of electric vehicles (EVs), reduce the negative impact of disorderly charging on the stable operation of power systems, and fully tap the economic potential of EVA participation in the power market. Based on historical EV behavior data, this paper considers market factors such as the peak-valley time-of-use tariff, the demand-side response mode, and deviation balancing in the spot market to formulate an objective function maximizing the EVA's comprehensive revenue, and establishes a quarter-hourly vehicle-to-grid (V2G) dynamic time-sharing pricing model based on the deep deterministic policy gradient (DDPG) reinforcement learning algorithm. Case studies compare the EVA revenue difference between the peak-valley time-of-use tariff and the hourly pricing strategy under the same algorithm. The results show that the scheme with higher pricing frequency guides users' charging behavior more effectively, taps the economic potential of the power market to a greater extent, and smooths the load fluctuation of the power grid.
As the Industrial Internet of Things (IIoT) evolves, the rapid growth of connected devices in industrial networks generates massive amounts of data. These transmissions impose stringent requirements on network communications, including reliably bounded latency and high throughput. To address these challenges, the integration of fifth-generation (5G) mobile cellular networks and Time-Sensitive Networking (TSN) has emerged as a prominent solution for scheduling diverse traffic flows. While Deep Reinforcement Learning (DRL) algorithms have been widely employed to tackle scheduling issues within the 5G-TSN architecture, existing approaches often neglect throughput optimization in multi-user scenarios and the impact of Channel Quality Indicators (CQI) on resource allocation. To overcome these limitations, this study introduces ME-DDPG, a novel joint resource scheduling algorithm. ME-DDPG extends the Deep Deterministic Policy Gradient (DDPG) model by embedding a Modulation and Coding Scheme (MCS)-based priority scheme, and the resulting improvement in computational efficiency is critical for real-time scheduling in IIoT environments. Specifically, ME-DDPG provides latency guarantees for time-triggered applications, ensures throughput for video applications, and maximizes overall system throughput across the 5G and TSN domains. Simulation results demonstrate that the proposed ME-DDPG achieves 100% latency reliability for time-triggered flows and improves system throughput by 10.84% over existing algorithms under varying Gate Control List (GCL) configurations and user ratios. Furthermore, owing to the combination of the MCS-based resource allocation scheme with the DDPG model, ME-DDPG achieves faster convergence of the reward function than the original DDPG method.
This paper presents the implementation of a Deep Deterministic Policy Gradient (DDPG) algorithm in Reinforcement Learning (RL) for self-balancing a motorcycle. The DDPG agent iteratively interacts with the motorcycle environment to develop an optimal control policy, using states such as position and velocity, and actions such as motor torque. The study evaluates performance through simulations and real-time experiments, demonstrating the algorithm's effectiveness in balancing the motorcycle across various lean angles and in handling external disturbances and model uncertainties. Comparative analysis with a traditional PD controller highlights DDPG's faster response times, improved disturbance rejection, and enhanced adaptability to uncertainties. The results underscore the potential of RL algorithms for enhancing motorcycle control systems toward safer and more efficient operation.
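The PD baseline this entry compares against can be sketched with a toy inverted-pendulum lean model: a torque proportional to the lean angle and lean rate drives the motorcycle back upright. The gains, dynamics, and time step are illustrative assumptions, not values from the paper.

```python
import math

def pd_torque(angle, rate, kp=30.0, kd=5.0):
    """PD control law: torque opposes lean angle and lean rate."""
    return -kp * angle - kd * rate

def simulate(angle=0.2, rate=0.0, dt=0.01, steps=500, g_over_l=9.81):
    """Euler-integrate inverted-pendulum-like lean dynamics:
    angle'' = (g/l)*sin(angle) + torque  (all quantities illustrative)."""
    for _ in range(steps):
        acc = g_over_l * math.sin(angle) + pd_torque(angle, rate)
        rate += acc * dt
        angle += rate * dt
    return angle

final = simulate()
assert abs(final) < 0.01   # the PD loop brings the lean angle near upright
```

A DDPG agent replaces the fixed `pd_torque` law with a learned policy over the same state (angle, rate) and action (torque), which is what allows it to adapt to disturbances and model uncertainty where fixed PD gains cannot.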
ISBN:
(Print) 9781728194844
In this paper, we investigate joint vehicle association and multi-dimensional resource management in a vehicular network assisted by multi-access edge computing (MEC) and unmanned aerial vehicles (UAVs). To efficiently manage the available spectrum, computing, and caching resources for the MEC-mounted base station and UAVs, a resource optimization problem is formulated and solved at a central controller. Considering the long solving time of the formulated problem and the delay-sensitive requirements of vehicular applications, we transform the optimization problem using reinforcement learning and then design a deep deterministic policy gradient (DDPG)-based solution. By training the DDPG-based resource management model offline, optimal vehicle association and resource allocation decisions can be obtained rapidly. Simulation results demonstrate that the DDPG-based resource management scheme converges within 200 episodes and achieves higher delay/quality-of-service satisfaction ratios than the random scheme.
ISBN:
(Print) 9798350386783; 9798350386776
Aiming at the low-yield problem caused by heavy reliance on manual experience in the mud deposition process of the rake suction dredger, this paper proposes a control strategy for the mud deposition process based on the DDPG algorithm, which realizes intelligent control of the process according to the current construction conditions. First, the mechanism of the sediment deposition process of the rake suction dredger is analysed and modelled. Secondly, the DDPG algorithm is used to fully explore the process in a sediment deposition modelling environment. The experimental results show that the strategy reduces overflow loss and increases the sediment deposition volume by adjusting the overflow barrel height, inlet flow rate, and inlet density in real time to optimize the yield.