ISBN:
(Print) 9781728158556
Texas Hold 'em is a typical example of an incomplete-information game. Traditional machine learning methods cannot cope with its huge search state space. In this paper, a value-based reinforcement learning algorithm is adopted, which can handle a huge amount of data without manual feature extraction and enables unsupervised training of the model. However, value-based reinforcement learning suffers from an overly complex sample-acquisition process and from the overestimation problem of the DQN algorithm. Therefore, this paper introduces the DQN-S model, which combines reinforcement learning with Monte Carlo game search and integrates the SARSA algorithm to alleviate these problems to a certain extent. Finally, experimental data show that the overestimation effect of the DQN algorithm is weakened to a certain extent after the SARSA algorithm is incorporated. The DQN-S model's average return per game exceeds 5 chips, and in games against its strongest training version it still averages more than 3 chips per game.
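The overestimation this abstract attributes to DQN comes from the max operator in its bootstrap target; SARSA instead bootstraps on the action the policy actually takes, which damps that bias. A minimal sketch of the two targets (the Q-values, reward, and discount below are hypothetical illustrations, not values from the paper):

```python
def dqn_target(q_next, reward, gamma=0.99):
    # DQN bootstraps on the maximum next-state Q-value; with noisy
    # estimates, max() systematically picks out positive noise,
    # causing overestimation.
    return reward + gamma * max(q_next)

def sarsa_target(q_next, action_taken, reward, gamma=0.99):
    # SARSA bootstraps on the Q-value of the action actually taken
    # by the (e.g. epsilon-greedy) policy, which weakens the
    # max-induced overestimation.
    return reward + gamma * q_next[action_taken]

q_next = [1.0, 2.5, 0.3]  # hypothetical Q-values for the next state
dqn_t = dqn_target(q_next, reward=1.0)                    # uses max(q_next) = 2.5
sarsa_t = sarsa_target(q_next, action_taken=0, reward=1.0)  # uses q_next[0] = 1.0
```

By construction the SARSA target can never exceed the DQN target for the same transition, which is the sense in which blending in SARSA "weakens" the overestimation.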
In the realm of Natural Language Processing (NLP), Abstract Text Summarization (ATS) holds a crucial position, involving the transformation of lengthy textual content into concise summaries while retaining essential i...
Reinforcement learning has emerged as a prominent technique for enhancing robot obstacle avoidance capabilities in recent years. This research provides a comprehensive overview of reinforcement learning methods, focus...
Indoor temperature and relative humidity control in office buildings is crucial, as it affects the thermal comfort, work efficiency, and even health of the occupants. In China, fan coil units (FCUs) are widely used as air-conditioning equipment in office buildings. Conventional FCU control methods often ignore the impact of indoor relative humidity on occupants by treating indoor temperature as the sole control objective. This study used FCUs with a fresh-air system in an office building in Beijing as the research object and proposed a deep reinforcement learning (RL) control algorithm that adjusts the air supply volume of the FCUs. To improve the joint control satisfaction rate of indoor temperature and relative humidity, the proposed RL algorithm adopted the deep Q-network (DQN) algorithm. To train the RL algorithm, a detailed simulation environment model was established in the Transient System Simulation Tool (TRNSYS), including a building model and a model of the FCUs with the fresh-air system. The simulation environment can interact with the RL agent in real time through a self-developed TRNSYS-Python co-simulation platform, on which the RL algorithm was trained, tested, and evaluated. The results indicate that, compared with traditional on/off and rule-based controllers, the proposed RL algorithm can increase the joint control satisfaction rate of indoor temperature and relative humidity by 12.66% and 9.5%, respectively. This study provides a preliminary direction for deep reinforcement learning control of indoor temperature and relative humidity in office-building heating, ventilation, and air-conditioning (HVAC) systems.
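A joint temperature-and-humidity objective like the one described above is typically encoded in the RL reward. The sketch below is a hypothetical reward shaping (the comfort bands and penalty form are illustrative assumptions, not taken from the paper): both variables must sit inside their bands for the agent to receive the maximum reward, mirroring the "joint control satisfaction" idea.

```python
def comfort_reward(temp_c, rh_pct, temp_band=(22.0, 26.0), rh_band=(40.0, 60.0)):
    """Hypothetical joint comfort reward for an HVAC RL agent.

    Penalizes the deviation of indoor temperature (deg C) and relative
    humidity (%) from their comfort bands; 0.0 is the best achievable
    reward, reached only when BOTH variables are satisfied.
    """
    def band_penalty(x, lo, hi):
        # Distance outside the [lo, hi] band; zero when inside it.
        if x < lo:
            return lo - x
        if x > hi:
            return x - hi
        return 0.0

    return -(band_penalty(temp_c, *temp_band) + band_penalty(rh_pct, *rh_band))
```

For example, `comfort_reward(24.0, 50.0)` returns 0.0 (both in band), while an overheated, humid state such as `comfort_reward(27.0, 65.0)` is penalized for both violations at once.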
Deep reinforcement learning-based energy management strategies (EMSs) play an essential role in improving fuel economy and extending fuel cell lifetime for fuel cell hybrid electric vehicles. In this work, the traditional Deep Q-Network (DQN) is compared with a DQN with prioritized experience replay. The DQN with prioritized experience replay is then designed as an energy management strategy to minimize hydrogen consumption and compared with dynamic programming (DP). Moreover, fuel cell system degradation is incorporated into the objective function, and a balance between fuel economy and fuel cell system degradation is achieved by adjusting the degradation weight and the hydrogen-consumption weight. Finally, a combined driving cycle is selected to further verify the effectiveness of the proposed strategy in unfamiliar driving environments and untrained situations. The training results under the UDDS driving cycle show that the fuel economy of the EMS decreases by 0.53% when fuel cell system degradation is considered, reaching 88.73% of the DP-based EMS, while degradation of the fuel cell system is effectively suppressed. At the same time, computational efficiency is improved by more than 70% compared with the DP-based strategy. (c) 2021 Elsevier Ltd. All rights reserved.
To further improve line transport capacity, virtual coupling has become a frontier topic in the field of rail transit. Specifically, a safe and efficient following-control strategy based on the relative distance braking mode (RDBM) is one of its core technologies. This paper proposes a cooperative collision-avoidance control methodology that enhances operational efficiency while ensuring safety. First, a novel RDBM framework based on the predicted trajectory of the preceding train is proposed for train collision-avoidance control. To reduce the train following distance, a cooperative control model is further proposed and formulated as a Markov decision process. Then, the Deep Q-Network (DQN) algorithm is introduced to solve the control problem by learning a safe and efficient control strategy for the following train, with the critical elements of the reinforcement learning framework designed accordingly. Finally, simulations in the simulated environment illustrate the effectiveness of the proposed approach. Compared with the absolute distance braking mode (ADBM), the minimum following distance between adjacent trains can be reduced by 70.23% on average via the proposed approach while safety is guaranteed.
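In a Markov-decision-process formulation like the one above, the reward for the following train must trade off safety (never violating the minimum braking gap) against efficiency (running as close to the preceding train as the braking mode allows). The sketch below is a hypothetical reward design illustrating that trade-off; the gap thresholds and penalty magnitude are invented for illustration, not taken from the paper:

```python
def following_reward(gap_m, min_safe_gap_m, target_gap_m, violation_penalty=-100.0):
    """Hypothetical reward for a virtually coupled following train.

    - Violating the minimum safe gap (collision risk) gets a large
      fixed penalty so the learned policy avoids it at all costs.
    - Otherwise the reward grows as the actual gap approaches the
      target gap, encouraging tight but safe following.
    """
    if gap_m < min_safe_gap_m:
        return violation_penalty
    # Normalized distance from the target gap: 0.0 exactly at target.
    return -abs(gap_m - target_gap_m) / target_gap_m
```

For instance, `following_reward(50.0, 100.0, 200.0)` returns the violation penalty, while a train holding exactly the target gap of 200 m receives the maximum reward of 0.0; a DQN trained on such a reward learns to shrink the gap only as far as the safety constraint permits.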