Introduction: This study proposes a Q-learning-based optimization method for cultural heritage tourism routes, using the Historic Centre of Macau as a case study. The goal is to efficiently visit multiple attractions within a limited time. Materials and methods: Coordinates of 25 heritage sites were obtained through the Google Maps API, and the Haversine formula was used to calculate distances. We designed a state space, action space, and reward function based on distance and time for dynamic route optimization. Results and conclusion: The results show that the Q-learning algorithm produces an optimal route that covers all the attractions while shortening the overall path and achieving rapid convergence. The optimized routes improve visit efficiency and balance attraction utilization, preventing overcrowding in popular areas. This approach provides practical implications for intelligent cultural heritage tourism planning: by designing an intelligent tourism route planning system, it helps tourists explore the Historic Centre of Macau more efficiently. Future research will focus on refining the reward function by incorporating visitor preferences and real-time traffic conditions for greater personalization and applicability.
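The distance step of the pipeline above can be sketched in a few lines; the coordinates below are approximate, illustrative values for two of the sites, not the paper's data:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# Approximate coordinates for the Ruins of St. Paul's and Senado Square (illustrative)
d = haversine_km(22.1975, 113.5410, 22.1934, 113.5398)
```

The resulting pairwise distances would feed the reward function, which penalizes long hops between attractions.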
This paper presents a new method to reduce infected cells and free virus particles (virions) via a nonlinear HIV model. Three scenarios are considered for control performance evaluation. In the first, the system and initial conditions are assumed to be completely known. In the second, the initial conditions are taken randomly. In the third scenario, in addition to uncertainty in the initial conditions, an additive noise is taken into account. The optimal control method is used to design an effective drug schedule that reduces the number of infected cells and free virions with and without uncertainty. Using the Q-learning algorithm, one of the most widely applied algorithms in reinforcement learning, the drug delivery rate is obtained off-line. Since Q-learning is a model-free algorithm, the performance of the controller is expected not to change significantly in the presence of uncertainty. Simulation results confirm that the proposed control method performs well and controls the free virions effectively for both certain and uncertain HIV models.
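The off-line tabular Q-learning loop underlying such a controller can be sketched generically; the toy "infection level" chain below is an illustrative stand-in for the nonlinear HIV dynamics, not the paper's model:

```python
import random

def q_learning(n_states, n_actions, step, episodes=300, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Generic tabular Q-learning; `step(s, a)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        for _ in range(100):  # cap episode length
            if rng.random() < eps:
                a = rng.randrange(n_actions)           # explore
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])  # exploit
            s2, r, done = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break
    return Q

def step(s, a):
    """Toy stand-in dynamics: action 1 ('administer drug') lowers the infection level."""
    s2 = min(s + 1, 4) if a == 1 else s   # state 4 = controlled viral load
    done = s2 == 4
    return s2, (10.0 if done else -1.0), done

Q = q_learning(5, 2, step)
```

Because the update only needs sampled transitions, the same loop runs unchanged when the dynamics are uncertain or noisy, which is the model-free property the paper exploits.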
Coverage path planning (CPP) is a fundamental problem for mobile robots across a variety of applications. Q-learning-based coverage path planning algorithms have recently begun to be explored. To overcome the tendency of traditional Q-learning to fall into local optima, this paper introduces new reward functions derived from the predator-prey model into a traditional Q-learning-based CPP solution: a comprehensive reward function that incorporates three components, a predation avoidance reward function, a smoothness reward function, and a boundary reward function. In addition, the influence of the weighting parameters on the total reward function is discussed. Extensive simulation results and practical experiments verify that the proposed predator-prey-reward-based Q-learning CPP (PP-Q-learning-based CPP) outperforms traditional BCD and Q-learning-based CPP in terms of repetition ratio and number of turns.
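The comprehensive reward described above is a weighted combination of the three components; a minimal sketch with illustrative weights (the paper's actual weighting parameters are precisely what its discussion explores):

```python
def total_reward(r_predation, r_smooth, r_boundary, w=(0.5, 0.3, 0.2)):
    """Weighted aggregation of the three component rewards (illustrative weights)."""
    return w[0] * r_predation + w[1] * r_smooth + w[2] * r_boundary

# Shifting weight from predation avoidance to smoothness changes the trade-off:
r_default = total_reward(1.0, 0.2, 0.0)                    # favours avoiding revisits
r_smoothed = total_reward(1.0, 0.2, 0.0, w=(0.2, 0.6, 0.2))  # favours fewer turns
```

Tuning the weight vector shifts the learned paths between low repetition ratio (predation avoidance dominant) and fewer turns (smoothness dominant).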
This paper presents the application and design of a novel stochastic optimal control methodology based on the Q-learning method for solving automatic generation control (AGC) under the new control performance standards (CPS) of the North American Electric Reliability Council (NERC). The aims of the CPS are to relax the control constraint requirements of AGC plant regulation and enhance the frequency dispatch support effect from interconnected control areas. The NERC's CPS-based AGC problem is a dynamic stochastic decision problem that can be modeled as a reinforcement learning (RL) problem based on Markov decision process theory. In this paper, the Q-learning method is adopted as the RL core algorithm, with CPS values regarded as the rewards from the interconnected power systems; the CPS control and relaxed control objectives are formulated as immediate reward functions by means of a linear weighted aggregative approach. By regulating a closed-loop CPS control rule to maximize the long-term discounted reward during online learning, the optimal CPS control strategy is gradually obtained. This paper also introduces a practical semisupervisory group prelearning method to improve the stability and convergence of Q-learning controllers during the prelearning process. Tests on the China Southern Power Grid demonstrate that the proposed control strategy effectively enhances the robustness and relaxation property of AGC systems while ensuring CPS compliance. DOI: 10.1061/(ASCE)EY.1943-7897.0000017. (C) 2011 American Society of Civil Engineers.
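The linear weighted aggregation of the CPS objective and the relaxed-control objective into an immediate reward might look like the following sketch; the weights, units, and exact functional form here are assumptions, not those of the paper:

```python
def immediate_reward(cps1_percent, ace_mw, w1=0.7, w2=0.3):
    """Linear weighted aggregation (illustrative): reward CPS1 compliance above
    the 100% threshold and penalise the area control error magnitude.
    cps1_percent: CPS1 compliance in percent; ace_mw: area control error in MW."""
    return w1 * (cps1_percent - 100.0) - w2 * abs(ace_mw)

r_good = immediate_reward(200.0, 10.0)   # well-compliant, small error
r_poor = immediate_reward(100.0, 50.0)   # threshold compliance, large error
```

The Q-learning controller then selects regulation commands that maximize the discounted sum of such rewards rather than any single-step value.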
Routing plays a critical role in data transmission for underwater acoustic sensor networks (UWSNs) in the Internet of Underwater Things (IoUT). Traditional routing methods suffer from high end-to-end delay, limited bandwidth, and high energy consumption. With the development of artificial intelligence and machine learning algorithms, many researchers apply these new methods to improve the quality of routing. In this paper, we propose a Q-learning-based multi-hop cooperative routing protocol (QMCR) for UWSNs. The protocol can automatically choose nodes with the maximum Q-value as forwarders based on distance information. Furthermore, we combine cooperative communications with the Q-learning algorithm to reduce network energy consumption and improve communication reliability. Simulation results show that the running time of QMCR is less than one-tenth of that of the artificial fish-swarm algorithm (AFSA), while the routing energy consumption is kept at the same level. Owing to the extremely fast speed of the algorithm, QMCR is a promising routing design method for UWSNs, especially in the face of the extremely dynamic underwater acoustic channels of the real ocean environment.
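The forwarder-selection rule, picking the candidate next hop with the maximum Q-value, is straightforward to sketch; the node names and Q-values below are illustrative:

```python
def choose_forwarder(q_values, candidates):
    """Select the candidate next hop with the largest learned Q-value."""
    return max(candidates, key=lambda n: q_values[n])

# Illustrative per-neighbour Q-values, e.g. learned from distance information
q_values = {"n1": 0.2, "n2": 0.9, "n3": 0.5}
best = choose_forwarder(q_values, ["n1", "n2", "n3"])  # → "n2"
```

Because each node only compares its neighbours' Q-values locally, the per-hop decision is constant-time, which is consistent with the large running-time advantage reported over population-based search such as AFSA.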
Due to the complexity of interactive environments, dynamic obstacle avoidance path planning poses a significant challenge to agent mobility. Dynamic path planning is a complex multi-constraint combinatorial optimization problem. Some existing algorithms easily fall into local optima when solving such problems, limiting their convergence speed and accuracy. Reinforcement learning has clear advantages in solving decision-sequence problems in complex environments, and Q-learning is one such reinforcement learning method. To improve the algorithm's value evaluation on practical problems, this paper introduces priority weights into the Q-learning algorithm. The improved algorithm is compared with existing algorithms and applied to dynamic obstacle avoidance path planning. Experiments show that the improved algorithm dramatically improves convergence speed and accuracy and increases the value evaluation, finding the shortest path of 16 units in 27 seconds.
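One plausible reading of the priority-weight idea is to scale the temporal-difference update by a priority factor; the exact form used in the paper is not given here, so this sketch is an assumption:

```python
def priority_update(Q, s, a, r, s_next, priority, alpha=0.1, gamma=0.9):
    """One Q-learning step with the temporal-difference error scaled by a
    priority weight (illustrative form): high-priority transitions learn faster."""
    td_error = r + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * priority * td_error
    return Q[s][a]

Q = [[0.0, 0.0], [0.0, 0.0]]
v = priority_update(Q, 0, 1, 1.0, 1, priority=2.0)  # doubled effective step size
```

Weighting important transitions more heavily is one standard way to speed convergence of value estimates without changing the fixed point of the update.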
Underwater acoustic sensor networks (UASNs) have emerged as a viable networking approach in recent years due to their numerous aquatic applications. As a vital component of UASNs, routing protocols are essential for ensuring reliable data transmission and extending the longevity of UASNs. Recently, several clustering-based routing protocols have been proposed to reduce energy consumption and overcome the resource constraints of deployed sensor nodes. However, they rarely consider the hot-spot problem and the sink-node isolation problem in multihop underwater sensor networks. In this article, we propose a Q-learning-based hierarchical routing protocol with unequal clustering (QHUC) for determining an effective data forwarding path to extend the lifespan of UASNs. First, a hierarchical network structure is constructed for initialization. Then, a combination of unequal clustering and the Q-learning algorithm is applied to the hierarchical structure to disperse the remaining energy more evenly throughout the network. With the Q-learning algorithm, a globally optimal cluster head (CH) and next hop can be determined better than with a greedy approach. In addition, the Q-value that guarantees optimal routing decisions can be computed without incurring any additional cost by combining the Q-learning algorithm with clustering. Simulation results show that QHUC achieves efficient routing and significantly prolongs the network lifetime.
The capability of a cyber-physical power system (CPPS) to recover from cascading failures caused by extreme events and restore prefailure functionality is a critical focus of resilience research. In contrast to the strongly coupled systems studied by most researchers, this article examines weakly coupled CPPS, exploring result-oriented recovery approaches to enhance system resilience. Various repair methods are compared in terms of the resilience of weakly coupled CPPS across different coupling modes and failover probabilities. Using the Q-learning algorithm, an optimized network restoration sequence is obtained that minimizes the negative influence of failures on network functionality while reducing power loss. The proposed method's effectiveness and generalizability are comprehensively verified through simulation experiments on weakly coupled CPPS built from the IEEE 39, IEEE 118, and IEEE 300 networks and their corresponding scale-free networks. Its rationality is verified through two recovery mechanisms: single-node recovery and multinode recovery. Comparing the proposed method with heuristic and optimization-based recovery methods shows that it significantly accelerates network recovery and improves network resilience, achieving better resilience centrality. These findings provide valuable insights for decision making in CPPS recovery work.
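Learning a restoration sequence with tabular Q-learning can be sketched by treating the set of already-repaired nodes as the state and the next node to repair as the action; the node names and functionality gains below are illustrative toy values, not drawn from the IEEE test systems:

```python
import random

def learn_restoration_order(nodes, gain, episodes=3000, alpha=0.2, gamma=0.95, eps=0.2, seed=1):
    """Tabular Q-learning over restoration states (the frozenset of repaired nodes).
    Discounting favours recovering high-gain nodes early; `gain` is illustrative."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        restored = frozenset()
        while len(restored) < len(nodes):
            acts = [n for n in nodes if n not in restored]
            Q.setdefault(restored, {n: 0.0 for n in acts})
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda n: Q[restored][n])
            nxt = restored | {a}
            future = 0.0
            if len(nxt) < len(nodes):
                Q.setdefault(nxt, {n: 0.0 for n in nodes if n not in nxt})
                future = max(Q[nxt].values())
            Q[restored][a] += alpha * (gain[a] + gamma * future - Q[restored][a])
            restored = nxt
    start = frozenset()
    return max(Q[start], key=lambda n: Q[start][n])

first = learn_restoration_order(["generator", "relay", "sensor"],
                                {"generator": 10.0, "relay": 3.0, "sensor": 1.0})
```

Since the discount factor makes early functionality gains worth more, the learned policy repairs the highest-impact node first, which is the intuition behind sequencing recovery to limit functionality loss.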
This paper presents an intelligent wind-speed-sensorless maximum power point tracking (MPPT) method for a variable-speed wind energy conversion system (VS-WECS) based on a Q-learning algorithm. The Q-learning algorithm maintains a Q-value for each state-action pair, updated using a reward and a learning rate. The inputs defining the states are the electrical power received by the grid and the rotational speed of the generator. In this paper, Q-learning is equipped with a peak detection technique, which drives the system toward peak power even if learning is incomplete, making real-time tracking faster. To make learning uniform, each state has its own learning parameter instead of a common learning parameter for all states, as in conventional Q-learning. Therefore, if a half-learned system is running at the peak point, this does not affect the learning of unvisited states. In addition, wind speed change detection is combined with the proposed algorithm, enabling it to work under varying wind speed conditions. Moreover, knowledge of the wind turbine characteristics and wind speed measurements is not needed. The algorithm is verified through simulations and experiments and compared with the perturbation and observation (P&O) algorithm.
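The per-state learning rate idea can be sketched directly: each state keeps its own rate, and only the visited state's rate decays, so dwelling at the peak does not slow learning elsewhere. The update form and decay factor here are illustrative assumptions:

```python
def update_with_state_lr(Q, lr, s, a, r, s_next, gamma=0.9, decay=0.99):
    """Q-learning step with a separate learning rate per state; only the
    visited state's rate decays, so unvisited states keep learning quickly."""
    Q[s][a] += lr[s] * (r + gamma * max(Q[s_next]) - Q[s][a])
    lr[s] *= decay  # decay applies only to the visited state

Q = [[0.0, 0.0], [0.0, 0.0]]
lr = [0.5, 0.5]
update_with_state_lr(Q, lr, 0, 0, 1.0, 1)  # state 1's rate is untouched
```

With a single shared rate, long operation at one operating point would shrink the step size for every state; the per-state rates avoid exactly that.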
To increase competition, control prices, and decrease inefficiency in the carbon allowance auction market, limits on bidding price and volume can be set. With these limits, participants face the same cap on bidding price and volume; without them, participants have different values per unit of carbon allowance, so some participants may be strong and others weak. Given the impact of these limits on the auction, this paper compares uniform and discriminatory pricing in a carbon allowance auction with and without the limits, using a multi-agent-based model consisting of the government and supply chains. The government determines the supply chains' initial allowances. The supply chains compete in the carbon auction market and determine their bidding strategies based on the Q-learning algorithm; they then optimize their tactical and operational decisions. They can also trade their carbon allowances in a carbon trading market in which the price is freely determined by carbon supply and demand. Results show that without the limits, the carbon price under uniform pricing is less than or equal to that under discriminatory pricing, while there is no difference between the two when limits apply. Overall, the auction reduces the profit of the supply chains. This negative effect is smaller under uniform than under discriminatory pricing in the case without limits. Nevertheless, the strong supply chains make large profits from the auction when the mitigation rate is high.
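A bidding agent's strategy choice from its Q-table is typically epsilon-greedy; a minimal sketch with illustrative discrete bid prices (the paper's actual action space is not specified here):

```python
import random

def pick_bid(q_row, bid_prices, eps=0.1, rng=None):
    """Epsilon-greedy choice over discrete bid prices from one Q-table row:
    explore a random price with probability eps, otherwise bid the argmax."""
    rng = rng or random.Random(0)
    if rng.random() < eps:
        return rng.choice(bid_prices)
    return bid_prices[max(range(len(bid_prices)), key=lambda i: q_row[i])]

bid = pick_bid([0.1, 0.8, 0.3], [10.0, 20.0, 30.0], eps=0.0)  # greedy → 20.0
```

After each auction round, the agent would update the Q-value of the chosen price with the realized profit as the reward, gradually shaping its bidding strategy.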