In this paper, an extended Q-learning method is used to study the H∞ output tracking control (HOTC) problem for networked control systems with state delay and data loss. Compared with existing results, the networked control system considered here contains network delays and packet loss as well as external disturbances. To deal with the disturbances, the H∞ control problem is transformed into a minimax problem, which is solved by the method of zero-sum games. Packet loss and state delay make it difficult to obtain accurate current state information, so a new Smith predictor that accounts for both delay and packet loss is designed to predict the current state. Using the predicted state, the extended Q-learning algorithm is implemented to solve the H∞ output tracking problem when the system dynamics are unknown. The convergence of the extended Q-learning algorithm is then proved, and the stability and optimality of the proposed method are analyzed in the theorems. Finally, numerical simulations verify the effectiveness of the proposed algorithm.
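The zero-sum-game formulation can be sketched with minimax Q-learning: the controller maximizes while the disturbance acts as a minimizing adversary, so the Bellman target uses a max–min over the joint action-value table. Everything below (states, toy dynamics, rewards) is an illustrative stand-in, not the paper's system.

```python
import numpy as np

# Minimax (zero-sum-game) Q-learning sketch: Q is indexed by (state, control u,
# disturbance w); the target bootstraps on max_u min_w Q(s', u, w).
rng = np.random.default_rng(0)
n_s, n_u, n_w = 4, 2, 2
Q = np.zeros((n_s, n_u, n_w))
alpha, gamma = 0.1, 0.9

def step(s, u, w):
    # toy dynamics: reward favors u matching the state's parity, w opposes u
    r = 1.0 if u == s % 2 else -1.0
    r -= 0.5 if w == u else 0.0
    return (s + u + w) % n_s, r

s = 0
for _ in range(5000):
    u = rng.integers(n_u)                  # uniform exploration (off-policy)
    w = rng.integers(n_w)
    s2, r = step(s, u, w)
    # saddle-point target: controller maximizes, disturbance minimizes
    target = r + gamma * np.max(np.min(Q[s2], axis=1))
    Q[s, u, w] += alpha * (target - Q[s, u, w])
    s = s2

# robust (worst-case-disturbance) control action per state
policy = np.argmax(np.min(Q, axis=2), axis=1)
```

For this toy reward, the robust policy matches the control action to the state's parity, which is what the max–min extraction recovers.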
This study explores the emergence and maintenance of cooperation in evolutionary game theory by incorporating occasional social interactions into Q-learning algorithms. We model the dynamics on a square lattice, where individuals play the Prisoner's Dilemma Game and update their strategies based on Q-learning and infrequent social interactions. Our main findings reveal a non-monotonic relationship between the game parameter c and cooperation levels, with cooperation re-emerging in adverse conditions. The interplay between Q-learning and social learning mechanisms is key, with social learning playing a more significant role in sustaining cooperation under challenging conditions. This work advances our understanding of cooperation maintenance in populations and has implications for designing strategies to foster cooperation in real-world scenarios.
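The mixed update rule can be sketched as follows: with a small probability an agent imitates a better-scoring neighbor (social learning), otherwise it updates its own Q-values. The lattice size, payoff form, and parameter values here are assumptions for illustration, not the study's exact model.

```python
import numpy as np

# Lattice Prisoner's Dilemma with Q-learning plus occasional social imitation.
rng = np.random.default_rng(1)
L = 10                                  # lattice side length (assumed)
c = 0.4                                 # cooperation-cost game parameter (assumed)
p_social, alpha, gamma, eps = 0.1, 0.1, 0.9, 0.05

strategy = rng.integers(2, size=(L, L))  # 1 = cooperate, 0 = defect
Q = np.zeros((L, L, 2))                  # per-agent action values

def payoff(a, b):
    # donation-game style: cooperating costs c, a cooperating partner gives 1
    return (1.0 if b == 1 else 0.0) - (c if a == 1 else 0.0)

for _ in range(200):
    for i in range(L):
        for j in range(L):
            ni, nj = (i + 1) % L, j      # one toroidal neighbor for simplicity
            r = payoff(strategy[i, j], strategy[ni, nj])
            if rng.random() < p_social:
                # infrequent social learning: copy the neighbor if it earned more
                if payoff(strategy[ni, nj], strategy[i, j]) > r:
                    strategy[i, j] = strategy[ni, nj]
            else:
                a = strategy[i, j]       # Q-learning update on own last action
                Q[i, j, a] += alpha * (r + gamma * Q[i, j].max() - Q[i, j, a])
                strategy[i, j] = (rng.integers(2) if rng.random() < eps
                                  else int(Q[i, j].argmax()))

coop_rate = strategy.mean()              # fraction of cooperators on the lattice
```

Sweeping `c` and `p_social` in a sketch like this is how the non-monotonic cooperation curve would be probed.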
This article aims to study event-triggered data-driven control of nonlinear systems via Q-learning. An input-output mapping is described in a pseudo-partial-derivative form. A Q-learning-based optimization criterion is provided to establish a data-driven control law. A dynamic penalty factor composed of tracking errors is supplied to accelerate error convergence. Consequently, a novel triggering rule related to this factor and the performance cost is proposed to save communication resources. Sufficient conditions are developed to guarantee the ultimate uniform boundedness of the resulting tracking-error system. Two simulation studies are executed to verify the effectiveness of the presented scheme.
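The event-triggering idea can be sketched as follows: the control input is recomputed and transmitted only when a condition built from the tracking error and a dynamically evolving penalty factor fires; otherwise the previous input is held. The threshold form and constants here are hypothetical, not the article's exact rule.

```python
# Event-triggered control sketch: a dynamic penalty factor accumulates the
# tracking error; the input is updated only when the event condition fires.
def make_trigger(sigma=0.5, mu=0.9):
    state = {"penalty": 1.0, "last_u": 0.0}
    def trigger(error, compute_u):
        state["penalty"] = mu * state["penalty"] + abs(error)  # dynamic penalty factor
        if error * error > sigma / state["penalty"]:           # event condition (assumed form)
            state["last_u"] = compute_u(error)                 # recompute + transmit
            return state["last_u"], True
        return state["last_u"], False                          # hold previous input
    return trigger

trigger = make_trigger()
events = 0
e = 2.0
for _ in range(50):
    u, fired = trigger(e, lambda err: -0.5 * err)   # simple proportional law
    events += fired
    e = 0.8 * e + 0.1 * u                           # toy error dynamics
```

The point of the sketch is that `events` stays well below the 50 sampling instants, which is the communication saving the triggering rule targets.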
The growing number of connected devices in high-density environments poses serious challenges for accommodating and managing these devices across different network-slicing services, such as ultra-reliable low-latency communication (URLLC) and massive machine-type communication (mMTC). Because every service has distinct quality-of-service (QoS) requirements, it is essential to ensure the seamless coexistence of these devices. The main difficulty is allocating network resources to maximize spectrum utilization while meeting both mMTC's massive connectivity demands and URLLC's demand for ultra-reliable, low-latency communication. In this study, non-orthogonal multiple access (NOMA) network slicing is utilized to share radio resources among the various services, thereby improving connectivity for large-scale device deployments. When these services coexist in high-density NOMA environments characterized by heavy network congestion and radio-resource sharing, the difficulty increases significantly. To address these issues, an optimization algorithm is proposed for subchannel assignment and power allocation in high-density NOMA networks serving URLLC and mMTC devices. The solution adopts a Q-learning algorithm to optimize the decision-making process and ensure efficient resource sharing between URLLC and mMTC devices while satisfying their distinct QoS requirements. Extensive simulations demonstrate that the proposed algorithm is flexible and scalable in dynamic scenarios, outperforming random and exhaustive search algorithms in high-density NOMA networks in terms of sum rate. The sum rate of the proposed algorithm increased by approximately 23.44% compared to that of the exhaustive search algorithm.
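The subchannel-assignment idea can be sketched as a per-device Q-table over subchannels, rewarded by an interference-aware rate proxy. The dimensions, channel model, and reward are toy assumptions, not the study's formulation (and power allocation is omitted for brevity).

```python
import numpy as np

# Q-learning sketch for subchannel assignment in a shared (NOMA-like) band.
rng = np.random.default_rng(2)
n_devices, n_subchannels = 4, 2
gain = rng.uniform(0.5, 1.5, size=(n_devices, n_subchannels))  # toy channel gains
Q = np.zeros((n_devices, n_subchannels))
alpha, eps = 0.2, 0.2

def reward(d, ch, assignment):
    # crude sum-rate proxy: own gain penalized by co-channel interferers
    interferers = sum(1 for d2, c2 in assignment.items() if c2 == ch and d2 != d)
    return np.log2(1.0 + gain[d, ch] / (0.1 + interferers))

assignment = {}
for _ in range(2000):
    d = rng.integers(n_devices)                    # pick a device to (re)assign
    ch = rng.integers(n_subchannels) if rng.random() < eps else int(Q[d].argmax())
    assignment[d] = ch
    Q[d, ch] += alpha * (reward(d, ch, assignment) - Q[d, ch])  # bandit-style update

final = {d: int(Q[d].argmax()) for d in range(n_devices)}      # learned assignment
```

Because each device's reward depends on the others' choices, this is a multi-agent setting; the sketch treats it as independent learners, which is one common simplification.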
This paper aims at integrating machine learning techniques into meta-heuristics for solving combinatorial optimization problems. Specifically, our study develops a novel efficient iterated greedy algorithm based on reinforcement learning. The main novelty of the proposed algorithm is its new perturbation mechanism, which incorporates Q-learning to select appropriate perturbation operators during the search process. Through an application to the permutation flowshop scheduling problem, comprehensive computational experiments are conducted on a wide range of benchmark instances to evaluate the performance of the proposed algorithm. This evaluation is done against non-learning versions of the iterated greedy algorithm and seven state-of-the-art algorithms from the literature. The experimental results and statistical analyses show the better performance of the proposed algorithm in terms of optimality gaps, convergence rate, and computational overhead. (c) 2022 Elsevier B.V. All rights reserved.
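The Q-learning-guided perturbation can be sketched as a small Q-table scoring perturbation operators by the improvement they produce, then greedily (with exploration) choosing which operator perturbs the incumbent. The operators and the toy objective below are illustrative stand-ins, not the paper's flowshop-specific components.

```python
import random

# Q-learning operator selection inside a simplified iterated-greedy loop.
random.seed(3)

def swap(perm):
    p = perm[:]; i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def insertion(perm):
    p = perm[:]; i, j = random.sample(range(len(p)), 2)
    p.insert(j, p.pop(i))
    return p

def cost(perm):
    # toy objective standing in for makespan: deviation from sorted order
    return sum(abs(v - i) for i, v in enumerate(perm))

operators = [swap, insertion]
Q = [0.0] * len(operators)
alpha, eps = 0.3, 0.2
current = list(range(9, -1, -1))          # deliberately bad starting permutation

for _ in range(500):
    k = (random.randrange(len(operators)) if random.random() < eps
         else max(range(len(operators)), key=lambda i: Q[i]))
    candidate = operators[k](current)
    improvement = cost(current) - cost(candidate)
    Q[k] += alpha * (improvement - Q[k])  # reward operators that improve cost
    if improvement > 0:
        current = candidate               # greedy acceptance

best_cost = cost(current)
```

In the full algorithm the same idea would sit inside the destruction–construction cycle; here the Q-table simply learns which perturbation pays off on this instance.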
A relay selection algorithm was proposed to improve the communication rate of D2D (device-to-device) users in vehicle-networking communication systems, based on social networks combined with Q-learning. The scheme was divided into two steps. Firstly, a social threshold was introduced to filter the potential relay nodes and reduce the number of probing attempts, based on users' interest similarity in the D2D communication network. Then, an optimal relay selection algorithm based on Q-learning was proposed to maximise the total rate of the D2D links. This method can provide the optimal relay selection scheme to meet the requirements of vehicle-networking communication. The simulation results showed that the proposed scheme could reduce the number of probed relays and improve the communication rate of the system while ensuring communication security.
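The two-step scheme can be sketched as follows: filter relays by a social-similarity threshold first, then let a bandit-style Q-learning rule pick among the survivors based on observed link rates. All names, thresholds, and rate values are assumed for illustration.

```python
import random

# Step 1: social filtering; Step 2: Q-learning relay selection on noisy rates.
random.seed(4)
relays = {f"r{i}": random.random() for i in range(8)}    # social similarity per relay
tau = 0.4                                                # social threshold (assumed)
candidates = [r for r, s in relays.items() if s >= tau]  # fewer relays to probe
if not candidates:
    candidates = list(relays)                            # fall back if filter empties

true_rate = {r: random.uniform(1.0, 5.0) for r in candidates}  # toy link rates
Q = {r: 0.0 for r in candidates}
alpha, eps = 0.2, 0.1
for _ in range(1000):
    r = (random.choice(candidates) if random.random() < eps
         else max(Q, key=Q.get))
    observed = true_rate[r] + random.gauss(0, 0.2)       # noisy rate measurement
    Q[r] += alpha * (observed - Q[r])

best = max(Q, key=Q.get)                                 # selected relay
```

The filtering step is what cuts the probing count; the Q-table then only has to discriminate among socially trusted relays.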
This paper introduces a new variant of a metaheuristic algorithm, called QWOA-EMC, based on the whale optimization algorithm (WOA), the Q-learning algorithm, and the exponential Monte Carlo acceptance probability. Unlike WOA, QWOA-EMC permits just-in-time adaptive selection of its operators (i.e., among the shrinking mechanism, the spiral-shape mechanism, and random generation) based on their historical performance, and it exploits the Monte Carlo acceptance probability to further strengthen its exploration capabilities by allowing a poorly performing operator to be reselected with some probability in the early part of the iteration. Experimental results for constrained combinatorial test generation demonstrate that the proposed QWOA-EMC outperforms WOA and performs competitively against other metaheuristic algorithms.
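The adaptive-selection idea can be sketched as follows: a Q-value per operator drives greedy selection, while an exponential Monte Carlo term occasionally overrides it so a poorly scored operator can still be tried early on. The three "operators" below are toy mutations (not WOA's actual equations) and the acceptance form is an assumption about the scheme.

```python
import math
import random

# Q-learning operator selection with exponential Monte Carlo reselection.
random.seed(5)

def sphere(x):                 # toy objective to minimize
    return sum(v * v for v in x)

def shrink(x):   return [0.9 * v for v in x]                 # contraction stand-in
def spiral(x):   return [v * math.cos(0.5) for v in x]       # spiral stand-in
def randgen(x):  return [random.uniform(-1, 1) for _ in x]   # random generation

ops = [shrink, spiral, randgen]
Q = [0.0] * 3
alpha = 0.3
x = [random.uniform(-5, 5) for _ in range(4)]
best = sphere(x)

for t in range(1, 201):
    k = max(range(3), key=lambda i: Q[i])       # greedy operator choice
    # EMC-style override: random reselection, likely early, rare late
    if random.random() < math.exp(-t / 200.0):
        k = random.randrange(3)
    y = ops[k](x)
    delta = sphere(x) - sphere(y)               # positive = improvement
    Q[k] += alpha * (delta - Q[k])              # track operator performance
    if delta > 0:
        x = y
        best = min(best, sphere(x))
```

The override probability decays with the iteration counter, which is what lets exploration dominate early and exploitation dominate late.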
Simulated annealing (SA) is recognized as an effective local search optimizer, and it has shown great success in many real-world optimization problems. However, it has a slow convergence rate, and its performance is strongly affected by the settings of its parameters, namely the annealing factor and the mutation rate. To mitigate these limitations, this study presents an enhanced optimizer, named QLSA, that integrates the Q-learning algorithm with SA in a single optimization model. In particular, the Q-learning algorithm is embedded into SA to enhance its performance by controlling its parameters adaptively at run time. The main characteristic of Q-learning is that it applies a reward/penalty technique to keep track of the best-performing values of these parameters, i.e., the annealing factor and the mutation rate. To evaluate the effectiveness of the proposed QLSA algorithm, a total of seven constrained engineering design problems were used in this study. The outcomes show that QLSA was able to report a mean fitness value of 1.33 on cantilever beam design, 263.60 on three-bar truss design, 1.72 on welded beam design, 5905.42 on pressure vessel design, 0.0126 on compression coil spring design, 0.25 on multiple disk clutch brake design, and 2994.47 on the speed reducer design problem. Further analysis compared QLSA with state-of-the-art population-based optimization algorithms, including PSO, GWO, CLPSO, harmony search, and ABC. The reported results show that QLSA significantly (i.e., at the 95% confidence level) outperforms the other studied algorithms.
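The embedding can be sketched as follows: a small Q-table chooses a (annealing factor, mutation scale) pair for each SA segment and is rewarded by the improvement that segment achieves. The objective, the discretized parameter options, and the segment length are assumptions for illustration.

```python
import math
import random

# Q-learning adaptively tunes SA's annealing factor and mutation scale.
random.seed(6)

def f(x):                                  # toy 1-D objective to minimize
    return (x - 2.0) ** 2

anneal_opts = [0.90, 0.99]
scale_opts = [0.1, 1.0]
actions = [(a, s) for a in anneal_opts for s in scale_opts]
Q = [0.0] * len(actions)
alpha_q, eps = 0.3, 0.2

x, T, best = 10.0, 5.0, f(10.0)
for _ in range(300):
    k = (random.randrange(len(actions)) if random.random() < eps
         else max(range(len(actions)), key=lambda i: Q[i]))
    anneal, scale = actions[k]
    before = best
    for _ in range(10):                    # one SA segment with chosen parameters
        y = x + random.gauss(0, scale)     # mutation
        d = f(y) - f(x)
        if d < 0 or random.random() < math.exp(-d / max(T, 1e-9)):
            x = y                          # Metropolis acceptance
        best = min(best, f(x))
        T *= anneal                        # cooling with chosen annealing factor
    Q[k] += alpha_q * ((before - best) - Q[k])   # reward = segment improvement
```

The reward/penalty signal is simply the fitness gain per segment, so parameter settings that keep producing improvement accumulate higher Q-values.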
The two important problems of SLAM and path planning are often addressed independently, yet both are essential to achieve successful autonomous navigation. In this paper, we aim to integrate the two for application on a humanoid robot. The SLAM problem is solved with the EKF-SLAM algorithm, whereas the path planning problem is tackled via Q-learning. The proposed algorithm is implemented on a NAO robot equipped with a laser head. In order to distinguish different landmarks in a single observation, we apply a clustering algorithm to the laser sensor data. A fractional-order PI (FOPI) controller is also designed to minimize the motion deviation inherent in NAO's walking behavior. The algorithm is tested in an indoor environment to assess its performance. We suggest that the new design can be reliably used for autonomous walking in an unknown environment. (C) 2015 Elsevier B.V. All rights reserved.
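The path-planning half can be sketched as tabular Q-learning on a small occupancy grid standing in for the robot's workspace; the grid size, obstacles, and reward shaping are assumptions, not the paper's setup.

```python
import random

# Tabular Q-learning path planning on a 5x5 grid with two obstacles.
random.seed(7)
W, H = 5, 5
goal = (4, 4)
obstacles = {(2, 2), (3, 1)}
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
Q = {((x, y), a): 0.0 for x in range(W) for y in range(H) for a in range(4)}
alpha, gamma, eps = 0.2, 0.95, 0.2

def step(s, a):
    dx, dy = actions[a]
    nx, ny = s[0] + dx, s[1] + dy
    if not (0 <= nx < W and 0 <= ny < H) or (nx, ny) in obstacles:
        return s, -1.0                      # bump: stay put, small penalty
    if (nx, ny) == goal:
        return (nx, ny), 10.0               # goal reward
    return (nx, ny), -0.1                   # step cost encourages short paths

for _ in range(500):                        # training episodes
    s = (0, 0)
    for _ in range(100):
        a = (random.randrange(4) if random.random() < eps
             else max(range(4), key=lambda a2: Q[(s, a2)]))
        s2, r = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in range(4))
                              - Q[(s, a)])
        s = s2
        if s == goal:
            break

# greedy rollout of the learned policy from the start cell
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    a = max(range(4), key=lambda a2: Q[(s, a2)])
    s, _ = step(s, a)
    path.append(s)
    if s == goal:
        break
```

In the integrated system, the grid and obstacle set would come from the EKF-SLAM map rather than being fixed in advance.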
Inverse reinforcement learning optimal control operates under a learner-expert framework, in which the learner system can learn the expert system's trajectory and optimal control policy via a reinforcement learning algorithm an...