In this paper, an extended Q-learning method is used to study the H∞ output tracking control (HOTC) problem for networked control systems with state delay and data loss. Compared with existing results, the networked control system considered here contains network delays and packet loss as well as external disturbances. To deal with the disturbances, the H∞ control problem is transformed into a minimax problem, which is solved by the method of zero-sum games. Packet loss and state delay make it difficult to obtain accurate current state information, so a new Smith predictor that accounts for both delay and packet loss is designed to predict the current state. Using the predicted state, the extended Q-learning algorithm is implemented to solve the H∞ output tracking problem when the system dynamics are unknown. The convergence of the extended Q-learning algorithm is then proved, and the stability and optimality of the proposed method are analyzed in the theorems. Finally, numerical simulations verify the effectiveness of the proposed algorithm.
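The zero-sum-game formulation can be sketched with minimax Q-learning: the controller maximizes while the disturbance acts as a minimizing adversary, so the Bellman target uses a max–min over the joint action-value table. Everything below (states, toy dynamics, rewards) is an illustrative stand-in, not the paper's system.

```python
import numpy as np

# Minimax (zero-sum-game) Q-learning sketch: Q is indexed by (state, control u,
# disturbance w); the target bootstraps on max_u min_w Q(s', u, w).
rng = np.random.default_rng(0)
n_s, n_u, n_w = 4, 2, 2
Q = np.zeros((n_s, n_u, n_w))
alpha, gamma = 0.1, 0.9

def step(s, u, w):
    # toy dynamics: reward favors u matching the state's parity, w opposes u
    r = 1.0 if u == s % 2 else -1.0
    r -= 0.5 if w == u else 0.0
    return (s + u + w) % n_s, r

s = 0
for _ in range(5000):
    u = rng.integers(n_u)                  # uniform exploration (off-policy)
    w = rng.integers(n_w)
    s2, r = step(s, u, w)
    # saddle-point target: controller maximizes, disturbance minimizes
    target = r + gamma * np.max(np.min(Q[s2], axis=1))
    Q[s, u, w] += alpha * (target - Q[s, u, w])
    s = s2

# robust (worst-case-disturbance) control action per state
policy = np.argmax(np.min(Q, axis=2), axis=1)
```

For this toy reward, the robust policy matches the control action to the state's parity, which is what the max–min extraction recovers.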
This study explores the emergence and maintenance of cooperation in evolutionary game theory by incorporating occasional social interactions into Q-learning algorithms. We model the dynamics on a square lattice, where individuals play the Prisoner's Dilemma Game and update their strategies based on Q-learning and infrequent social interactions. Our main findings reveal a non-monotonic relationship between the game parameter c and cooperation levels, with cooperation re-emerging in adverse conditions. The interplay between Q-learning and social learning mechanisms is key, with social learning playing a more significant role in sustaining cooperation under challenging conditions. This work advances our understanding of cooperation maintenance in populations and has implications for designing strategies to foster cooperation in real-world scenarios.
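The mixed update rule can be sketched as follows: with a small probability an agent imitates a better-scoring neighbor (social learning), otherwise it updates its own Q-values. The lattice size, payoff form, and parameter values here are assumptions for illustration, not the study's exact model.

```python
import numpy as np

# Lattice Prisoner's Dilemma with Q-learning plus occasional social imitation.
rng = np.random.default_rng(1)
L = 10                                  # lattice side length (assumed)
c = 0.4                                 # cooperation-cost game parameter (assumed)
p_social, alpha, gamma, eps = 0.1, 0.1, 0.9, 0.05

strategy = rng.integers(2, size=(L, L))  # 1 = cooperate, 0 = defect
Q = np.zeros((L, L, 2))                  # per-agent action values

def payoff(a, b):
    # donation-game style: cooperating costs c, a cooperating partner gives 1
    return (1.0 if b == 1 else 0.0) - (c if a == 1 else 0.0)

for _ in range(200):
    for i in range(L):
        for j in range(L):
            ni, nj = (i + 1) % L, j      # one toroidal neighbor for simplicity
            r = payoff(strategy[i, j], strategy[ni, nj])
            if rng.random() < p_social:
                # infrequent social learning: copy the neighbor if it earned more
                if payoff(strategy[ni, nj], strategy[i, j]) > r:
                    strategy[i, j] = strategy[ni, nj]
            else:
                a = strategy[i, j]       # Q-learning update on own last action
                Q[i, j, a] += alpha * (r + gamma * Q[i, j].max() - Q[i, j, a])
                strategy[i, j] = (rng.integers(2) if rng.random() < eps
                                  else int(Q[i, j].argmax()))

coop_rate = strategy.mean()              # fraction of cooperators on the lattice
```

Sweeping `c` and `p_social` in a sketch like this is how the non-monotonic cooperation curve would be probed.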
This article aims to study event-triggered data-driven control of nonlinear systems via Q-learning. An input-output mapping is described in a pseudo-partial-derivative form. A Q-learning-based optimization criterion is provided to establish a data-driven control law. A dynamic penalty factor composed of tracking errors is supplied to accelerate error convergence. Consequently, a novel triggering rule related to this factor and the performance cost is proposed to save communication resources. Sufficient conditions are developed to guarantee the ultimate uniform boundedness of the resulting tracking-error system. Two simulation studies are executed to verify the effectiveness of the presented scheme.
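The event-triggering idea can be sketched as follows: the control input is recomputed and transmitted only when a condition built from the tracking error and a dynamically evolving penalty factor fires; otherwise the previous input is held. The threshold form and constants here are hypothetical, not the article's exact rule.

```python
# Event-triggered control sketch: a dynamic penalty factor accumulates the
# tracking error; the input is updated only when the event condition fires.
def make_trigger(sigma=0.5, mu=0.9):
    state = {"penalty": 1.0, "last_u": 0.0}
    def trigger(error, compute_u):
        state["penalty"] = mu * state["penalty"] + abs(error)  # dynamic penalty factor
        if error * error > sigma / state["penalty"]:           # event condition (assumed form)
            state["last_u"] = compute_u(error)                 # recompute + transmit
            return state["last_u"], True
        return state["last_u"], False                          # hold previous input
    return trigger

trigger = make_trigger()
events = 0
e = 2.0
for _ in range(50):
    u, fired = trigger(e, lambda err: -0.5 * err)   # simple proportional law
    events += fired
    e = 0.8 * e + 0.1 * u                           # toy error dynamics
```

The point of the sketch is that `events` stays well below the 50 sampling instants, which is the communication saving the triggering rule targets.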
The growing number of connected devices in high-density environments poses serious challenges for accommodating and managing these devices across different network-slicing services, such as ultra-reliable low-latency communication (URLLC) and massive machine-type communication (mMTC). Because every service has distinct quality-of-service (QoS) requirements, it is essential to ensure the seamless coexistence of these devices. The main difficulty is allocating network resources to maximize spectrum utilization while meeting both mMTC's massive connectivity demands and URLLC's demand for ultra-reliable, low-latency communication. In this study, non-orthogonal multiple access (NOMA) network slicing is utilized to share radio resources among the various services, thereby improving connectivity for large-scale device deployments. When these services coexist in high-density NOMA environments characterized by heavy network congestion and radio-resource sharing, the difficulty increases significantly. To address these issues, an optimization algorithm is proposed for subchannel assignment and power allocation in high-density NOMA networks serving URLLC and mMTC devices. The solution adopts a Q-learning algorithm to optimize the decision-making process and ensure efficient resource sharing between URLLC and mMTC devices while satisfying their distinct QoS requirements. Extensive simulations demonstrate that the proposed algorithm is flexible and scalable in dynamic scenarios, outperforming random and exhaustive search algorithms in high-density NOMA networks in terms of sum rate. The sum rate of the proposed algorithm increased by approximately 23.44% compared to that of the exhaustive search algorithm.
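The subchannel-assignment idea can be sketched as a per-device Q-table over subchannels, rewarded by an interference-aware rate proxy. The dimensions, channel model, and reward are toy assumptions, not the study's formulation (and power allocation is omitted for brevity).

```python
import numpy as np

# Q-learning sketch for subchannel assignment in a shared (NOMA-like) band.
rng = np.random.default_rng(2)
n_devices, n_subchannels = 4, 2
gain = rng.uniform(0.5, 1.5, size=(n_devices, n_subchannels))  # toy channel gains
Q = np.zeros((n_devices, n_subchannels))
alpha, eps = 0.2, 0.2

def reward(d, ch, assignment):
    # crude sum-rate proxy: own gain penalized by co-channel interferers
    interferers = sum(1 for d2, c2 in assignment.items() if c2 == ch and d2 != d)
    return np.log2(1.0 + gain[d, ch] / (0.1 + interferers))

assignment = {}
for _ in range(2000):
    d = rng.integers(n_devices)                    # pick a device to (re)assign
    ch = rng.integers(n_subchannels) if rng.random() < eps else int(Q[d].argmax())
    assignment[d] = ch
    Q[d, ch] += alpha * (reward(d, ch, assignment) - Q[d, ch])  # bandit-style update

final = {d: int(Q[d].argmax()) for d in range(n_devices)}      # learned assignment
```

Because each device's reward depends on the others' choices, this is a multi-agent setting; the sketch treats it as independent learners, which is one common simplification.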
This paper aims at integrating machine learning techniques into meta-heuristics for solving combinatorial optimization problems. Specifically, our study develops a novel efficient iterated greedy algorithm based on reinforcement learning. The main novelty of the proposed algorithm is its new perturbation mechanism, which incorporates Q-learning to select appropriate perturbation operators during the search process. Through an application to the permutation flowshop scheduling problem, comprehensive computational experiments are conducted on a wide range of benchmark instances to evaluate the performance of the proposed algorithm. This evaluation is done against non-learning versions of the iterated greedy algorithm and seven state-of-the-art algorithms from the literature. The experimental results and statistical analyses show the better performance of the proposed algorithm in terms of optimality gaps, convergence rate, and computational overhead. (c) 2022 Elsevier B.V. All rights reserved.
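The Q-learning-guided perturbation can be sketched as a small Q-table scoring perturbation operators by the improvement they produce, then greedily (with exploration) choosing which operator perturbs the incumbent. The operators and the toy objective below are illustrative stand-ins, not the paper's flowshop-specific components.

```python
import random

# Q-learning operator selection inside a simplified iterated-greedy loop.
random.seed(3)

def swap(perm):
    p = perm[:]; i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def insertion(perm):
    p = perm[:]; i, j = random.sample(range(len(p)), 2)
    p.insert(j, p.pop(i))
    return p

def cost(perm):
    # toy objective standing in for makespan: deviation from sorted order
    return sum(abs(v - i) for i, v in enumerate(perm))

operators = [swap, insertion]
Q = [0.0] * len(operators)
alpha, eps = 0.3, 0.2
current = list(range(9, -1, -1))          # deliberately bad starting permutation

for _ in range(500):
    k = (random.randrange(len(operators)) if random.random() < eps
         else max(range(len(operators)), key=lambda i: Q[i]))
    candidate = operators[k](current)
    improvement = cost(current) - cost(candidate)
    Q[k] += alpha * (improvement - Q[k])  # reward operators that improve cost
    if improvement > 0:
        current = candidate               # greedy acceptance

best_cost = cost(current)
```

In the full algorithm the same idea would sit inside the destruction–construction cycle; here the Q-table simply learns which perturbation pays off on this instance.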
A relay selection algorithm was proposed to improve the communication rate of D2D (device-to-device) users in vehicle-networking communication systems, based on social networks combined with Q-learning. The scheme was divided into two steps. Firstly, a social threshold was introduced to filter the potential relay nodes and reduce the number of probing attempts, based on users' interest similarity in the D2D communication network. Then, an optimal relay selection algorithm based on Q-learning was proposed to maximise the total rate of the D2D links. This method can provide the optimal relay selection scheme to meet the requirements of vehicle-networking communication. The simulation results showed that the proposed scheme could reduce the number of probed relays and improve the communication rate of the system while ensuring communication security.
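The two-step scheme can be sketched as follows: filter relays by a social-similarity threshold first, then let a bandit-style Q-learning rule pick among the survivors based on observed link rates. All names, thresholds, and rate values are assumed for illustration.

```python
import random

# Step 1: social filtering; Step 2: Q-learning relay selection on noisy rates.
random.seed(4)
relays = {f"r{i}": random.random() for i in range(8)}    # social similarity per relay
tau = 0.4                                                # social threshold (assumed)
candidates = [r for r, s in relays.items() if s >= tau]  # fewer relays to probe
if not candidates:
    candidates = list(relays)                            # fall back if filter empties

true_rate = {r: random.uniform(1.0, 5.0) for r in candidates}  # toy link rates
Q = {r: 0.0 for r in candidates}
alpha, eps = 0.2, 0.1
for _ in range(1000):
    r = (random.choice(candidates) if random.random() < eps
         else max(Q, key=Q.get))
    observed = true_rate[r] + random.gauss(0, 0.2)       # noisy rate measurement
    Q[r] += alpha * (observed - Q[r])

best = max(Q, key=Q.get)                                 # selected relay
```

The filtering step is what cuts the probing count; the Q-table then only has to discriminate among socially trusted relays.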
This paper introduces a new variant of a metaheuristic algorithm, called QWOA-EMC, based on the whale optimization algorithm (WOA), the Q-learning algorithm, and the exponential Monte Carlo acceptance probability. Unlike WOA, QWOA-EMC permits just-in-time adaptive selection of its operators (i.e., among the shrinking mechanism, the spiral-shape mechanism, and random generation) based on their historical performance, and it exploits the Monte Carlo acceptance probability to further strengthen its exploration capabilities by allowing a poorly performing operator to be reselected with some probability in the early part of the iteration. Experimental results for constrained combinatorial test generation demonstrate that the proposed QWOA-EMC outperforms WOA and performs competitively against other metaheuristic algorithms.
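The adaptive-selection idea can be sketched as follows: a Q-value per operator drives greedy selection, while an exponential Monte Carlo term occasionally overrides it so a poorly scored operator can still be tried early on. The three "operators" below are toy mutations (not WOA's actual equations) and the acceptance form is an assumption about the scheme.

```python
import math
import random

# Q-learning operator selection with exponential Monte Carlo reselection.
random.seed(5)

def sphere(x):                 # toy objective to minimize
    return sum(v * v for v in x)

def shrink(x):   return [0.9 * v for v in x]                 # contraction stand-in
def spiral(x):   return [v * math.cos(0.5) for v in x]       # spiral stand-in
def randgen(x):  return [random.uniform(-1, 1) for _ in x]   # random generation

ops = [shrink, spiral, randgen]
Q = [0.0] * 3
alpha = 0.3
x = [random.uniform(-5, 5) for _ in range(4)]
best = sphere(x)

for t in range(1, 201):
    k = max(range(3), key=lambda i: Q[i])       # greedy operator choice
    # EMC-style override: random reselection, likely early, rare late
    if random.random() < math.exp(-t / 200.0):
        k = random.randrange(3)
    y = ops[k](x)
    delta = sphere(x) - sphere(y)               # positive = improvement
    Q[k] += alpha * (delta - Q[k])              # track operator performance
    if delta > 0:
        x = y
        best = min(best, sphere(x))
```

The override probability decays with the iteration counter, which is what lets exploration dominate early and exploitation dominate late.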
Simulated annealing (SA) is recognized as an effective local search optimizer, and it has shown great success in many real-world optimization problems. However, it has a slow convergence rate, and its performance is strongly affected by the settings of its parameters, namely the annealing factor and the mutation rate. To mitigate these limitations, this study presents an enhanced optimizer, named QLSA, that integrates the Q-learning algorithm with SA in a single optimization model. In particular, the Q-learning algorithm is embedded into SA to enhance its performance by controlling its parameters adaptively at run time. The main characteristic of Q-learning is that it applies a reward/penalty technique to keep track of the best-performing values of these parameters, i.e., the annealing factor and the mutation rate. To evaluate the effectiveness of the proposed QLSA algorithm, a total of seven constrained engineering design problems were used in this study. The outcomes show that QLSA was able to report a mean fitness value of 1.33 on cantilever beam design, 263.60 on three-bar truss design, 1.72 on welded beam design, 5905.42 on pressure vessel design, 0.0126 on compression coil spring design, 0.25 on multiple disk clutch brake design, and 2994.47 on the speed reducer design problem. Further analysis compared QLSA with state-of-the-art population-based optimization algorithms, including PSO, GWO, CLPSO, harmony search, and ABC. The reported results show that QLSA significantly (i.e., at the 95% confidence level) outperforms the other studied algorithms.
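The embedding can be sketched as follows: a small Q-table chooses a (annealing factor, mutation scale) pair for each SA segment and is rewarded by the improvement that segment achieves. The objective, the discretized parameter options, and the segment length are assumptions for illustration.

```python
import math
import random

# Q-learning adaptively tunes SA's annealing factor and mutation scale.
random.seed(6)

def f(x):                                  # toy 1-D objective to minimize
    return (x - 2.0) ** 2

anneal_opts = [0.90, 0.99]
scale_opts = [0.1, 1.0]
actions = [(a, s) for a in anneal_opts for s in scale_opts]
Q = [0.0] * len(actions)
alpha_q, eps = 0.3, 0.2

x, T, best = 10.0, 5.0, f(10.0)
for _ in range(300):
    k = (random.randrange(len(actions)) if random.random() < eps
         else max(range(len(actions)), key=lambda i: Q[i]))
    anneal, scale = actions[k]
    before = best
    for _ in range(10):                    # one SA segment with chosen parameters
        y = x + random.gauss(0, scale)     # mutation
        d = f(y) - f(x)
        if d < 0 or random.random() < math.exp(-d / max(T, 1e-9)):
            x = y                          # Metropolis acceptance
        best = min(best, f(x))
        T *= anneal                        # cooling with chosen annealing factor
    Q[k] += alpha_q * ((before - best) - Q[k])   # reward = segment improvement
```

The reward/penalty signal is simply the fitness gain per segment, so parameter settings that keep producing improvement accumulate higher Q-values.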
The two important problems of SLAM and path planning are often addressed independently, yet both are essential to achieve successful autonomous navigation. In this paper, we aim to integrate the two for application on a humanoid robot. The SLAM problem is solved with the EKF-SLAM algorithm, whereas the path planning problem is tackled via Q-learning. The proposed algorithm is implemented on a NAO robot equipped with a laser head. In order to distinguish different landmarks in a single observation, we apply a clustering algorithm to the laser sensor data. A fractional-order PI (FOPI) controller is also designed to minimize the motion deviation inherent in NAO's walking behavior. The algorithm is tested in an indoor environment to assess its performance. We suggest that the new design can be reliably used for autonomous walking in an unknown environment. (C) 2015 Elsevier B.V. All rights reserved.
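The path-planning half can be sketched as tabular Q-learning on a small occupancy grid standing in for the robot's workspace; the grid size, obstacles, and reward shaping are assumptions, not the paper's setup.

```python
import random

# Tabular Q-learning path planning on a 5x5 grid with two obstacles.
random.seed(7)
W, H = 5, 5
goal = (4, 4)
obstacles = {(2, 2), (3, 1)}
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
Q = {((x, y), a): 0.0 for x in range(W) for y in range(H) for a in range(4)}
alpha, gamma, eps = 0.2, 0.95, 0.2

def step(s, a):
    dx, dy = actions[a]
    nx, ny = s[0] + dx, s[1] + dy
    if not (0 <= nx < W and 0 <= ny < H) or (nx, ny) in obstacles:
        return s, -1.0                      # bump: stay put, small penalty
    if (nx, ny) == goal:
        return (nx, ny), 10.0               # goal reward
    return (nx, ny), -0.1                   # step cost encourages short paths

for _ in range(500):                        # training episodes
    s = (0, 0)
    for _ in range(100):
        a = (random.randrange(4) if random.random() < eps
             else max(range(4), key=lambda a2: Q[(s, a2)]))
        s2, r = step(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in range(4))
                              - Q[(s, a)])
        s = s2
        if s == goal:
            break

# greedy rollout of the learned policy from the start cell
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    a = max(range(4), key=lambda a2: Q[(s, a2)])
    s, _ = step(s, a)
    path.append(s)
    if s == goal:
        break
```

In the integrated system, the grid and obstacle set would come from the EKF-SLAM map rather than being fixed in advance.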
Inverse reinforcement learning optimal control operates under a learner-expert framework, in which the learner system can learn the expert system's trajectory and optimal control policy via a reinforcement learning algorithm an...