In recent years, deep graph neural networks (GNNs) have been used as solvers or helper functions for the traveling salesman problem (TSP), but they typically serve as encoders that generate static node representations for downstream tasks and are unable to capture the dynamic permutational information that arises as solutions are iteratively updated. To address this problem, we propose a permutational encoding graph attention encoder and attention-based decoder (PEG2A) model for the TSP, trained with the advantage actor-critic algorithm. In this work, the permutational encoding graph attention (PEGAT) network is designed to encode node embeddings, gathering information from neighbors while simultaneously capturing the dynamic graph permutational information. The attention-based decoder is tailored to compute probability distributions over node pairs for 2-opt moves. Experimental results show that our method outperforms the compared learning-based algorithms and traditional heuristic methods.
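As a concrete reference point for the 2-opt moves the decoder selects node pairs for, the following minimal Python sketch applies a single 2-opt move to a tour; the tour representation and toy distance matrix are illustrative assumptions, and the PEGAT encoder and attention decoder themselves are not reproduced here.

import random

def two_opt_move(tour, i, j):
    """Apply a 2-opt move: reverse the tour segment between positions i and j.

    This is the classic local-search step the decoder selects node pairs
    for; the learned encoder/decoder is not shown.
    """
    assert 0 <= i < j < len(tour)
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, dist):
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

# Toy usage: a random 5-city tour and a random 2-opt step.
n = 5
dist = [[abs(a - b) for b in range(n)] for a in range(n)]
tour = list(range(n))
random.shuffle(tour)
i, j = sorted(random.sample(range(n), 2))
new_tour = two_opt_move(tour, i, j)
print(tour_length(tour, dist), tour_length(new_tour, dist))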
In the pursuit of ubiquitous broadband connectivity, there has been a significant shift toward the vertical expansion of communication networks into space, particularly through the exploitation of low Earth orbit (LEO) satellite constellations, which are favored for their relatively low latency. However, this approach faces many challenges, including atmospheric turbulence, high path loss, and dynamic cloud formations. High-altitude pseudo-satellites (HAPS) have emerged as a promising relaying layer between LEO satellites and ground stations, enhancing coverage, latency, and direct terrestrial user connectivity. While radio frequency (RF) bands suffer from congestion and limited bandwidth, free space optical (FSO) communications offer higher data rates but are susceptible to misalignment and weather-induced signal degradation. To address these challenges, a hybrid RF/FSO approach has been proposed that takes advantage of both technologies by dynamically switching between RF and FSO based on propagation channel conditions. This paper introduces a reinforcement learning-based algorithm designed to optimize the HAPS trajectory, maneuver around cloudy areas, and seamlessly switch between the RF and FSO communication modes to maximize the achievable capacity. The proposed approach aims to maximize system performance by intelligently adapting to environmental conditions, offering a promising solution for next-generation space communication networks.
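To make the mode-switching idea concrete, here is a hedged tabular Q-learning sketch in which an agent learns when to prefer FSO over RF as cloud cover varies; the state discretization, reward numbers, and cloud dynamics are illustrative assumptions, not the paper's algorithm.

import numpy as np

# A minimal tabular Q-learning sketch of RF/FSO mode switching. States
# discretize cloud cover; actions are the two link modes. All numbers
# are illustrative, not from the paper.
rng = np.random.default_rng(0)
N_STATES, RF, FSO = 4, 0, 1          # cloud-cover levels 0 (clear) .. 3 (dense)
Q = np.zeros((N_STATES, 2))
alpha, gamma, eps = 0.1, 0.9, 0.1

def capacity(state, action):
    """Toy reward: FSO offers a high rate but degrades sharply with clouds."""
    return 1.0 if action == RF else max(0.0, 3.0 - 1.2 * state)

state = rng.integers(N_STATES)
for _ in range(5000):
    action = rng.integers(2) if rng.random() < eps else int(Q[state].argmax())
    reward = capacity(state, action)
    next_state = rng.integers(N_STATES)      # clouds evolve randomly here
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(Q.argmax(axis=1))  # learned mode per cloud level: FSO when clear, RF when cloudy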
This article presents a comprehensive approach that integrates formation tracking control and optimal control for a fleet of multiple surface vehicles (SVs), accounting for both the kinematic and dynamic models of each SV agent. The proposed control framework comprises two core components: a high-level displacement-based formation controller and a low-level reinforcement learning (RL)-based optimal control strategy for the individual SV agents. The high-level formation control law, employing a modified gradient method, guides the SVs toward the desired formations. Meanwhile, the low-level control structure, featuring time-varying references, incorporates the RL algorithm by transforming the time-varying closed-loop agent system into an equivalent autonomous system. The application of Lyapunov's direct method, along with the existence of the Bellman function, guarantees the stability and optimality of the proposed design. Through extensive numerical simulations encompassing various comparisons and scenarios, this study demonstrates the efficacy of the novel formation control strategy for multiple-SV systems, showcasing its potential for real-world applications.
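For intuition about the displacement-based, gradient-style formation law, the following minimal sketch drives single-integrator agents down the gradient of the squared displacement error to their neighbors; the gains, topology, and simplified kinematics are assumptions, not the paper's SV dynamics.

import numpy as np

# Displacement-based formation control on single-integrator kinematics.
# Each agent descends the gradient of the squared displacement error to
# its neighbors; all gains are illustrative.
desired = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.5]])    # target offsets
pos = np.random.default_rng(1).uniform(-3, 3, size=(3, 2))  # random start
k, dt = 1.0, 0.05
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

for _ in range(400):
    u = np.zeros_like(pos)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            # Gradient of 0.5 * ||(p_i - p_j) - (d_i - d_j)||^2 w.r.t. p_i
            u[i] -= k * ((pos[i] - pos[j]) - (desired[i] - desired[j]))
    pos += dt * u

print(np.round(pos - pos[0], 3))  # relative positions converge to the desired offsets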
Uncertainty in the evolution of opponent behavior creates a non-stationary environment for the agent, reducing the reliability of value estimation and strategy selection while compromising safety during exploration. Previous studies have developed various uncertainty quantification techniques and designed uncertainty-aware exploration methods for multi-agent reinforcement learning (MARL). However, existing methods leave gaps in the theoretical analysis and experimental verification of decoupling the uncertainty due to opponents from that due to the environment, which can decrease learning efficiency and lead to an unstable training process. Owing to inaccurate opponent modeling, the agent is vulnerable to harm from opponents, which is undesirable in real-world tasks. To address these issues, this study proposes a novel uncertainty-guided safe exploration strategy for MARL that decouples the two types of uncertainty originating from the environment and from opponents. Specifically, we introduce an uncertainty decoupling quantification technique based on a novel variance decomposition method for action-value functions. Furthermore, we present an uncertainty-aware policy optimization mechanism to facilitate safe exploration in MARL. Finally, we propose a new adaptive parameter scaling method to ensure efficient exploration by the agents. Theoretical analysis establishes the convergence rate of the proposed approach, and its effectiveness is demonstrated empirically. Extensive experiments on benchmark tasks spanning differential games, multi-agent particle environments, and RoboSumo validate the significant advantages of the proposed uncertainty-guided method in attaining higher scores and facilitating safe agent exploration.
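One plausible reading of such a variance decomposition, sketched below under stated assumptions: given an ensemble of action-value estimates per sampled opponent model, the law of total variance splits total uncertainty into an environment part (mean within-opponent variance) and an opponent part (variance of per-opponent means). The tensor layout here is hypothetical, not the paper's exact construction.

import numpy as np

# q[m, e]: Q-estimate from environment-ensemble member e under opponent
# model m. The law of total variance decouples the two sources.
rng = np.random.default_rng(0)
q = rng.normal(loc=rng.normal(0, 1, size=(5, 1)), scale=0.3, size=(5, 8))

env_uncertainty = q.var(axis=1).mean()   # E_m[ Var_e(Q | m) ]  (environment)
opp_uncertainty = q.mean(axis=1).var()   # Var_m( E_e[Q | m] )  (opponents)
total = q.var()

# With equal group sizes the two parts sum exactly to the total variance.
print(env_uncertainty, opp_uncertainty, env_uncertainty + opp_uncertainty, total)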
This paper introduces a novel approach to enhancing the control performance of a class of unknown multiple-input multiple-output nonlinear systems. The proposed method employs a fractional-order fuzzy sliding mode controller implemented through online fractional-order reinforcement learning (FOFSMC-FRL). First, the approach leverages two Takagi-Sugeno-Kang (TSK) fuzzy neural network actors, which approximate the equivalent and switching control parts of the sliding mode control, respectively. Additionally, a critic TSK fuzzy neural network is employed to approximate the value function of the reinforcement learning process. Second, the FOFSMC-FRL parameters undergo online adaptation using an innovative fractional-order Levenberg-Marquardt learning method. This adaptive mechanism allows the controller to continuously update its parameters based on the system's behavior, optimizing its control strategy accordingly. Third, the stability and convergence of the proposed approach are rigorously examined using Lyapunov's theorem. Notably, the proposed structure does not depend on knowledge of the system dynamics, uncertainty bounds, or disturbance characteristics. Moreover, the chattering phenomenon often associated with sliding mode control is effectively eliminated without compromising the system's robustness. Finally, a comparative simulation study demonstrates the feasibility and superiority of the proposed method over other control methods, validating its effectiveness and performance advantages.
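Since the update rules are fractional-order, the Grünwald-Letnikov approximation of a fractional derivative is the kind of operator involved; the sketch below is a generic illustration with an assumed test signal, not the paper's FOFSMC-FRL adaptation law.

import numpy as np

# Grünwald-Letnikov approximation of the fractional derivative D^alpha x
# at the final sample, using the recursive binomial weights.
def gl_fractional_derivative(x, alpha, h):
    n = len(x)
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - alpha) / k   # (-1)^k C(alpha, k), recursively
    return np.dot(w, x[::-1]) / h ** alpha      # newest sample gets weight w[0]

t = np.linspace(0, 1, 201)
x = t ** 2
# Sanity check: alpha = 1 recovers the ordinary derivative, d(t^2)/dt = 2 at t = 1.
print(gl_fractional_derivative(x, alpha=1.0, h=t[1] - t[0]))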
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
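Conditional value-at-risk (CVaR) is a canonical convex risk measure; composing one-step evaluations of this kind backward in time yields a time-consistent dynamic risk measure of the sort optimized above. The sketch below computes CVaR from sampled costs; the sample distribution is illustrative.

import numpy as np

# CVaR at level alpha: the average of the worst (1 - alpha) fraction of costs.
def cvar(costs, alpha=0.9):
    costs = np.sort(costs)
    tail = costs[int(np.ceil(alpha * len(costs))):]
    return tail.mean()

rng = np.random.default_rng(0)
costs = rng.normal(0.0, 1.0, size=10_000)
print(cvar(costs, alpha=0.9))  # about 1.75 for a standard normal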
The potential for the use of drones in logistics and transportation is continuously growing, with multiple applications in both urban and rural environments. The safe navigation of drones in such environments is a major challenge that requires sophisticated algorithms and systems that can quickly and efficiently assess the situation, find the shortest path to the target, and detect and avoid obstacles. Traditional path planning algorithms are unable to handle the dynamic and uncertain nature of real environments, while traditional machine learning models are insufficient due to the constantly changing conditions that affect the locations of the drone and of obstacles. Reinforcement learning (RL) algorithms have been widely used for autonomous navigation problems; however, the computational complexity and energy demands of such methods can become a bottleneck to the execution of UAV flights. In this paper, we propose the use of a minimum set of sensors together with RL algorithms for the safe and efficient navigation of drones in urban and rural environments. Our approach accounts for the complex and dynamic nature of such environments by incorporating real-time data from low-cost onboard sensors. After a thorough review of existing solutions for drone path planning and navigation in 3-D environments, we experimentally evaluate the proposed approach in a simulated environment across various scenarios. The test results demonstrate the effectiveness of the proposed RL-based approach for navigating drones in complex and unconstrained environments. The implemented approach can serve as a basis for the development of advanced and robust navigation systems for drones, improving the safety and efficiency of transportation applications in the near future.
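As one illustration of how low-cost sensor readings might enter such a learning problem, the hedged sketch below shapes a navigation reward from goal progress, obstacle proximity, and a step cost; the weights and sensor model are assumptions, not the paper's design.

import numpy as np

# Shaped navigation reward: reward progress toward the goal, penalize
# close range readings, and charge a small per-step (energy) cost.
def navigation_reward(prev_dist, dist, ranges, collision, w_prog=1.0,
                      w_obs=0.5, step_cost=0.01, safe_range=2.0):
    if collision:
        return -10.0
    progress = prev_dist - dist                       # closing on the goal
    proximity = max(0.0, safe_range - min(ranges))    # nearest-obstacle intrusion
    return w_prog * progress - w_obs * proximity - step_cost

# Usage: moved 0.4 closer, but a range sensor reports an obstacle at 0.8 m.
print(navigation_reward(5.0, 4.6, ranges=[3.1, 0.8, 2.5], collision=False))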
Model-free reinforcement learning algorithms based on entropy regularization have achieved good performance in control tasks. These algorithms add an entropy-regularization term to the policy objective to learn a stochastic policy. This work provides a new perspective that aims to explicitly learn a representation of the intrinsic information in state transitions to obtain a multimodal stochastic policy, addressing the tradeoff between exploration and exploitation. We study a class of Markov decision processes (MDPs) with divergence maximization, called divergence MDPs. The goal of a divergence MDP is to find an optimal stochastic policy that maximizes the sum of the expected discounted total rewards and a divergence term, where the divergence function learns the implicit information of the state transitions. It can thus yield stochastic policies with improved robustness and performance in high-dimensional continuous settings. Under this framework, we derive the optimality equations and then develop a divergence actor-critic (DivAC) algorithm based on the divergence policy iteration method to address large-scale continuous problems. The experimental results show that, compared with other methods, our approach achieves better performance and robustness, particularly in complex environments. The code for DivAC can be found at https://***/yzyvl/DivAC.
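To make the divergence-augmented objective concrete, the following toy sketch runs value iteration on a small tabular MDP whose per-step reward carries a divergence bonus on the transition distribution (here a closed-form KL from the uniform distribution); the paper learns its divergence term, so this closed form is only an illustration.

import numpy as np

# Tabular value iteration with a divergence bonus beta * KL(P(.|s,a) || uniform)
# added to each immediate reward. All quantities are illustrative.
rng = np.random.default_rng(0)
S, A, gamma, beta = 4, 2, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.uniform(0, 1, size=(S, A))

def kl_from_uniform(p):
    return float(np.sum(p * np.log(p * len(p) + 1e-12)))

V = np.zeros(S)
for _ in range(200):
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            bonus = beta * kl_from_uniform(P[s, a])
            Q[s, a] = R[s, a] + bonus + gamma * P[s, a] @ V
    V = Q.max(axis=1)

print(np.round(V, 3))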
This paper investigates a model-free reinforcement learning-based approach that enables a quadruped robot to manipulate objects while maintaining its balance and dynamic stability during walking. First, the dynamics of the quadruped robot are developed in two subspaces: the position control space and the force control space. Then, a new long-term performance index is introduced, and a radial basis function neural network serving as a critic is presented to estimate the unobtainable long-term performance index. Based on the resulting reinforcement signal, an actor neural network generates a feedforward compensation term to cope with the nonlinear dynamics and the system uncertainties. The robustness of the actor-critic reinforcement learning algorithm is enhanced by using a fractional-order sliding-mode controller in the closed-loop system. Online adaptive laws for both the critic and actor network weights are obtained using Lyapunov stability theory. As a result, the uniform ultimate boundedness of the position and force tracking errors is proven. Finally, numerical simulations illustrate the feasibility and effectiveness of the proposed adaptive actor-critic learning-based control scheme.
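A minimal sketch of a radial-basis-function critic of the kind described above follows: Gaussian features over a tracking-error state, linear output weights, and a TD-style update. The centers, widths, decay dynamics, and reward are illustrative, not the paper's adaptive laws.

import numpy as np

# RBF critic: value(x) = w . phi(x) with Gaussian basis functions, trained
# by a temporal-difference update toward the long-term performance index.
class RBFCritic:
    def __init__(self, centers, width=0.5):
        self.centers = centers            # (n_basis, state_dim)
        self.width = width
        self.w = np.zeros(len(centers))   # output weights

    def phi(self, x):
        d2 = ((x - self.centers) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * self.width ** 2))

    def value(self, x):
        return self.w @ self.phi(x)

    def td_update(self, x, r, x_next, gamma=0.95, lr=0.05):
        delta = r + gamma * self.value(x_next) - self.value(x)
        self.w += lr * delta * self.phi(x)
        return delta

critic = RBFCritic(centers=np.linspace(-1, 1, 9).reshape(-1, 1))
x = np.array([0.8])
for _ in range(100):
    x_next = 0.9 * x                      # tracking error decays under control
    critic.td_update(x, r=-float(x @ x), x_next=x_next)
    x = x_next
print(critic.value(np.array([0.5])))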
We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. At each time instant, the decision-maker observes a subset of the processes and obtains a noisy binary indicator of whether or not each observed process is anomalous. We develop an anomaly detection algorithm that chooses the processes to be observed at each time instant, decides when to stop taking observations, and declares which processes are anomalous. The objective of the detection algorithm is to identify the anomalies with an accuracy exceeding a desired value while minimizing the decision delay. We devise a centralized algorithm, in which the processes are jointly selected by a common agent, as well as a decentralized algorithm, in which the decision of whether to select a process is made independently for each process. Our algorithms rely on a Markov decision process defined over the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithms using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic, which has exponential complexity in the number of processes, our algorithms have computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of these algorithms through numerical experiments comparing them with state-of-the-art methods.
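The belief state underlying these algorithms can be illustrated directly: the marginal posterior probability that a process is anomalous, updated by Bayes' rule after each noisy binary indicator. The flip probability and prior below are illustrative.

import numpy as np

# Bayes update of P(anomalous) from a binary indicator that is flipped
# with probability `flip`.
def update_belief(p_anom, obs, flip=0.2):
    like_anom = (1 - flip) if obs == 1 else flip        # P(obs | anomalous)
    like_norm = flip if obs == 1 else (1 - flip)        # P(obs | normal)
    num = like_anom * p_anom
    return num / (num + like_norm * (1 - p_anom))

rng = np.random.default_rng(0)
belief = 0.5
for _ in range(10):                 # observe a truly anomalous process
    obs = int(rng.random() > 0.2)   # indicator is correct w.p. 0.8
    belief = update_belief(belief, obs)
print(belief)                       # approaches 1 as evidence accumulates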